byte_pair_encode#

class pylibcudf.nvtext.byte_pair_encode.BPEMergePairs#

The table of merge pairs for the BPE encoder.

For details, see cudf::nvtext::bpe_merge_pairs.

pylibcudf.nvtext.byte_pair_encode.byte_pair_encoding(Column input, BPEMergePairs merge_pairs, Scalar separator=None) Column#

Byte pair encode the input strings.

For details, see cpp:func:cudf::nvtext::byte_pair_encoding

Parameters:
inputColumn

Strings to encode.

merge_pairsBPEMergePairs

Substrings to rebuild each string on.

separatorScalar

String used to build the output after encoding. Default is a space.

Returns:
Column

An encoded column of strings.