cudf.core.wordpiece_tokenize.WordPieceVocabulary

class cudf.core.wordpiece_tokenize.WordPieceVocabulary(vocabulary: Series)

A vocabulary object used to tokenize input text.

Parameters:
vocabulary : cudf.Series

Strings column of vocabulary terms.

Methods

tokenize(text[, max_words_per_row])

Produces tokens for the input strings.
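
To illustrate what tokenizing against a WordPiece vocabulary involves, the following is a minimal CPU-only sketch of greedy longest-match-first WordPiece tokenization in plain Python. It is not cuDF's implementation and does not call cuDF. The function name `wordpiece_tokenize`, the `##` continuation prefix, the `[UNK]` fallback term, the use of each term's position in the vocabulary as its token id, and the `max_words` argument (loosely mirroring `max_words_per_row`, with 0 assumed to mean no limit) are all assumptions made for this example.

```python
def wordpiece_tokenize(text, vocab, unk="[UNK]", max_words=0):
    """Greedy longest-match-first WordPiece over whitespace-separated words.

    Hypothetical helper for illustration only; not part of cuDF.
    Returns a list of token ids, where an id is the term's index in `vocab`.
    """
    token_ids = {term: i for i, term in enumerate(vocab)}
    unk_id = token_ids.get(unk, -1)  # -1 if the vocabulary has no unknown term

    words = text.split()
    if max_words > 0:  # assumption: 0 means "tokenize every word"
        words = words[:max_words]

    out = []
    for word in words:
        start, ids = 0, []
        while start < len(word):
            end = len(word)
            match = None
            # Try the longest remaining substring first, then shrink it.
            while end > start:
                piece = word[start:end]
                if start > 0:
                    piece = "##" + piece  # continuation of the same word
                if piece in token_ids:
                    match = token_ids[piece]
                    break
                end -= 1
            if match is None:
                # No piece fits: the whole word maps to the unknown token.
                ids = [unk_id]
                break
            ids.append(match)
            start = end
        out.extend(ids)
    return out


vocab = ["[UNK]", "un", "##aff", "##able", "hello", "##s"]
print(wordpiece_tokenize("unaffable hellos xyz", vocab))
# → [1, 2, 3, 4, 5, 0]
```

In cuDF itself, per the signature above, the vocabulary terms are supplied as a `cudf.Series` of strings when constructing `WordPieceVocabulary`, and `tokenize` is called on the resulting object with the input text.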