ngrams_tokenize

pylibcudf.nvtext.ngrams_tokenize.ngrams_tokenize(Column input, size_type ngrams, Scalar delimiter, Scalar separator) -> Column

Returns a single column of strings by tokenizing the input strings column and then producing ngrams of each string.

For details, see the libcudf C++ function ngrams_tokenize().

Parameters:
input : Column

Input strings

ngrams : size_type

Number of tokens in each ngram (for example, 2 produces bigrams)

delimiter : Scalar

UTF-8 characters used to separate each string into tokens. An empty string will separate tokens using whitespace.

separator : Scalar

The string placed between the tokens of each ngram

Returns:
Column

New strings column of ngrams
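
The behavior described above can be illustrated with a small pure-Python sketch. This is not the pylibcudf implementation and does not run on the GPU; the helper name ngrams_tokenize_reference is hypothetical, and plain Python lists stand in for Column and Scalar objects. It assumes that empty tokens are dropped, that None entries are skipped, and that a string with fewer tokens than ngrams contributes nothing to the output.

```python
def ngrams_tokenize_reference(strings, ngrams, delimiter, separator):
    """Hypothetical CPU reference for the documented behavior (a sketch only)."""
    result = []
    for s in strings:
        if s is None:
            # Assumption: null entries produce no output rows.
            continue
        if delimiter == "":
            # An empty delimiter splits on whitespace.
            tokens = s.split()
        else:
            # Assumption: empty tokens from repeated delimiters are dropped.
            tokens = [t for t in s.split(delimiter) if t]
        # Slide a window of `ngrams` consecutive tokens over this string only;
        # ngrams never span two input strings.
        for i in range(len(tokens) - ngrams + 1):
            result.append(separator.join(tokens[i:i + ngrams]))
    return result


bigrams = ngrams_tokenize_reference(["a b c", "d e", "f g h i", "j"], 2, " ", "_")
print(bigrams)  # ['a_b', 'b_c', 'd_e', 'f_g', 'g_h', 'h_i']
```

Note that the four input strings yield a single flat list of six ngrams, and that "j" contributes nothing because it has only one token.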