ngrams_tokenize
- pylibcudf.nvtext.ngrams_tokenize.ngrams_tokenize(Column input, size_type ngrams, Scalar delimiter, Scalar separator) → Column
Returns a single column of strings by tokenizing the input strings column and then producing ngrams of each string.
For details, see ngrams_tokenize().
- Parameters:
- input (Column)
The input strings column.
- ngrams (size_type)
The number of tokens in each ngram; for example, 2 produces bigrams.
- delimiter (Scalar)
UTF-8 characters used to separate each string into tokens. An empty string separates tokens on whitespace.
- separator (Scalar)
The string placed between the tokens within each generated ngram.
- Returns:
- Column
New strings column of ngram tokens
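
The behavior described above can be illustrated with a small plain-Python model. This is only a sketch of the documented semantics, not the GPU implementation and not a pylibcudf call: the function name `ngrams_tokenize_model` is hypothetical, plain lists and strings stand in for `Column` and `Scalar`, and it assumes that empty tokens are dropped, that null rows are skipped, and that strings with fewer than `ngrams` tokens contribute no output.

```python
def ngrams_tokenize_model(strings, ngrams, delimiter="", separator="_"):
    """Plain-Python model of the documented behavior (hypothetical helper)."""
    if ngrams < 1:
        raise ValueError("ngrams must be at least 1")
    result = []
    for s in strings:
        if s is None:
            # Assumption: null rows produce no output rows.
            continue
        if delimiter == "":
            # An empty delimiter means tokens are separated on whitespace.
            tokens = s.split()
        else:
            # Assumption: empty tokens from repeated delimiters are dropped.
            tokens = [t for t in s.split(delimiter) if t]
        # Slide a window of `ngrams` tokens over this string only;
        # ngrams never span two input strings.
        for i in range(len(tokens) - ngrams + 1):
            result.append(separator.join(tokens[i:i + ngrams]))
    return result


print(ngrams_tokenize_model(["a b c", "d e", "f g h i", "j"], 2, " ", "_"))
```

Under these assumptions, all ngrams are collected into one flat output, so the result generally has a different number of rows than the input.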