cudf.core.column.string.StringMethods.ngrams

StringMethods.ngrams(n: int = 2, separator: str = '_') → SeriesOrIndex

Generate the n-grams from a set of tokens. Each record in the series is treated as a single token, so n-grams are built from consecutive rows, not from the words inside a row.

To build n-grams from the words within each string, first generate tokens from the Series with the Series.str.tokenize() method.

Parameters:

n (int): The degree of the n-gram (number of consecutive tokens). Defaults to 2, which produces bigrams.
separator (str): The separator placed between tokens within an n-gram. Defaults to '_'.

Examples

>>> import cudf
>>> str_series = cudf.Series(['this is my', 'favorite book'])
>>> str_series.str.ngrams(2, "_")
0    this is my_favorite book
dtype: object
>>> str_series = cudf.Series(['abc','def','xyz','hhh'])
>>> str_series.str.ngrams(2, "_")
0    abc_def
1    def_xyz
2    xyz_hhh
dtype: object
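
The row-as-token behavior shown above can be sketched in plain Python. This is only an illustration of the sliding window implied by the examples, not cuDF's implementation: `ngrams_sketch` is a hypothetical helper, and `str.split()` is used here as a rough stand-in for Series.str.tokenize().

```python
from typing import List


def ngrams_sketch(tokens: List[str], n: int = 2, separator: str = "_") -> List[str]:
    """Join every run of n consecutive tokens with separator.

    Hypothetical helper for illustration only; not part of cuDF.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Slide a window of width n across the token list.
    return [separator.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Each list element plays the role of one row (one token) in the Series.
rows = ["abc", "def", "xyz", "hhh"]
bigrams = ngrams_sketch(rows, 2, "_")
print(bigrams)  # ['abc_def', 'def_xyz', 'xyz_hhh']

# Rows that contain several words are still single tokens ...
sentences = ["this is my", "favorite book"]
row_bigrams = ngrams_sketch(sentences, 2, "_")
print(row_bigrams)  # ['this is my_favorite book']

# ... so split them into words first to get word-level n-grams.
words = [word for sentence in sentences for word in sentence.split()]
word_bigrams = ngrams_sketch(words, 2, "_")
print(word_bigrams)  # ['this_is', 'is_my', 'my_favorite', 'favorite_book']
```

The second and third calls show why tokenizing first matters: without it, the two rows yield a single bigram that spans both whole strings.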