cudf.core.column.string.StringMethods.ngrams
- StringMethods.ngrams(n: int = 2, separator: str = '_') → SeriesOrIndex
Generate the n-grams from a set of tokens; each record in the series is treated as a token.
You can generate tokens from a Series instance using the
Series.str.tokenize()
function.
- Parameters:
- n : int
The degree of the n-gram (number of consecutive tokens). Default is 2, for bigrams.
- separator : str
The separator to use between tokens within an n-gram. Default is ‘_’.
Examples
>>> import cudf
>>> str_series = cudf.Series(['this is my', 'favorite book'])
>>> str_series.str.ngrams(2, "_")
0    this is my_favorite book
dtype: object
>>> str_series = cudf.Series(['abc','def','xyz','hhh'])
>>> str_series.str.ngrams(2, "_")
0    abc_def
1    def_xyz
2    xyz_hhh
dtype: object
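The following is a minimal pure-Python sketch of the windowing behavior shown in the examples. It is not cudf's GPU implementation. It assumes that `ngrams` joins each run of `n` consecutive records with `separator`. The helper names `ngrams_reference` and `whitespace_tokenize` are hypothetical, and the whitespace split only approximates what `Series.str.tokenize()` does.

```python
# Hypothetical pure-Python reference for the semantics of StringMethods.ngrams.
# It does not use cudf and only illustrates the sliding-window join.
from typing import List


def ngrams_reference(tokens: List[str], n: int = 2, separator: str = "_") -> List[str]:
    """Join every run of n consecutive tokens with `separator`."""
    if n < 1:
        # Guard chosen for this sketch only; cudf's own validation may differ.
        raise ValueError("n must be a positive integer")
    # k tokens yield k - n + 1 windows; the range is empty when k < n.
    return [separator.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def whitespace_tokenize(records: List[str]) -> List[str]:
    """Hypothetical stand-in for Series.str.tokenize(): split each record
    on whitespace and flatten the pieces into one list of tokens."""
    return [token for record in records for token in record.split()]


records = ["this is my", "favorite book"]

# Each whole record is one token, as in the first example above.
print(ngrams_reference(records, 2, "_"))
# ['this is my_favorite book']

# Tokenizing first yields word-level bigrams instead.
print(ngrams_reference(whitespace_tokenize(records), 2, "_"))
# ['this_is', 'is_my', 'my_favorite', 'favorite_book']

print(ngrams_reference(["abc", "def", "xyz", "hhh"], 2, "_"))
# ['abc_def', 'def_xyz', 'xyz_hhh']
```

The second call shows why tokenizing first matters. Without it, `ngrams` treats each whole record as a single token, so multi-word records are glued together rather than split into word n-grams.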