cudf.core.column.string.StringMethods.character_ngrams

StringMethods.character_ngrams(n: int = 2, as_list: bool = False) -> SeriesOrIndex

Generate character n-grams for each string in a column.

Parameters:
n : int

The degree of the n-gram, that is, the number of consecutive characters in each one. Defaults to 2 (bigrams).

as_list : bool

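If True, return a list column in which each row holds the n-grams of the corresponding string. Defaults to False, which returns a flat column of n-grams whose index repeats the label of the source string.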

Examples

>>> import cudf
>>> str_series = cudf.Series(['abcd','efgh','xyz'])
>>> str_series.str.character_ngrams(2)
0    ab
0    bc
0    cd
1    ef
1    fg
1    gh
2    xy
2    yz
dtype: object
>>> str_series.str.character_ngrams(3)
0    abc
0    bcd
1    efg
1    fgh
2    xyz
dtype: object
>>> str_series.str.character_ngrams(3, as_list=True)
0    [abc, bcd]
1    [efg, fgh]
2         [xyz]
dtype: list
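
The sliding-window behaviour in the examples above can be sketched in plain Python. This is only an illustration of the semantics, not cuDF's implementation. The helper name `char_ngrams_sketch` is hypothetical. Returning nothing for a string shorter than `n` is a choice made for this sketch.

```python
def char_ngrams_sketch(strings, n=2, as_list=False):
    """Pure-Python illustration of character n-grams (hypothetical helper, not cuDF code)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Slide a window of width n across each string. In this sketch, a string
    # shorter than n produces an empty list (an assumption, see the lead-in).
    per_string = [[s[i:i + n] for i in range(len(s) - n + 1)] for s in strings]
    if as_list:
        # One list of n-grams per input string, like the as_list=True output.
        return per_string
    # Flat form: pair each n-gram with the position of its source string,
    # mirroring the repeated index labels in the flat output.
    return [(row, gram) for row, grams in enumerate(per_string) for gram in grams]


words = ["abcd", "efgh", "xyz"]
print(char_ngrams_sketch(words, 3))
print(char_ngrams_sketch(words, 3, as_list=True))
```

A string of length `m` contributes `m - n + 1` n-grams when `m >= n`, which matches the counts in the examples: two trigrams each for 'abcd' and 'efgh', and one for 'xyz'.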