cudf.core.column.string.StringMethods.filter_tokens

StringMethods.filter_tokens(min_token_length: int, replacement: str | None = None, delimiter: str | None = None) → SeriesOrIndex

Remove tokens shorter than min_token_length characters from each string in the series, optionally substituting the replacement string for each removed token. Tokens are identified by the delimiter provided.

Parameters:
min_token_length: int

Minimum number of characters for a token to be retained in the output string.

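replacement: str

String used in place of removed tokens. If None (the default), removed tokens are dropped and the surrounding delimiters are left in place.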

delimiter: str

The character(s) used to locate the tokens of each string. Default is whitespace.

Returns:
Series or Index of object.

Examples

>>> import cudf
>>> sr = cudf.Series(["this is me", "theme music", ""])
>>> sr.str.filter_tokens(3, replacement="_")
0       this _ _
1    theme music
2
dtype: object
>>> sr = cudf.Series(["this;is;me", "theme;music", ""])
>>> sr.str.filter_tokens(5, None, ";")
0             ;;
1    theme;music
2
dtype: object
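
The semantics shown in the examples above can be approximated on a single string in plain Python, which may help when checking expected results without a GPU. This is only a sketch, not cuDF's implementation: the helper name `filter_tokens_py` is hypothetical, and two points are assumptions rather than documented behaviour, namely that the default delimiter is any run of whitespace and that empty tokens (between adjacent delimiters) are left untouched.

```python
import re


def filter_tokens_py(text, min_token_length, replacement=None, delimiter=None):
    """Hypothetical pure-Python approximation of filter_tokens for one string."""
    # With no replacement, removed tokens become empty strings.
    repl = "" if replacement is None else replacement

    # Assumption: the default delimiter is any run of whitespace.
    if delimiter is None:
        pattern = r"(\s+)"
    else:
        pattern = "(" + re.escape(delimiter) + ")"

    # A capturing group makes re.split keep the delimiters, so the result
    # alternates token, delimiter, token, ... (tokens at even indices).
    parts = re.split(pattern, text)

    out = []
    for i, part in enumerate(parts):
        is_token = i % 2 == 0
        # Assumption: empty tokens are not replaced.
        if is_token and part and len(part) < min_token_length:
            out.append(repl)
        else:
            out.append(part)
    return "".join(out)


# Mirrors the inputs of the two documented examples.
first = [filter_tokens_py(s, 3, replacement="_")
         for s in ["this is me", "theme music", ""]]
second = [filter_tokens_py(s, 5, None, ";")
          for s in ["this;is;me", "theme;music", ""]]
```

Here `first` and `second` hold the same strings as the two Series outputs shown above. Note that a token is removed only when its length is strictly less than min_token_length, which is why the five-character tokens in "theme;music" survive a threshold of 5.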