cudf.core.column.string.StringMethods.filter_tokens

StringMethods.filter_tokens(min_token_length: int, replacement: str | None = None, delimiter: str | None = None) → SeriesOrIndex

Remove tokens shorter than min_token_length from each string in the series, optionally substituting the replacement string for each removed token. Tokens are identified by the delimiter provided. When no replacement is given, removed tokens are dropped while the delimiters around them are kept.

Parameters:

min_token_length: int: Minimum number of characters for a token to be retained in the output string.
replacement: str: String used in place of removed tokens.
delimiter: str: The character(s) used to locate the tokens of each string. Default is whitespace.

Returns:

Series or Index of object.

Examples

>>> import cudf
>>> sr = cudf.Series(["this is me", "theme music", ""])
>>> sr.str.filter_tokens(3, replacement="_")
0       this _ _
1    theme music
2
dtype: object
>>> sr = cudf.Series(["this;is;me", "theme;music", ""])
>>> sr.str.filter_tokens(5, None, ";")
0             ;;
1    theme;music
2
dtype: object
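
The per-string behavior shown in the examples above can be approximated in plain Python. This is only an illustrative sketch, not the cuDF implementation: the helper name filter_tokens_sketch is hypothetical, it treats the default whitespace delimiter as a single space, and it assumes that empty tokens are left untouched (which matches the empty-string rows in the examples).

```python
def filter_tokens_sketch(text, min_token_length, replacement=None, delimiter=None):
    """Hypothetical CPU-side model of filter_tokens for a single string."""
    # Assumption: the whitespace default is modeled as one space character.
    sep = " " if delimiter is None else delimiter
    # With no replacement, a removed token becomes an empty string,
    # so the delimiters around it remain in the output.
    fill = "" if replacement is None else replacement
    kept = [
        # Assumption: empty tokens pass through unchanged.
        tok if (not tok or len(tok) >= min_token_length) else fill
        for tok in text.split(sep)
    ]
    return sep.join(kept)


print(filter_tokens_sketch("this is me", 3, replacement="_"))
print(filter_tokens_sketch("this;is;me", 5, None, ";"))
```

Under these assumptions the two calls reproduce the first row of each example above. The real method runs on the GPU over the whole Series or Index at once.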