cudf.core.column.string.StringMethods.filter_tokens
- StringMethods.filter_tokens(min_token_length: int, replacement: str | None = None, delimiter: str | None = None) → SeriesOrIndex
Remove tokens shorter than min_token_length characters from each string in the series, optionally substituting the replacement string for each removed token. Tokens are identified by the delimiter provided.
- Parameters:
- min_token_length: int
Minimum number of characters for a token to be retained in the output string.
- replacement: str
String used in place of removed tokens.
- delimiter: str
The character(s) used to locate the tokens of each string. Default is whitespace.
- Returns:
- Series or Index of object.
Examples
>>> import cudf
>>> sr = cudf.Series(["this is me", "theme music", ""])
>>> sr.str.filter_tokens(3, replacement="_")
0       this _ _
1    theme music
2
dtype: object
>>> sr = cudf.Series(["this;is;me", "theme;music", ""])
>>> sr.str.filter_tokens(5, None, ";")
0             ;;
1    theme;music
2
dtype: object
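For readers without a GPU at hand, the per-string behavior shown in the examples can be approximated in plain Python. This is only a sketch of the semantics, not cuDF's implementation: the helper name `filter_tokens_sketch` is hypothetical, and it assumes that delimiters are kept in place, that a missing replacement means the token is replaced with an empty string, and that empty tokens between adjacent delimiters are left untouched.

```python
import re


def filter_tokens_sketch(text, min_token_length, replacement=None, delimiter=None):
    """Hypothetical CPU-only approximation of filter_tokens for one string."""
    # Assumption: no replacement means short tokens are simply dropped.
    repl = "" if replacement is None else replacement
    # Assumption: with no delimiter, tokens are separated by runs of whitespace.
    # The capturing group makes re.split keep the delimiters in the result.
    pattern = r"(\s+)" if not delimiter else "(" + re.escape(delimiter) + ")"
    parts = re.split(pattern, text)
    out = []
    for i, part in enumerate(parts):
        # Even positions hold tokens, odd positions hold delimiters.
        is_token = i % 2 == 0 and part != ""
        if is_token and len(part) < min_token_length:
            out.append(repl)
        else:
            out.append(part)
    return "".join(out)


strings = ["this is me", "theme music", ""]
print([filter_tokens_sketch(s, 3, replacement="_") for s in strings])

strings = ["this;is;me", "theme;music", ""]
print([filter_tokens_sketch(s, 5, None, ";") for s in strings])
```

Note that a token whose length equals min_token_length is retained, which is why "theme" and "music" survive a threshold of 5 in the second example.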