- StringMethods.filter_tokens(min_token_length: int, replacement: Optional[str] = None, delimiter: Optional[str] = None) -> SeriesOrIndex
Remove tokens shorter than min_token_length characters from each string in the Series or Index, optionally replacing them with the replacement string. Tokens are identified by the delimiter character(s) provided; the delimiters themselves are kept in the output.
- min_token_length: int
Minimum number of characters for a token to be retained in the output string.
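- replacement: str, optional
String used in place of removed tokens. If None (the default), removed tokens are replaced with an empty string.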
- delimiter: str, optional
The character(s) used to locate the tokens of each string. If None (the default), tokens are delimited by whitespace.
- Returns: Series or Index of object.
>>> import cudf
>>> sr = cudf.Series(["this is me", "theme music", ""])
>>> sr.str.filter_tokens(3, replacement="_")
0       this _ _
1    theme music
2
dtype: object
>>> sr = cudf.Series(["this;is;me", "theme;music", ""])
>>> sr.str.filter_tokens(5, None, ";")
0             ;;
1    theme;music
2
dtype: object
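To make the token rules concrete, here is a minimal pure-Python sketch that reproduces both examples above on plain lists, without a GPU. It is not cuDF's implementation: the helper name `filter_tokens_reference` is hypothetical, and it assumes the delimiter is matched as one exact string, approximates the default whitespace delimiter with a single space (runs of whitespace may behave differently in cuDF), and leaves empty pieces, such as an empty input string, untouched.

```python
from typing import List, Optional


def filter_tokens_reference(
    strings: List[str],
    min_token_length: int,
    replacement: Optional[str] = None,
    delimiter: Optional[str] = None,
) -> List[str]:
    """Hypothetical CPU sketch of the filter_tokens behavior shown in the examples.

    Assumptions (not taken from cuDF): the delimiter is matched as one exact
    string, None is approximated by a single space, and empty pieces
    (e.g. an empty input string) are left as-is.
    """
    sep = " " if delimiter is None else delimiter
    fill = "" if replacement is None else replacement
    result = []
    for s in strings:
        tokens = s.split(sep)
        kept = [
            # Keep empty pieces and long-enough tokens; replace short ones.
            tok if tok == "" or len(tok) >= min_token_length else fill
            for tok in tokens
        ]
        # Re-join with the same delimiter so delimiter positions survive.
        result.append(sep.join(kept))
    return result


print(filter_tokens_reference(["this is me", "theme music", ""], 3, replacement="_"))
# ['this _ _', 'theme music', '']
print(filter_tokens_reference(["this;is;me", "theme;music", ""], 5, None, ";"))
# [';;', 'theme;music', '']
```

Re-joining with the original delimiter, rather than dropping short tokens from the list, is what keeps output such as `;;` when every token is removed.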