cudf.core.column.string.StringMethods.tokenize

StringMethods.tokenize(delimiter: str = ' ') → SeriesOrIndex

Each string is split into tokens using the provided delimiter(s). The returned Series contains the tokens in the order they were found, with each token's index set to the row of the string it came from.

Parameters:
delimiter : str or list of str, default is whitespace (' ')

The string or strings used to locate the split points of each string.

Returns:
Series or Index of object.

Examples

>>> import cudf
>>> data = ["hello world", "goodbye world", "hello goodbye"]
>>> ser = cudf.Series(data)
>>> ser.str.tokenize()
0      hello
0      world
1    goodbye
1      world
2      hello
2    goodbye
dtype: object
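The output above can be understood with a pure-Python sketch of the same semantics (this is only an illustration, not the cuDF implementation, which runs on the GPU): split each string on the delimiter and repeat the source row's index once per token it produces.

```python
def tokenize(strings, delimiter=" "):
    # Illustrative only: mirrors the index/token pairing shown in the
    # example, where each token keeps the index of its source string.
    index, tokens = [], []
    for i, s in enumerate(strings):
        for tok in s.split(delimiter):
            index.append(i)
            tokens.append(tok)
    return index, tokens

idx, toks = tokenize(["hello world", "goodbye world", "hello goodbye"])
print(idx)   # [0, 0, 1, 1, 2, 2]
print(toks)  # ['hello', 'world', 'goodbye', 'world', 'hello', 'goodbye']
```

Note that, unlike `Series.str.split`, the result is flattened: one row per token rather than one list per input string.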