cudf.core.column.string.StringMethods.split

StringMethods.split(pat: str = None, n: int = -1, expand: bool = None, regex: bool = None) -> SeriesOrIndex

Split strings around given separator/delimiter.

Splits each string in the Series/Index from the beginning, at the specified delimiter string. Similar to str.split().

Parameters
pat : str, default None

String or regular expression to split on. If not specified, split on whitespace.

n : int, default -1 (all)

Limit number of splits in output. None, 0, and -1 will all be interpreted as “all splits”.

expand : bool, default False

Expand the split strings into separate columns.

  • If True, return DataFrame/MultiIndex expanding dimensionality.

  • If False, return Series/Index, containing lists of strings.

regex : bool, default None

Determines if the passed-in pattern is a regular expression:

  • If True, assumes the passed-in pattern is a regular expression.

  • If False, treats the pattern as a literal string.

  • If pat length is 1, treats pat as a literal string.
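
The difference between a literal and a regular-expression pattern can be sketched with Python's standard library. This is only an illustration of the distinction, not cuDF itself: the helper `split_one` is hypothetical, and Python's `re` module stands in for cuDF's regex engine, which may support a different set of features.

```python
import re


def split_one(text, pat, regex):
    """Hypothetical helper: split one string on a literal or a regex pattern."""
    if regex:
        # Regex mode: pat is compiled as a regular expression.
        return re.split(pat, text)
    # Literal mode: pat is matched character for character.
    return text.split(pat)


text = "a.b,c"

# "." taken literally splits only on the dot.
literal = split_one(text, ".", regex=False)

# A character class needs regex mode to split on either delimiter.
pattern = split_one(text, "[.,]", regex=True)

# The same "." read as a regex matches every character, which is
# why treating single-character patterns as literals is useful.
as_regex = split_one(text, ".", regex=True)

print(literal)   # ['a', 'b,c']
print(pattern)   # ['a', 'b', 'c']
print(as_regex)  # ['', '', '', '', '', '']
```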

Returns
Series, Index, DataFrame or MultiIndex

Type matches caller unless expand=True (see Notes).

See also

rsplit

Splits string around given separator/delimiter, starting from the right.

str.split

Standard library version for split.

str.rsplit

Standard library version for rsplit.

Notes

The handling of the n keyword depends on the number of found splits:

  • If found splits > n, make first n splits only.

  • If found splits <= n, make all splits.

  • If, for a certain row, the number of found splits < n and expand=True, append None for padding up to n.

If using expand=True, Series and Index callers return DataFrame and MultiIndex objects, respectively.
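
The handling of n and the padding under expand=True can be sketched in pure Python. This is a minimal stand-in built on the built-in str.split, not cuDF's implementation: the function `split_rows` is hypothetical, and it assumes that rows are padded to the width of the widest split row and that a missing row becomes a row of None values.

```python
def split_rows(rows, pat=None, n=-1, expand=False):
    """Hypothetical pure-Python model of split() over a list of strings."""
    # None, 0 and -1 all mean "all splits"; str.split spells that as -1.
    maxsplit = -1 if n in (None, 0, -1) else n

    # Missing rows stay missing; other rows are split at most maxsplit times.
    parts = [None if r is None else r.split(pat, maxsplit) for r in rows]
    if not expand:
        return parts

    # Assumption: pad every row with None up to the widest row.
    width = max((len(p) for p in parts if p is not None), default=0)
    return [
        [None] * width if p is None else p + [None] * (width - len(p))
        for p in parts
    ]


rows = ["this is a regular sentence",
        "https://docs.python.org/index.html", None]

limited = split_rows(rows, n=2)
expanded = split_rows(rows, expand=True)

print(limited[0])   # ['this', 'is', 'a regular sentence']
print(expanded[1])  # ['https://docs.python.org/index.html', None, None, None, None]
```

With n=2 the first row yields three parts (two splits), and with expand=True the rows that produce fewer parts are filled with None so that every row has the same number of columns.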

Examples

>>> import cudf
>>> data = ["this is a regular sentence",
...     "https://docs.python.org/index.html", None]
>>> s = cudf.Series(data)
>>> s
0            this is a regular sentence
1    https://docs.python.org/index.html
2                                  <NA>
dtype: object

In the default setting, the string is split by whitespace.

>>> s.str.split()
0        [this, is, a, regular, sentence]
1    [https://docs.python.org/index.html]
2                                    None
dtype: list

Without the n parameter, the outputs of rsplit and split are identical.

>>> s.str.rsplit()
0        [this, is, a, regular, sentence]
1    [https://docs.python.org/index.html]
2                                    None
dtype: list
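
Once n is given, the two directions diverge. The sketch below uses Python's built-in str.split and str.rsplit on a single string as a stand-in, on the assumption that the Series methods follow the same left-to-right and right-to-left semantics.

```python
text = "this is a regular sentence"

# Without a limit, both directions produce the same parts.
same = text.split() == text.rsplit()

# With a limit of 2, split consumes delimiters from the left...
left = text.split(None, 2)

# ...while rsplit consumes them from the right.
right = text.rsplit(None, 2)

print(same)   # True
print(left)   # ['this', 'is', 'a regular sentence']
print(right)  # ['this is a', 'regular', 'sentence']
```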

The n parameter can be used to limit the number of splits on the delimiter.

>>> s.str.split(n=2)
0          [this, is, a regular sentence]
1    [https://docs.python.org/index.html]
2                                    None
dtype: list

The pat parameter can be used to split by other characters.

>>> s.str.split(pat="/")
0               [this is a regular sentence]
1    [https:, , docs.python.org, index.html]
2                                       None
dtype: list

When using expand=True, the split elements will expand out into separate columns. If a <NA> value is present, it is propagated throughout the columns during the split.

>>> s.str.split(expand=True)
                                    0     1     2        3         4
0                                this    is     a  regular  sentence
1  https://docs.python.org/index.html  <NA>  <NA>     <NA>      <NA>
2                                <NA>  <NA>  <NA>     <NA>      <NA>