cudf.core.column.string.StringMethods.rsplit

StringMethods.rsplit(pat: str | None = None, n: int = -1, expand: bool = False, regex: bool | None = None) -> SeriesOrIndex

Split strings around given separator/delimiter.

Splits each string in the Series/Index from the end, at the specified delimiter string. Similar to str.rsplit().

Parameters:
pat : str, default ' ' (space)

String to split on. See regex for whether the pattern is treated as a regular expression or as a literal string.

n : int, default -1 (all)

Limit the number of splits in the output. None, 0, and -1 are all interpreted as “all splits”.
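The rule for n can be modeled with Python's built-in str.rsplit. Its maxsplit argument treats -1 as unlimited but treats 0 as "make no splits", so the documented values need mapping before reuse. The sketch below is a minimal model of that mapping, not cuDF's own code. The helper name normalize_n is hypothetical.

```python
def normalize_n(n):
    """Map the documented n values onto str.rsplit's maxsplit (hypothetical helper)."""
    # None, 0 and -1 all mean "all splits"; str.rsplit spells that as -1.
    if n is None or n in (0, -1):
        return -1
    return n


text = "a b c d"
# Passing 0 straight through would perform no splits at all.
print(text.rsplit(None, 0))               # ['a b c d']
print(text.rsplit(None, normalize_n(0)))  # ['a', 'b', 'c', 'd']
print(text.rsplit(None, normalize_n(2)))  # ['a b', 'c', 'd']
```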

expand : bool, default False

Expand the split strings into separate columns.

  • If True, return DataFrame/MultiIndex expanding dimensionality.

  • If False, return Series/Index, containing lists of strings.

regex : bool, default None

Determines if the passed-in pattern is a regular expression:

  • If True, assumes the passed-in pattern is a regular expression.

  • If False, treats the pattern as a literal string.

  • If pat length is 1, treats pat as a literal string.
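The following pure-Python model shows how a literal delimiter differs from a regular-expression delimiter when splitting from the end. It uses re.finditer. The function rsplit_regex is hypothetical. It keeps the last n matches found by a left-to-right scan, which is a simplification. It does not reproduce cuDF's handling of every pattern, such as zero-width matches.

```python
import re


def rsplit_regex(s, pat, n=-1):
    """Split s around regex matches of pat, keeping at most n splits from the end.

    Hypothetical pure-Python model; not cuDF's implementation.
    """
    matches = list(re.finditer(pat, s))
    if n is not None and n > 0:
        # Only the last n matches act as split points.
        matches = matches[-n:]
    pieces, start = [], 0
    for m in matches:
        pieces.append(s[start:m.start()])
        start = m.end()
    pieces.append(s[start:])
    return pieces


s = "a1b22c333d"
print(rsplit_regex(s, r"\d+"))       # ['a', 'b', 'c', 'd']
print(rsplit_regex(s, r"\d+", n=1))  # ['a1b22c', 'd']
# As a literal delimiter, the text "\d+" never occurs in s, so nothing is split.
print(s.rsplit(r"\d+"))              # ['a1b22c333d']
```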

Returns:
Series, Index, DataFrame or MultiIndex

Type matches caller unless expand=True (see Notes).

See also

split

Split strings around given separator/delimiter.

str.split

Standard library version for split.

str.rsplit

Standard library version for rsplit.

Notes

The handling of the n keyword depends on the number of found splits:

  • If found splits > n, make only n splits, counting from the end of the string.

  • If found splits <= n, make all splits.

  • If for a certain row the number of found splits < n, append None for padding up to n if expand=True.
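The padding rule for expand=True can be sketched in pure Python as a list of rows. The function rsplit_expand is hypothetical. It assumes the column count equals the largest number of pieces produced by any row, which is consistent with the three-column result in the n=2 example below. Missing positions and null rows are filled with None, which cuDF displays as <NA>.

```python
def rsplit_expand(values, sep=None, n=-1):
    """Model expand=True: split each string from the end and pad rows with None.

    Hypothetical helper; the column count is assumed to be the widest row.
    """
    maxsplit = -1 if n is None or n == 0 else n
    # A null input row stays null instead of being split.
    rows = [None if v is None else v.rsplit(sep, maxsplit) for v in values]
    width = max((len(r) for r in rows if r is not None), default=1)
    # Pad short rows (and null rows) with None up to the common width.
    return [
        [None] * width if r is None else r + [None] * (width - len(r))
        for r in rows
    ]


table = rsplit_expand(
    ["this is a regular sentence", "https://docs.python.org/3/tutorial/index.html", None],
    n=2,
)
for row in table:
    print(row)
# ['this is a', 'regular', 'sentence']
# ['https://docs.python.org/3/tutorial/index.html', None, None]
# [None, None, None]
```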

If using expand=True, Series and Index callers return DataFrame and MultiIndex objects, respectively.

Examples

>>> import cudf
>>> s = cudf.Series(
...     [
...         "this is a regular sentence",
...         "https://docs.python.org/3/tutorial/index.html",
...         None
...     ]
... )
>>> s
0                       this is a regular sentence
1    https://docs.python.org/3/tutorial/index.html
2                                             <NA>
dtype: object

In the default setting, the string is split by whitespace.

>>> s.str.rsplit()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                               None
dtype: list

Without the n parameter, the outputs of rsplit and split are identical.

>>> s.str.split()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                               None
dtype: list
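Python's built-in str.split and str.rsplit behave the same way. When the number of splits is unlimited, splitting from the left and splitting from the right produce the same pieces. Here is a quick standard-library check on the non-null sample strings:

```python
samples = [
    "this is a regular sentence",
    "https://docs.python.org/3/tutorial/index.html",
]
for text in samples:
    # With no maxsplit, every whitespace run is a split point, so direction does not matter.
    assert text.split() == text.rsplit()
    print(text.rsplit())
```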

The n parameter limits the number of splits on the delimiter. Once n is set, the outputs of split and rsplit differ, because rsplit makes its splits starting from the end of the string.

>>> s.str.rsplit(n=2)
0                     [this is a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                               None
dtype: list
>>> s.str.split(n=2)
0                     [this, is, a regular sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                               None
dtype: list
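For the first sample row, the same difference appears with the standard library's maxsplit argument:

```python
text = "this is a regular sentence"
# rsplit counts its splits from the end; split counts from the start.
print(text.rsplit(maxsplit=2))  # ['this is a', 'regular', 'sentence']
print(text.split(maxsplit=2))   # ['this', 'is', 'a regular sentence']
```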

When expand=True, the split elements are expanded into separate columns. A row that is <NA> stays <NA> in every column, and rows that produce fewer pieces are padded with <NA>.

>>> s.str.rsplit(n=2, expand=True)
                                               0        1         2
0                                      this is a  regular  sentence
1  https://docs.python.org/3/tutorial/index.html     <NA>      <NA>
2                                           <NA>     <NA>      <NA>

For slightly more complex use cases, such as splitting the HTML document name from a URL, the parameters can be combined.

>>> s.str.rsplit("/", n=1, expand=True)
                                    0           1
0          this is a regular sentence        <NA>
1  https://docs.python.org/3/tutorial  index.html
2                                <NA>        <NA>
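For each row, the n=1 split on "/" behaves like Python's str.rsplit("/", 1), and the second column is <NA> when the row contains no "/". The sketch below extracts the document name in pure Python. The helper document_name is hypothetical, and None stands in for <NA>.

```python
def document_name(value):
    """Return the text after the last "/", or None (hypothetical helper)."""
    if value is None:
        return None  # a null row stays null, like <NA> in the table above
    parts = value.rsplit("/", 1)
    # A single piece means no "/" was found, so the second column would be <NA>.
    return parts[1] if len(parts) == 2 else None


print(document_name("https://docs.python.org/3/tutorial/index.html"))  # index.html
print(document_name("this is a regular sentence"))                     # None
```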