cudf.core.column.string.StringMethods.join

StringMethods.join(sep=None, string_na_rep=None, sep_na_rep=None) -> SeriesOrIndex

Join lists contained as elements in the Series/Index with the passed delimiter.

If the elements of a Series are themselves lists, join the contents of each list using the delimiter passed to the function. This function is equivalent to str.join(). In the special case where a list in the Series contains only None, a <NA>/None value is always returned for that row.

Parameters:
sep : str or array-like

If str, the delimiter is used between list entries in every row. If array-like, the string at each position is used as the delimiter for the list entries of the corresponding row.

string_na_rep : str, default None

This string takes the place of null strings (not empty strings) in the Series. It is applied only if the Series contains list elements and the list has at least one non-null string. If string_na_rep is None, it defaults to the empty string “”.

sep_na_rep : str, default None

This string takes the place of any null strings (not empty strings) in sep. This parameter can be used only if sep is array-like. If sep_na_rep is None, it defaults to the empty string “”.
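The parameter rules above can be modelled in plain Python. The following is a minimal sketch, not cuDF's implementation: join_rows is a hypothetical helper that works on ordinary Python lists, and its handling of empty lists (returning an empty string) is an assumption, since the text above does not cover that case.

```python
def join_rows(rows, sep, string_na_rep=None, sep_na_rep=None):
    """Plain-Python model of the row-wise join rules described above.

    Hypothetical helper for illustration only; it is not part of cuDF.
    """
    # Both replacement values default to the empty string.
    string_fill = "" if string_na_rep is None else string_na_rep
    sep_fill = "" if sep_na_rep is None else sep_na_rep

    # Broadcast a scalar delimiter so that every row has its own delimiter.
    seps = [sep] * len(rows) if isinstance(sep, str) else list(sep)

    out = []
    for row, row_sep in zip(rows, seps):
        # A null row, or a list holding only nulls, gives a null result.
        # Assumption: an empty list is joined to "" rather than null.
        if row is None or (len(row) > 0 and all(item is None for item in row)):
            out.append(None)
            continue
        delimiter = sep_fill if row_sep is None else row_sep
        out.append(
            delimiter.join(string_fill if item is None else item for item in row)
        )
    return out


rows = [["a", "b", None], [None, None, None], None, ["c", "d"]]
print(join_rows(rows, "_", string_na_rep="k"))
# ['a_b_k', None, None, 'c_d']
print(join_rows(rows, [None, "^", ".", "-"], sep_na_rep="+"))
# ['a+b+', None, None, 'c-d']
```

In the second call the null entry in sep becomes "+" and the null list element becomes the default empty string, which is why the first row ends with a trailing delimiter.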

Returns:
Series/Index: object

The list entries concatenated by intervening occurrences of the delimiter.

Raises:
ValueError
  • If sep_na_rep is supplied when sep is str.

  • If sep is array-like and does not have the same length as the Series/Index.

TypeError
  • If string_na_rep or sep_na_rep are not scalar values.

  • If sep is neither a str nor array-like.
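The error conditions listed above can be sketched as a standalone validation step. This is an illustrative model only: check_join_args is a hypothetical helper, the order of the checks and the error messages are assumptions, "scalar" is narrowed to str and "array-like" to list or tuple for simplicity, and an explicit sep is required because the handling of sep=None is not described here.

```python
def check_join_args(n_rows, sep, string_na_rep=None, sep_na_rep=None):
    """Model of the documented argument checks (hypothetical helper)."""
    # TypeError: the replacement values must be scalars (here: str or None).
    for name, value in (("string_na_rep", string_na_rep),
                        ("sep_na_rep", sep_na_rep)):
        if value is not None and not isinstance(value, str):
            raise TypeError(f"{name} must be a scalar, got {type(value).__name__}")

    if isinstance(sep, str):
        # ValueError: sep_na_rep is only meaningful for an array-like sep.
        if sep_na_rep is not None:
            raise ValueError("sep_na_rep can only be used when sep is array-like")
    elif isinstance(sep, (list, tuple)):
        # ValueError: one delimiter is needed per row.
        if len(sep) != n_rows:
            raise ValueError(
                f"sep has length {len(sep)}, expected {n_rows} to match the Series/Index"
            )
    else:
        # TypeError: anything else is neither str nor array-like.
        raise TypeError(f"sep must be str or array-like, got {type(sep).__name__}")


check_join_args(4, "-")                                   # passes silently
check_join_args(4, ["-", "+", ".", "="], sep_na_rep="+")  # passes silently
```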

Examples

>>> import cudf
>>> ser = cudf.Series([['a', 'b', 'c'], ['d', 'e'], ['f'], ['g', ' ', 'h']])
>>> ser
0    [a, b, c]
1       [d, e]
2          [f]
3    [g,  , h]
dtype: list
>>> ser.str.join(sep='-')
0    a-b-c
1      d-e
2        f
3    g- -h
dtype: object

sep can be an array-like input:

>>> ser.str.join(sep=['-', '+', '.', '='])
0    a-b-c
1      d+e
2        f
3    g= =h
dtype: object

If the Series contains strings rather than lists, the characters of each string are joined by sep:

>>> ser = cudf.Series(['abc', 'def', 'ghi'])
>>> ser
0    abc
1    def
2    ghi
dtype: object
>>> ser.str.join(sep='_')
0    a_b_c
1    d_e_f
2    g_h_i
dtype: object
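The result above mirrors what Python's built-in str.join() does when it is given a string, because a string iterates character by character. A minimal CPU-only sketch of the same result, using a plain list in place of the Series:

```python
strings = ["abc", "def", "ghi"]

# str.join() iterates over its argument; iterating a string yields characters.
joined = ["_".join(s) for s in strings]

print(joined)  # ['a_b_c', 'd_e_f', 'g_h_i']
```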

We can replace <NA>/None values present in lists using string_na_rep if the lists contain at least one valid string (lists containing all None will result in a <NA>/None value):

>>> ser = cudf.Series([['a', 'b', None], [None, None, None], None, ['c', 'd']])
>>> ser
0          [a, b, None]
1    [None, None, None]
2                  None
3                [c, d]
dtype: list
>>> ser.str.join(sep='_', string_na_rep='k')
0    a_b_k
1     <NA>
2     <NA>
3      c_d
dtype: object

We can replace <NA>/None values present in an array-like sep using sep_na_rep:

>>> ser.str.join(sep=[None, '^', '.', '-'], sep_na_rep='+')
0    a+b+
1    <NA>
2    <NA>
3     c-d
dtype: object