Strings Copy

group strings_copy

Functions

std::unique_ptr<string_scalar> repeat_string(string_scalar const &input, size_type repeat_times, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Repeat the given string scalar a given number of times.

An output string scalar is generated by repeating the input string the number of times given by the repeat_times parameter.

In special cases:

  • If repeat_times is not a positive value, an empty (valid) string scalar will be returned.

  • An invalid input scalar will always result in an invalid output scalar regardless of the value of the repeat_times parameter.

Example:
s   = '123XYZ-'
out = repeat_string(s, 3)
out is '123XYZ-123XYZ-123XYZ-'

Throws:

std::overflow_error – if the size of the output string scalar exceeds the maximum value that can be stored by the scalar, i.e. input.size() * repeat_times > the maximum value of size_type

Parameters:
  • input – The scalar containing the string to repeat

  • repeat_times – The number of times the input string is repeated

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned string scalar

Returns:

New string scalar in which the input string is repeated
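The special cases and the overflow rule above can be summarized in a small host-side reference model. This is a sketch under stated assumptions, not libcudf code: the function name repeat_string_reference is hypothetical, std::optional<std::string> stands in for a string scalar (std::nullopt playing the role of an invalid scalar), and size_type is assumed to be the 32-bit signed integer that cudf::size_type aliases.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical CPU reference model of repeat_string; NOT the libcudf implementation.
// std::nullopt models an invalid (null) string scalar.
using size_type = std::int32_t;  // cudf::size_type is a 32-bit signed integer

std::optional<std::string> repeat_string_reference(std::optional<std::string> const& input,
                                                   size_type repeat_times)
{
  // An invalid input always yields an invalid output.
  if (!input.has_value()) { return std::nullopt; }
  // A non-positive repeat count yields an empty but valid string.
  if (repeat_times <= 0) { return std::string{}; }
  // Documented overflow rule: input.size() * repeat_times must fit in size_type.
  // The product is computed in 64 bits so the check itself cannot wrap around.
  auto const total = static_cast<std::int64_t>(input->size()) * repeat_times;
  if (total > std::numeric_limits<size_type>::max()) {
    throw std::overflow_error("output string size exceeds the maximum of size_type");
  }
  std::string out;
  out.reserve(static_cast<std::size_t>(total));
  for (size_type i = 0; i < repeat_times; ++i) { out += *input; }
  return out;
}
```

With the example input, repeat_string_reference(std::string("123XYZ-"), 3) holds "123XYZ-123XYZ-123XYZ-", and a two-character input repeated std::numeric_limits<std::int32_t>::max() times throws std::overflow_error before any memory is reserved.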

std::unique_ptr<column> repeat_strings(strings_column_view const &input, size_type repeat_times, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Repeat each string in the given strings column a given number of times.

An output strings column is generated by repeating each string from the input strings column the number of times given by the repeat_times parameter.

In special cases:

  • If repeat_times is not a positive number, a non-null input string will always result in an empty output string.

  • A null input string will always result in a null output string regardless of the value of the repeat_times parameter.

Example:
strs = ['aa', null, '', 'bbc']
out  = repeat_strings(strs, 3)
out is ['aaaaaa', null, '', 'bbcbbcbbc']

Parameters:
  • input – The column containing strings to repeat

  • repeat_times – The number of times each input string is repeated

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned strings column

Returns:

New column containing the repeated strings
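Applied row by row, the same rules give the column variant. This is again a hypothetical host-side sketch rather than libcudf: the name repeat_strings_reference is made up, a strings column is modeled as std::vector<std::optional<std::string>> with std::nullopt marking a null row, and output-size limits are not modeled.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical CPU reference model of repeat_strings(input, repeat_times);
// NOT the libcudf implementation. std::nullopt marks a null row.
using size_type    = std::int32_t;  // cudf::size_type is a 32-bit signed integer
using string_row   = std::optional<std::string>;
using strings_rows = std::vector<string_row>;

strings_rows repeat_strings_reference(strings_rows const& input, size_type repeat_times)
{
  strings_rows out;
  out.reserve(input.size());
  for (auto const& row : input) {
    // A null input row always produces a null output row.
    if (!row.has_value()) {
      out.emplace_back(std::nullopt);
      continue;
    }
    // The loop body never runs when repeat_times <= 0, leaving an empty string.
    std::string repeated;
    for (size_type i = 0; i < repeat_times; ++i) { repeated += *row; }
    out.emplace_back(std::move(repeated));
  }
  return out;
}
```

Feeding it the example column ['aa', null, '', 'bbc'] with a count of 3 reproduces ['aaaaaa', null, '', 'bbcbbcbbc']; with a count of -1 the null row stays null and every other row becomes empty.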

std::unique_ptr<column> repeat_strings(strings_column_view const &input, column_view const &repeat_times, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Repeat each string in the given strings column the number of times given in another numeric column.

An output strings column is generated by repeating each input string the number of times given by the corresponding row in the repeat_times numeric column.

In special cases:

  • Any null row (from either the input strings column or the repeat_times column) will always result in a null output string.

  • If any value in the repeat_times column is not a positive number and its corresponding input string is not null, the output string will be an empty string.

Example:
strs         = ['aa', null, '', 'bbc-']
repeat_times = [ 1,   2,     3,  4   ]
out          = repeat_strings(strs, repeat_times)
out is ['aa', null, '', 'bbc-bbc-bbc-bbc-']

Throws:

Parameters:
  • input – The column containing strings to repeat

  • repeat_times – The column containing, for each row, the number of times the corresponding input string is repeated

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned strings column

Returns:

New column containing the repeated strings.
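The per-row variant differs only in where each count comes from. In this hypothetical host-side sketch (the name repeat_strings_by_column_reference is made up), both columns are modeled as vectors of std::optional, repeat_times is assumed to hold int32_t values (the real argument is a numeric column_view), and the length check that throws std::invalid_argument is the sketch's own choice rather than documented libcudf behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical CPU reference model of repeat_strings(input, repeat_times column);
// NOT the libcudf implementation. std::nullopt marks a null row in either column.
using size_type = std::int32_t;

std::vector<std::optional<std::string>> repeat_strings_by_column_reference(
  std::vector<std::optional<std::string>> const& input,
  std::vector<std::optional<size_type>> const& repeat_times)
{
  // Assumption of this sketch: one repeat count per string row.
  if (input.size() != repeat_times.size()) {
    throw std::invalid_argument("input and repeat_times must have the same number of rows");
  }
  std::vector<std::optional<std::string>> out;
  out.reserve(input.size());
  for (std::size_t row = 0; row < input.size(); ++row) {
    // A null in either column produces a null output row.
    if (!input[row].has_value() || !repeat_times[row].has_value()) {
      out.emplace_back(std::nullopt);
      continue;
    }
    // Stays empty when the count for this row is not positive.
    std::string repeated;
    for (size_type i = 0; i < *repeat_times[row]; ++i) { repeated += *input[row]; }
    out.emplace_back(std::move(repeated));
  }
  return out;
}
```

With strs = ['aa', null, '', 'bbc-'] and counts [1, 2, 3, 4] it returns ['aa', null, '', 'bbc-bbc-bbc-bbc-'], matching the example above; a null count nulls its row even when the string itself is valid.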