Strings Slice

group strings_slice

Functions

std::unique_ptr<column> slice_strings(strings_column_view const &input, numeric_scalar<size_type> const &start = numeric_scalar<size_type>(0, false), numeric_scalar<size_type> const &stop = numeric_scalar<size_type>(0, false), numeric_scalar<size_type> const &step = numeric_scalar<size_type>(1), rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Returns a new strings column that contains substrings of the strings in the provided column.

The character positions retrieved from each string are [start,stop), taking every step-th character. If the start position is outside a string’s length, an empty string is returned for that entry. If the stop position is past the end of a string, the end of the string is used as the stop position for that string.

Null string entries will return null output string entries.

Example:
s = ["hello", "goodbye"]
r = slice_strings(s,2,6)
r is now ["llo","odby"]
r2 = slice_strings(s,2,5,2)
r2 is now ["lo","ob"]
Parameters:
  • input – Strings column for this operation

  • start – First character position to begin the substring

  • stop – Last character position (exclusive) to end the substring

  • step – Distance between input characters retrieved

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

New strings column containing the substrings of the input strings
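The slicing rules described above can be sketched on the host with plain standard-library types. This is only an illustrative model, not the libcudf implementation: `host_strings` and `slice_strings_model` are hypothetical names, nulls are modeled with `std::optional`, positions are counted in bytes (so the sketch assumes ASCII input), and only non-negative positions with a positive step are handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a strings column; std::nullopt models a null entry.
using host_strings = std::vector<std::optional<std::string>>;

// Illustrative model of the [start,stop) slicing rules with a step.
// Assumptions: ASCII input, non-negative start/stop, step > 0.
host_strings slice_strings_model(host_strings const& input,
                                 std::size_t start,
                                 std::size_t stop,
                                 std::size_t step = 1)
{
  assert(step > 0 && "this sketch only models positive steps");
  host_strings output;
  output.reserve(input.size());
  for (auto const& entry : input) {
    if (!entry.has_value()) {
      output.push_back(std::nullopt);  // null in, null out
      continue;
    }
    std::string const& str = *entry;
    // A stop past the end of the string is clamped to the string length.
    std::size_t const end = std::min(stop, str.size());
    std::string result;
    // A start at or past the clamped end never enters the loop, giving an empty string.
    for (std::size_t pos = start; pos < end; pos += step) {
      result.push_back(str[pos]);
    }
    output.push_back(result);
  }
  return output;
}
```

Calling `slice_strings_model` on `["hello", "goodbye"]` with `(2, 6)` and with `(2, 5, 2)` reproduces the two results in the example above.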

std::unique_ptr<column> slice_strings(strings_column_view const &input, column_view const &starts, column_view const &stops, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Returns a new strings column that contains substrings of the strings in the provided column using unique ranges for each string.

The character positions to retrieve in each string are specified in the starts and stops integer columns. If a start position is outside a string’s length, an empty string is returned for that entry. If a stop position is past the end of a string, the end of the string is used as the stop position for that string. A stop position of -1 also indicates that the end of the string is used as the stop position for that string.

Null string entries will return null output string entries.

The starts and stops columns must both be the same integer type and must be the same size as the strings column.

Example:
s = ["hello", "goodbye"]
starts = [ 1, 2 ]
stops = [ 5, 4 ]
r = slice_strings(s,starts,stops)
r is now ["ello","od"]
Throws:
Parameters:
  • input – Strings column for this operation

  • starts – First character positions to begin the substring

  • stops – Last character positions (exclusive) to end the substrings

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

New strings column containing the substrings of the input strings
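The per-row variant can be modeled on the host in the same way. Again, this is a sketch under stated assumptions rather than the library's code: `host_strings` and `slice_strings_per_row_model` are hypothetical names, positions are counted in bytes (ASCII input assumed), start values are assumed non-negative, and the `std::invalid_argument` thrown on a size mismatch is a choice made for this sketch, not a statement about which exception libcudf raises.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a strings column; std::nullopt models a null entry.
using host_strings = std::vector<std::optional<std::string>>;

// Illustrative model of per-row slicing with one [start,stop) range per string.
// Assumptions: ASCII input, starts >= 0, stops >= -1.
host_strings slice_strings_per_row_model(host_strings const& input,
                                         std::vector<int> const& starts,
                                         std::vector<int> const& stops)
{
  // The starts and stops columns must be the same size as the strings column.
  if (starts.size() != input.size() || stops.size() != input.size()) {
    throw std::invalid_argument("starts and stops must match the input size");
  }
  host_strings output;
  output.reserve(input.size());
  for (std::size_t row = 0; row < input.size(); ++row) {
    if (!input[row].has_value()) {
      output.push_back(std::nullopt);  // null in, null out
      continue;
    }
    assert(starts[row] >= 0 && stops[row] >= -1);
    std::string const& str = *input[row];
    int const length = static_cast<int>(str.size());
    int const begin  = starts[row];
    // A stop of -1, or any stop past the end, means "up to the end of the string".
    int const end = (stops[row] == -1) ? length : std::min(stops[row], length);
    // A start at or past the effective end yields an empty string.
    output.push_back(begin < end ? str.substr(begin, end - begin) : std::string{});
  }
  return output;
}
```

With `starts = [1, 2]` and `stops = [5, 4]` on `["hello", "goodbye"]`, the model returns the same `["ello", "od"]` shown in the example above.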