Strings Slice
- group strings_slice
Functions
-
std::unique_ptr<column> slice_strings(strings_column_view const &input, numeric_scalar<size_type> const &start = numeric_scalar<size_type>(0, false), numeric_scalar<size_type> const &stop = numeric_scalar<size_type>(0, false), numeric_scalar<size_type> const &step = numeric_scalar<size_type>(1), rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns a new strings column that contains substrings of the strings in the provided column.
The character positions to retrieve in each string are [start,stop). If the start position is outside a string’s length, an empty string is returned for that entry. If the stop position is past the end of a string’s length, the end of the string is used as the stop position for that string.
Null string entries will return null output string entries.
Example:
s = ["hello", "goodbye"]
r = slice_strings(s,2,6)
r is now ["llo","odby"]
r2 = slice_strings(s,2,5,2)
r2 is now ["lo","ob"]
- Parameters:
input – Strings column for this operation
start – First character position to begin the substring
stop – Last character position (exclusive) to end the substring
step – Distance between input characters retrieved
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New strings column containing the extracted substrings
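The slicing rules described above can be modeled on the host with plain C++. This is a sketch of the documented semantics only, not libcudf code: `slice_one` and `slice_column` are hypothetical helpers, `std::optional` stands in for a nullable row, and positions are treated as byte offsets, which equals character positions only for ASCII input. It assumes non-negative `start` and `stop` and a positive `step`.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side model of the [start, stop) rule for ONE string.
// Assumptions (not from the libcudf docs): ASCII input, start >= 0,
// stop >= 0, step > 0.
std::string slice_one(std::string const& s, int start, int stop, int step = 1)
{
  assert(start >= 0 && stop >= 0 && step > 0);
  int const len = static_cast<int>(s.size());
  if (start >= len) return {};          // start outside the string: empty result
  int const end = std::min(stop, len);  // stop past the end: clamp to the end
  std::string out;
  for (int i = start; i < end; i += step) out.push_back(s[i]);
  return out;
}

// Column-level model: std::nullopt plays the role of a null row,
// and null rows stay null in the output.
std::vector<std::optional<std::string>> slice_column(
  std::vector<std::optional<std::string>> const& input, int start, int stop, int step = 1)
{
  std::vector<std::optional<std::string>> out;
  out.reserve(input.size());
  for (auto const& row : input) {
    if (row.has_value()) {
      out.push_back(slice_one(*row, start, stop, step));
    } else {
      out.push_back(std::nullopt);
    }
  }
  return out;
}
```

Under these assumptions, `slice_one("goodbye", 2, 5, 2)` visits positions 2 and 4 and yields "ob", matching the example for `r2`.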
-
std::unique_ptr<column> slice_strings(strings_column_view const &input, column_view const &starts, column_view const &stops, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns a new strings column that contains substrings of the strings in the provided column using unique ranges for each string.
The character positions to retrieve in each string are specified in the starts and stops integer columns. If a start position is outside a string’s length, an empty string is returned for that entry. If a stop position is past the end of a string’s length, the end of the string is used as the stop position for that string. A stop position of -1 means the end of the string is used as the stop position for that string.
Null string entries will return null output string entries.
The starts and stops columns must both be the same integer type and must be the same size as the strings column.
Example:
s = ["hello", "goodbye"]
starts = [ 1, 2 ]
stops = [ 5, 4 ]
r = slice_strings(s,starts,stops)
r is now ["ello","od"]
- Throws:
cudf::logic_error – if starts or stops is a different size than the strings column.
cudf::logic_error – if starts and stops are not same integer type.
cudf::logic_error – if starts or stops contains nulls.
- Parameters:
input – Strings column for this operation
starts – First character positions to begin the substring
stops – Last character positions (exclusive) to end the substring
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New strings column containing the extracted substrings
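The per-row variant can be modeled the same way on the host. Again this is only a sketch of the documented behavior, not libcudf code: `slice_rows` and `row_t` are hypothetical names, `std::optional` stands in for a nullable row, `std::logic_error` stands in for `cudf::logic_error`, and positions are byte offsets (ASCII input assumed). Returning an empty string when `start >= stop` and rejecting negative starts are choices made for this sketch, since the text above does not describe those cases.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

using row_t = std::optional<std::string>;  // std::nullopt models a null row

// Hypothetical host-side model of slicing with one [start, stop) range per row.
std::vector<row_t> slice_rows(std::vector<row_t> const& input,
                              std::vector<int> const& starts,
                              std::vector<int> const& stops)
{
  // Mirrors the documented size check on starts and stops.
  if (starts.size() != input.size() || stops.size() != input.size()) {
    throw std::logic_error("starts and stops must match the strings column size");
  }
  std::vector<row_t> out;
  out.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (!input[i].has_value()) {  // null in, null out
      out.push_back(std::nullopt);
      continue;
    }
    std::string const& s = *input[i];
    int const len = static_cast<int>(s.size());
    int const start = starts[i];
    // Sketch-only choice: negative starts are not modeled here.
    if (start < 0) { throw std::invalid_argument("negative start not modeled"); }
    // -1 means "use the end of the string"; larger stops are clamped to the end.
    int const stop = (stops[i] == -1) ? len : std::min(stops[i], len);
    if (start >= len || start >= stop) {
      out.push_back(std::string{});  // start outside the string, or empty range
    } else {
      out.push_back(s.substr(static_cast<std::size_t>(start),
                             static_cast<std::size_t>(stop - start)));
    }
  }
  return out;
}
```

With `starts = [1, 2]` and `stops = [5, 4]`, the model reproduces the example result `["ello","od"]`; replacing a stop with -1 slices through the end of that row's string.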