Strings Find
- group strings_find
Functions
-
std::unique_ptr<column> find(strings_column_view const &input, string_scalar const &target, size_type start = 0, size_type stop = -1, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns a column of character position values where the target string is first found in each string of the provided column.
If target is not found, -1 is returned for that row entry in the output column.
The target string is searched within each string in the character position range [start, stop). If the stop parameter is -1, then the end of each string becomes the final position to include in the search.
Any null string entries return corresponding null output column entries.
- Throws:
cudf::logic_error – if start position is greater than stop position.
- Parameters:
input – Strings instance for this operation
target – UTF-8 encoded string to search for in each string
start – First character position to include in the search
stop – Last position (exclusive) to include in the search. Default of -1 will search to the end of the string.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New integer column with character position values
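The range and null rules above can be illustrated with a small host-side model. This is only a sketch of the documented behavior, not libcudf code: find_model, opt_str and opt_pos are hypothetical names, std::optional stands in for null rows, the input is assumed to be ASCII (so byte offsets equal character positions), start is assumed to be non-negative, and the whole match is assumed to lie inside [start, stop).

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Host-side model only; names are hypothetical and this is not libcudf code.
// Assumes ASCII strings (byte offset == character position) and start >= 0.
using opt_str = std::optional<std::string>;
using opt_pos = std::optional<int>;

std::vector<opt_pos> find_model(std::vector<opt_str> const& input,
                                std::string const& target,
                                int start = 0,
                                int stop  = -1)
{
  if (stop >= 0 && start > stop) { throw std::logic_error("start is greater than stop"); }

  std::vector<opt_pos> out;
  out.reserve(input.size());
  for (auto const& s : input) {
    if (!s.has_value()) {  // null row in, null row out
      out.push_back(std::nullopt);
      continue;
    }
    int const len = static_cast<int>(s->size());
    int const end = (stop < 0 || stop > len) ? len : stop;  // -1 means end of string
    int pos       = -1;                                     // -1 means not found
    if (start <= end) {
      // Search only the window [start, end); the whole match must fit inside it.
      auto const hit = s->substr(start, end - start).find(target);
      if (hit != std::string::npos) { pos = start + static_cast<int>(hit); }
    }
    out.push_back(pos);
  }
  return out;
}
```

With input ["hello", null, "world"] and target "l", this model yields [2, null, 3]. Restricting the search to [0, 4) with target "o" yields [-1, null, 1], because the "o" in "hello" sits at position 4, outside the window.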
-
std::unique_ptr<column> rfind(strings_column_view const &input, string_scalar const &target, size_type start = 0, size_type stop = -1, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns a column of character position values where the target string is last found in each string, that is, the first match found when searching from the end of the string.
If target is not found, -1 is returned for that entry.
The target string is searched within each string in the character position range [start, stop). If the stop parameter is -1, then the end of each string becomes the final position to include in the search.
Any null string entries return corresponding null output column entries.
- Throws:
cudf::logic_error – if start position is greater than stop position.
- Parameters:
input – Strings instance for this operation
target – UTF-8 encoded string to search for in each string
start – First character position to include in the search
stop – Last position (exclusive) to include in the search. Default of -1 will search starting at the end of the string.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New integer column with character position values
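To contrast rfind with find, the following single-string sketch searches the same [start, stop) window from its right edge. It is a host-side model under stated assumptions, not libcudf code: rfind_one is a hypothetical name, null handling is omitted, the text is assumed to be ASCII, start is assumed to be non-negative, and the match is assumed to lie entirely inside the window.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical single-string model of the documented rfind() semantics; not libcudf code.
// Assumes ASCII text (byte offset == character position) and start >= 0.
int rfind_one(std::string const& s, std::string const& target, int start = 0, int stop = -1)
{
  if (stop >= 0 && start > stop) { throw std::logic_error("start is greater than stop"); }
  int const len = static_cast<int>(s.size());
  int const end = (stop < 0 || stop > len) ? len : stop;  // -1 means end of string
  if (start > end) { return -1; }                         // empty window
  // Search the window [start, end) from its right edge; the match must fit inside it.
  auto const hit = s.substr(start, end - start).rfind(target);
  return hit == std::string::npos ? -1 : start + static_cast<int>(hit);
}
```

For "abcabc" and target "abc", this model returns 3, whereas a forward search would return 0. Limiting the window to [0, 5) cuts off the second occurrence, so the result becomes 0.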
-
std::unique_ptr<column> find(strings_column_view const &input, strings_column_view const &target, size_type start = 0, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns a column of character position values where the target string is first found in the corresponding string of the provided column.
The output of row i is the character position of the target string for row i within the input string of row i, starting at the character position start. If the target is not found within the input string, -1 is returned for that row entry in the output column.
Any null input or target entries return corresponding null output column entries.
- Throws:
cudf::logic_error – if input.size() != target.size()
- Parameters:
input – Strings to search against
target – Strings to search for in input
start – First character position to include in the search
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New integer column with character position values
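The row-wise pairing described for this overload can be sketched on the host as follows. This is a model of the documented rules only, not libcudf code: find_rowwise_model, opt_str and opt_pos are hypothetical names, std::optional stands in for null rows, and ASCII input with a non-negative start is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Host-side model only; names are hypothetical and this is not libcudf code.
// Assumes ASCII strings (byte offset == character position) and start >= 0.
using opt_str = std::optional<std::string>;
using opt_pos = std::optional<int>;

std::vector<opt_pos> find_rowwise_model(std::vector<opt_str> const& input,
                                        std::vector<opt_str> const& target,
                                        int start = 0)
{
  if (input.size() != target.size()) {
    throw std::logic_error("input.size() != target.size()");
  }
  std::vector<opt_pos> out;
  out.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (!input[i].has_value() || !target[i].has_value()) {
      out.push_back(std::nullopt);  // a null on either side gives a null result
      continue;
    }
    // Row i of target is searched for only in row i of input, from position start.
    auto const hit = input[i]->find(*target[i], static_cast<std::size_t>(start));
    out.push_back(hit == std::string::npos ? -1 : static_cast<int>(hit));
  }
  return out;
}
```

With input ["abcabc", "xyz", null, "hello"] and target ["bc", "a", "x", null], the model yields [1, -1, null, null]. Passing start = 2 changes the first row to 4, since the earlier "bc" at position 1 is skipped.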
-
std::unique_ptr<column> contains(strings_column_view const &input, string_scalar const &target, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns a column of boolean values for each string where true indicates the target string was found within that string in the provided column.
If the target is not found for a string, false is returned for that entry in the output column. If target is an empty string, true is returned for all non-null entries in the output column.
Any null string entries return corresponding null entries in the output column.
- Parameters:
input – Strings instance for this operation
target – UTF-8 encoded string to search for in each string
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New BOOL8 column
-
std::unique_ptr<column> contains(strings_column_view const &input, strings_column_view const &targets, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns a column of boolean values for each string where true indicates the corresponding target string was found within that string in the provided column.
output[i] is true if the string targets[i] is found inside input[i]; otherwise output[i] is false. If targets[i] is an empty string, true is returned for output[i]. If targets[i] is null, false is returned for output[i].
Any null entries in input return corresponding null entries in the output column.
- Throws:
cudf::logic_error – if input.size() != targets.size().
- Parameters:
input – Strings instance for this operation
targets – Strings column of targets to check row-wise in input
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New BOOL8 column
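The three special cases of the row-wise contains (empty target, null target, null input) are easy to mix up, so a host-side model may help. This is a sketch of the rules as described above, not libcudf code: contains_rowwise_model, opt_str and opt_bool are hypothetical names, and checking the input for null before the target is an assumption for the case where both are null.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Host-side model only; names are hypothetical and this is not libcudf code.
using opt_str  = std::optional<std::string>;
using opt_bool = std::optional<bool>;

std::vector<opt_bool> contains_rowwise_model(std::vector<opt_str> const& input,
                                             std::vector<opt_str> const& targets)
{
  if (input.size() != targets.size()) {
    throw std::logic_error("input.size() != targets.size()");
  }
  std::vector<opt_bool> out;
  out.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (!input[i].has_value()) {
      out.push_back(std::nullopt);  // null input row gives a null result (checked first: assumption)
    } else if (!targets[i].has_value()) {
      out.push_back(false);         // null target gives false, per the description above
    } else {
      // An empty target is found at position 0 of any string, so this is true.
      out.push_back(input[i]->find(*targets[i]) != std::string::npos);
    }
  }
  return out;
}
```

With input ["hello", "hello", "hello", null, "hello"] and targets ["ell", "", null, "h", "xyz"], the model yields [true, true, false, null, false].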
-
std::unique_ptr<column> starts_with(strings_column_view const &input, string_scalar const &target, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns a column of boolean values for each string where true indicates the target string was found at the beginning of that string in the provided column.
If target is not found at the beginning of a string, false is set for that row entry in the output column. If target is an empty string, true is returned for all non-null entries in the output column.
Any null string entries return corresponding null entries in the output column.
- Parameters:
input – Strings instance for this operation
target – UTF-8 encoded string to search for in each string
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New BOOL8 column
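A prefix test with the null and empty-target rules described above might be modeled on the host like this. It is a sketch only, not libcudf code: starts_with_model, opt_str and opt_bool are hypothetical names, and the comparison is done on bytes, which is sufficient for prefix matching of valid UTF-8.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Host-side model only; names are hypothetical and this is not libcudf code.
using opt_str  = std::optional<std::string>;
using opt_bool = std::optional<bool>;

std::vector<opt_bool> starts_with_model(std::vector<opt_str> const& input,
                                        std::string const& target)
{
  std::vector<opt_bool> out;
  out.reserve(input.size());
  for (auto const& s : input) {
    if (!s.has_value()) {
      out.push_back(std::nullopt);  // null row in, null row out
    } else {
      // Compare the first target.size() bytes of the row with target.
      // A row shorter than target compares unequal; an empty target always matches.
      out.push_back(s->compare(0, target.size(), target) == 0);
    }
  }
  return out;
}
```

With input ["apple", "pineapple", "", null] and target "app", the model yields [true, false, false, null]. With an empty target it yields [true, true, true, null].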
-
std::unique_ptr<column> starts_with(strings_column_view const &input, strings_column_view const &targets, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns a column of boolean values for each string where true indicates the corresponding string in the targets column was found at the beginning of that string in the provided column.
If targets[i] is not found at the beginning of the string in input[i], false is set for that row entry in the output column. If targets[i] is an empty string, true is returned for the corresponding entry in the output column.
Any null string entries in targets return corresponding null entries in the output column.
- Throws:
cudf::logic_error – if input.size() != targets.size().
- Parameters:
input – Strings instance for this operation
targets – Strings column of targets to check row-wise at the beginning of input
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New BOOL8 column
-
std::unique_ptr<column> ends_with(strings_column_view const &input, string_scalar const &target, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns a column of boolean values for each string where true indicates the target string was found at the end of that string in the provided column.
If target is not found at the end of a string, false is set for that row entry in the output column. If target is an empty string, true is returned for all non-null entries in the output column.
Any null string entries return corresponding null entries in the output column.
- Parameters:
input – Strings instance for this operation
target – UTF-8 encoded string to search for in each string
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New BOOL8 column
-
std::unique_ptr<column> ends_with(strings_column_view const &input, strings_column_view const &targets, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns a column of boolean values for each string where true indicates the corresponding string in the targets column was found at the end of that string in the provided column.
If targets[i] is not found at the end of the string in input[i], false is set for that row entry in the output column. If targets[i] is an empty string, true is returned for the corresponding entry in the output column.
Any null string entries in targets return corresponding null entries in the output column.
- Throws:
cudf::logic_error – if input.size() != targets.size().
- Parameters:
input – Strings instance for this operation
targets – Strings column of targets to check row-wise at the end of input
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New BOOL8 column
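The row-wise suffix test can be modeled on the host as below. This is a sketch of the documented rules, not libcudf code: ends_with_one, ends_with_rowwise_model, opt_str and opt_bool are hypothetical names. The description above only states that null targets produce null output; treating null input rows the same way is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Host-side model only; names are hypothetical and this is not libcudf code.
using opt_str  = std::optional<std::string>;
using opt_bool = std::optional<bool>;

// True when t matches the last t.size() bytes of s; an empty t always matches.
bool ends_with_one(std::string const& s, std::string const& t)
{
  return s.size() >= t.size() && s.compare(s.size() - t.size(), t.size(), t) == 0;
}

std::vector<opt_bool> ends_with_rowwise_model(std::vector<opt_str> const& input,
                                              std::vector<opt_str> const& targets)
{
  if (input.size() != targets.size()) {
    throw std::logic_error("input.size() != targets.size()");
  }
  std::vector<opt_bool> out;
  out.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (!targets[i].has_value() || !input[i].has_value()) {
      out.push_back(std::nullopt);  // null target gives null; null input too (assumption)
    } else {
      out.push_back(ends_with_one(*input[i], *targets[i]));
    }
  }
  return out;
}
```

With input ["report.csv", "report.csv", "data", "x", null] and targets [".csv", ".txt", "", null, "x"], the model yields [true, false, true, null, null].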
-
std::unique_ptr<column> find_multiple(strings_column_view const &input, strings_column_view const &targets, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns a lists column with character position values where each of the target strings is found in each string.
The size of the output column is input.size(). Each row of the output column is of size targets.size(). output[i,j] contains the position of targets[j] in input[i], or -1 if targets[j] is not found.
Example:
s = ["abc", "def"]
t = ["a", "c", "e"]
r = find_multiple(s, t)
r is now {[ 0, 2,-1],   // for "abc": "a" at pos 0, "c" at pos 2, "e" not found
          [-1,-1, 1]}   // for "def": "a" and "c" not found, "e" at pos 1
- Throws:
cudf::logic_error – if targets is empty or contains nulls
- Parameters:
input – Strings instance for this operation
targets – Strings to search for in each string
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
Lists column with character position values
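The shape of the find_multiple result (one list per input row, one position per target) can be reproduced with a host-side model. This is a sketch only, not libcudf code: find_multiple_model is a hypothetical name, ASCII input is assumed so that byte offsets equal character positions, and nulls are not modeled because plain std::string cannot represent them.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Host-side model only; names are hypothetical and this is not libcudf code.
// Assumes ASCII strings (byte offset == character position); nulls are not modeled.
std::vector<std::vector<int>> find_multiple_model(std::vector<std::string> const& input,
                                                  std::vector<std::string> const& targets)
{
  if (targets.empty()) { throw std::logic_error("targets is empty"); }

  std::vector<std::vector<int>> out;
  out.reserve(input.size());
  for (auto const& s : input) {
    std::vector<int> row;  // row i holds one position per target
    row.reserve(targets.size());
    for (auto const& t : targets) {
      auto const hit = s.find(t);
      row.push_back(hit == std::string::npos ? -1 : static_cast<int>(hit));
    }
    out.push_back(row);
  }
  return out;
}
```

Calling the model with s = ["abc", "def"] and t = ["a", "c", "e"] gives {[0, 2, -1], [-1, -1, 1]}, matching the example in the description of find_multiple.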