Strings Find

group strings_find

Functions

std::unique_ptr<column> find(strings_column_view const &input, string_scalar const &target, size_type start = 0, size_type stop = -1, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Returns a column of character position values where the target string is first found in each string of the provided column.

If target is not found, -1 is returned for that row entry in the output column.

The target string is searched within each string in the character position range [start,stop). If the stop parameter is -1, then the end of each string becomes the final position to include in the search.

Any null string entries return corresponding null output column entries.

Throws:

cudf::logic_error – if start position is greater than stop position.

Parameters:
  • input – Strings instance for this operation

  • target – UTF-8 encoded string to search for in each string

  • start – First character position to include in the search

  • stop – Last position (exclusive) to include in the search. Default of -1 will search to the end of the string.

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

New integer column with character position values
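
The range and null rules above can be sketched on the host in plain C++. This is not libcudf code: `find_model`, `opt_string`, and `opt_pos` are hypothetical names, `std::nullopt` stands in for a null row, and the strings are assumed to be ASCII so that byte offsets equal character positions. The sketch also assumes that `start` is non-negative and that a match must fit entirely inside [start,stop).

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical host-side model; std::nullopt stands in for a null row.
using opt_string = std::optional<std::string>;
using opt_pos    = std::optional<int>;

std::vector<opt_pos> find_model(std::vector<opt_string> const& input,
                                std::string const& target,
                                int start = 0,
                                int stop  = -1)
{
  // Mirrors the documented error: start must not exceed an explicit stop.
  if (stop >= 0 && start > stop) { throw std::logic_error("start is greater than stop"); }

  std::vector<opt_pos> output;
  output.reserve(input.size());
  for (auto const& s : input) {
    if (!s) {  // null in, null out
      output.push_back(std::nullopt);
      continue;
    }
    int const length = static_cast<int>(s->size());
    int const end    = (stop < 0 || stop > length) ? length : stop;
    if (start > end) {  // window lies past the end of this string
      output.push_back(-1);
      continue;
    }
    // Search only inside [start, end); a match must fit inside the window.
    auto const window = s->substr(start, end - start);
    auto const found  = window.find(target);
    output.push_back(found == std::string::npos ? -1 : start + static_cast<int>(found));
  }
  return output;
}

// Small self-check showing the expected rows.
void find_model_example()
{
  std::vector<opt_string> const input{"hello world", "world", std::nullopt, "low"};
  auto const r = find_model(input, "wo");
  assert(r[0] == 6 && r[1] == 0 && !r[2].has_value() && r[3] == -1);
}
```

Positions are reported relative to the start of the whole string, not relative to `start`.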

std::unique_ptr<column> rfind(strings_column_view const &input, string_scalar const &target, size_type start = 0, size_type stop = -1, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Returns a column of character position values where the target string is first found searching from the end of each string.

If target is not found, -1 is returned for that entry.

The target string is searched within each string in the character position range [start,stop). If the stop parameter is -1, then the end of each string becomes the final position to include in the search.

Any null string entries return corresponding null output column entries.

Throws:

cudf::logic_error – if start position is greater than stop position.

Parameters:
  • input – Strings instance for this operation

  • target – UTF-8 encoded string to search for in each string

  • start – First character position to include in the search

  • stop – Last position (exclusive) to include in the search. Default of -1 will search starting at the end of the string.

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

New integer column with character position values
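
As a rough illustration of how the reverse search interacts with the [start,stop) window, the following single-string sketch uses only the C++ standard library. `rfind_model` is a hypothetical name, not a libcudf function, and the text is assumed to be ASCII so that byte offsets match character positions.

```cpp
#include <cassert>
#include <string>

// Hypothetical single-string model of rfind for ASCII text.
// Assumes 0 <= start, and start <= stop whenever stop is not -1.
int rfind_model(std::string const& s, std::string const& target, int start = 0, int stop = -1)
{
  int const length = static_cast<int>(s.size());
  int const end    = (stop < 0 || stop > length) ? length : stop;
  if (start > end) { return -1; }  // window lies past the end of the string

  // Restrict the search to [start, end) and take the last match in that window.
  auto const window = s.substr(start, end - start);
  auto const found  = window.rfind(target);
  return found == std::string::npos ? -1 : start + static_cast<int>(found);
}

void rfind_model_example()
{
  assert(rfind_model("abcabc", "bc") == 4);        // last occurrence in the whole string
  assert(rfind_model("abcabc", "bc", 0, 4) == 1);  // window "abca" hides the later match
  assert(rfind_model("abcabc", "zz") == -1);       // not found
}
```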

std::unique_ptr<column> find(strings_column_view const &input, strings_column_view const &target, size_type start = 0, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Returns a column of character position values where the target string is first found in the corresponding string of the provided column.

The output of row i is the character position of the target string for row i within the input string of row i, searching from character position start. If the target is not found within the input string, -1 is returned for that row entry in the output column.

Any null input or target entries return corresponding null output column entries.

Throws:

cudf::logic_error – if input.size() != target.size()

Parameters:
  • input – Strings to search against

  • target – Strings to search for in input

  • start – First character position to include in the search

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

New integer column with character position values
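
The row-wise pairing can be modeled on the host as follows. This is a sketch under stated assumptions rather than the library's implementation: `find_rowwise_model` and the type aliases are hypothetical, `std::nullopt` models a null row, `start` is assumed non-negative, and the strings are assumed to be ASCII.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

using opt_string = std::optional<std::string>;  // std::nullopt models a null row
using opt_pos    = std::optional<int>;

// Hypothetical model: row i of target is searched for in row i of input.
std::vector<opt_pos> find_rowwise_model(std::vector<opt_string> const& input,
                                        std::vector<opt_string> const& target,
                                        int start = 0)
{
  if (input.size() != target.size()) { throw std::logic_error("input.size() != target.size()"); }

  std::vector<opt_pos> output;
  output.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (!input[i] || !target[i]) {  // a null on either side gives a null result
      output.push_back(std::nullopt);
      continue;
    }
    // std::string::find returns npos when start is past the end of the string.
    auto const found = input[i]->find(*target[i], static_cast<std::size_t>(start));
    output.push_back(found == std::string::npos ? -1 : static_cast<int>(found));
  }
  return output;
}

void find_rowwise_example()
{
  std::vector<opt_string> const input{"abcabc", "hello", std::nullopt, "xyz"};
  std::vector<opt_string> const target{"c", "l", "a", std::nullopt};
  auto const r = find_rowwise_model(input, target, 3);
  assert(r[0] == 5 && r[1] == 3 && !r[2].has_value() && !r[3].has_value());
}
```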

std::unique_ptr<column> contains(strings_column_view const &input, string_scalar const &target, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Returns a column of boolean values for each string where true indicates the target string was found within that string in the provided column.

If the target is not found for a string, false is returned for that entry in the output column. If target is an empty string, true is returned for all non-null entries in the output column.

Any null string entries return corresponding null entries in the output column.

Parameters:
  • input – Strings instance for this operation

  • target – UTF-8 encoded string to search for in each string

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

New BOOL8 column

std::unique_ptr<column> contains(strings_column_view const &input, strings_column_view const &targets, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Returns a column of boolean values for each string where true indicates the corresponding target string was found within that string in the provided column.

output[i] = true if the string targets[i] is found inside input[i]; otherwise output[i] = false. If targets[i] is an empty string, true is returned for output[i]. If targets[i] is null, false is returned for output[i].

Any null entries in input return corresponding null entries in the output column.

Throws:

cudf::logic_error – if input.size() != targets.size().

Parameters:
  • input – Strings instance for this operation

  • targets – Strings column of targets to check row-wise in input

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

New BOOL8 column
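
The interaction between null inputs, null targets, and empty targets is easy to misread, so here is a small host-side sketch of one reading of the text above. It is not libcudf code: `contains_rowwise_model` and the aliases are hypothetical, `std::nullopt` models a null row, and the sketch assumes a null input row takes precedence over a null target row.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

using opt_string = std::optional<std::string>;  // std::nullopt models a null row
using opt_bool   = std::optional<bool>;

// Hypothetical model of the row-wise contains rules described above.
std::vector<opt_bool> contains_rowwise_model(std::vector<opt_string> const& input,
                                             std::vector<opt_string> const& targets)
{
  if (input.size() != targets.size()) { throw std::logic_error("input.size() != targets.size()"); }

  std::vector<opt_bool> output;
  output.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (!input[i]) {
      output.push_back(std::nullopt);  // null input row gives a null result
    } else if (!targets[i]) {
      output.push_back(false);         // null target gives false
    } else {
      // An empty target is found at position 0, so it always yields true.
      output.push_back(input[i]->find(*targets[i]) != std::string::npos);
    }
  }
  return output;
}

void contains_rowwise_example()
{
  std::vector<opt_string> const input{"apple", "banana", std::nullopt, "cherry", "date"};
  std::vector<opt_string> const targets{"pp", "x", "a", "", std::nullopt};
  auto const r = contains_rowwise_model(input, targets);
  assert(r[0] == true && r[1] == false && !r[2].has_value());
  assert(r[3] == true && r[4] == false);
}
```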

std::unique_ptr<column> starts_with(strings_column_view const &input, string_scalar const &target, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Returns a column of boolean values for each string where true indicates the target string was found at the beginning of that string in the provided column.

If target is not found at the beginning of a string, false is set for that row entry in the output column. If target is an empty string, true is returned for all non-null entries in the output column.

Any null string entries return corresponding null entries in the output column.

Parameters:
  • input – Strings instance for this operation

  • target – UTF-8 encoded string to search for in each string

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

New type_id::BOOL8 column.
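
A minimal host-side sketch of the prefix check with a single target, assuming ASCII or byte-wise comparison. `starts_with_model` and the aliases are hypothetical names used only for illustration, and `std::nullopt` models a null row.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

using opt_string = std::optional<std::string>;  // std::nullopt models a null row
using opt_bool   = std::optional<bool>;

// Hypothetical model: true where a row begins with target, null where the row is null.
std::vector<opt_bool> starts_with_model(std::vector<opt_string> const& input,
                                        std::string const& target)
{
  std::vector<opt_bool> output;
  output.reserve(input.size());
  for (auto const& s : input) {
    if (!s) {
      output.push_back(std::nullopt);
    } else {
      // compare() clamps the length, so a row shorter than target compares unequal.
      output.push_back(s->compare(0, target.size(), target) == 0);
    }
  }
  return output;
}

void starts_with_example()
{
  std::vector<opt_string> const input{"prefix_a", "a_prefix", "", std::nullopt};
  auto const r = starts_with_model(input, "prefix");
  assert(r[0] == true && r[1] == false && r[2] == false && !r[3].has_value());
}
```

With an empty target, every non-null row compares equal over zero characters, which matches the rule that an empty target returns true for all non-null entries.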

std::unique_ptr<column> starts_with(strings_column_view const &input, strings_column_view const &targets, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Returns a column of boolean values for each string where true indicates the corresponding string in the targets column was found at the beginning of that string in the provided column.

If targets[i] is not found at the beginning of input[i], false is set for that row entry in the output column. If targets[i] is an empty string, true is returned for the corresponding entry in the output column.

Any null string entries in targets return corresponding null entries in the output column.

Throws:

cudf::logic_error – if input.size() != targets.size().

Parameters:
  • input – Strings instance for this operation

  • targets – Strings to search for at the beginning of the corresponding string in input

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

New BOOL8 column

std::unique_ptr<column> ends_with(strings_column_view const &input, string_scalar const &target, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Returns a column of boolean values for each string where true indicates the target string was found at the end of that string in the provided column.

If target is not found at the end of a string, false is set for that row entry in the output column. If target is an empty string, true is returned for all non-null entries in the output column.

Any null string entries return corresponding null entries in the output column.

Parameters:
  • input – Strings instance for this operation

  • target – UTF-8 encoded string to search for in each string

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

New BOOL8 column

std::unique_ptr<column> ends_with(strings_column_view const &input, strings_column_view const &targets, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Returns a column of boolean values for each string where true indicates the corresponding string in the targets column was found at the end of that string in the provided column.

If targets[i] is not found at the end of input[i], false is set for that row entry in the output column. If targets[i] is an empty string, true is returned for the corresponding entry in the output column.

Any null string entries in targets return corresponding null entries in the output column.

Throws:

cudf::logic_error – if input.size() != targets.size().

Parameters:
  • input – Strings instance for this operation

  • targets – Strings to search for at the end of the corresponding string in input

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

New BOOL8 column
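
The row-wise suffix check can be sketched on the host as below. This is an illustration only: `ends_with_rowwise_model` and the aliases are hypothetical, `std::nullopt` models a null row, comparison is byte-wise, and the sketch assumes a null input row produces a null result just as a null targets row does.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

using opt_string = std::optional<std::string>;  // std::nullopt models a null row
using opt_bool   = std::optional<bool>;

// Hypothetical model: output[i] is true when input[i] ends with targets[i].
std::vector<opt_bool> ends_with_rowwise_model(std::vector<opt_string> const& input,
                                              std::vector<opt_string> const& targets)
{
  if (input.size() != targets.size()) { throw std::logic_error("input.size() != targets.size()"); }

  std::vector<opt_bool> output;
  output.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (!input[i] || !targets[i]) {  // assumption: a null on either side gives null
      output.push_back(std::nullopt);
      continue;
    }
    auto const& s = *input[i];
    auto const& t = *targets[i];
    // Compare the last t.size() bytes of s with t; an empty t always matches.
    output.push_back(s.size() >= t.size() && s.compare(s.size() - t.size(), t.size(), t) == 0);
  }
  return output;
}

void ends_with_rowwise_example()
{
  std::vector<opt_string> const input{"report.csv", "report.txt", "csv", std::nullopt, "data"};
  std::vector<opt_string> const targets{".csv", ".csv", "", ".csv", std::nullopt};
  auto const r = ends_with_rowwise_model(input, targets);
  assert(r[0] == true && r[1] == false && r[2] == true);
  assert(!r[3].has_value() && !r[4].has_value());
}
```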

std::unique_ptr<table> contains_multiple(strings_column_view const &input, strings_column_view const &targets, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())

Searches for the given target strings within each string in the provided column.

Each column in the result table corresponds to the result for the target string at the same ordinal. That is, the 0th column is the BOOL8 column result for the 0th target string, the 1st column is the result for the 1st target string, and so on.

If the target is not found for a string, false is returned for that entry in the output column. If the target is an empty string, true is returned for all non-null entries in the output column.

Any null input strings return corresponding null entries in the output columns.

input = ["a", "b", "c"]
targets = ["a", "c"]
output is a table with two boolean columns:
  column 0: [true, false, false]
  column 1: [false, false, true]

Throws:

std::invalid_argument – if targets is empty or contains nulls

Parameters:
  • input – Strings instance for this operation

  • targets – UTF-8 encoded strings to search for in each string in input

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table’s device memory

Returns:

Table of BOOL8 columns
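
The column-per-target layout can be sketched on the host with nested vectors. This is a model of the documented behavior, not libcudf code: `contains_multiple_model`, `bool_column`, and `opt_string` are hypothetical names, and `std::nullopt` models a null row.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using opt_string  = std::optional<std::string>;  // std::nullopt models a null row
using bool_column = std::vector<std::optional<bool>>;

// Hypothetical model: result[j][i] tells whether targets[j] occurs in input[i].
std::vector<bool_column> contains_multiple_model(std::vector<opt_string> const& input,
                                                 std::vector<opt_string> const& targets)
{
  if (targets.empty()) { throw std::invalid_argument("targets is empty"); }
  for (auto const& t : targets) {
    if (!t) { throw std::invalid_argument("targets contains nulls"); }
  }

  std::vector<bool_column> table;
  table.reserve(targets.size());
  for (auto const& t : targets) {  // one output column per target
    bool_column column;
    column.reserve(input.size());
    for (auto const& s : input) {
      if (!s) {
        column.push_back(std::nullopt);  // null input row stays null in every column
      } else {
        column.push_back(s->find(*t) != std::string::npos);
      }
    }
    table.push_back(std::move(column));
  }
  return table;
}

void contains_multiple_example()
{
  // Same data as the example above.
  std::vector<opt_string> const input{"a", "b", "c"};
  std::vector<opt_string> const targets{"a", "c"};
  auto const table = contains_multiple_model(input, targets);
  assert((table[0] == bool_column{true, false, false}));
  assert((table[1] == bool_column{false, false, true}));
}
```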

std::unique_ptr<column> find_multiple(strings_column_view const &input, strings_column_view const &targets, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Searches for the given target strings within each string in the provided column and returns the positions where the targets were found.

The size of the output column is input.size(). Each row of the output column is of size targets.size().

output[i,j] contains the position of targets[j] in input[i]

Example:
s = ["abc", "def"]
t = ["a", "c", "e"]
r = find_multiple(s, t)
r is now {[ 0, 2,-1],   // for "abc": "a" at pos 0, "c" at pos 2, "e" not found
          [-1,-1, 1 ]}  // for "def": "a" and "c" not found, "e" at pos 1

Throws:

std::invalid_argument – if targets is empty or contains nulls

Parameters:
  • input – Strings instance for this operation

  • targets – Strings to search for in each string

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

Lists column with character position values
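
The output[i,j] layout can be reproduced on the host with a vector of rows. This is a sketch, not the library's implementation: `find_multiple_model` is a hypothetical name, plain `std::string` values are used so null rows are not modeled, and the strings are assumed to be ASCII so that byte offsets equal character positions.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: output[i][j] is the position of targets[j] in input[i], or -1.
std::vector<std::vector<int>> find_multiple_model(std::vector<std::string> const& input,
                                                  std::vector<std::string> const& targets)
{
  if (targets.empty()) { throw std::invalid_argument("targets is empty"); }

  std::vector<std::vector<int>> output;
  output.reserve(input.size());
  for (auto const& s : input) {  // one list row per input string
    std::vector<int> row;
    row.reserve(targets.size());
    for (auto const& t : targets) {  // one entry per target
      auto const found = s.find(t);
      row.push_back(found == std::string::npos ? -1 : static_cast<int>(found));
    }
    output.push_back(std::move(row));
  }
  return output;
}

void find_multiple_example()
{
  // Same data as the example above.
  auto const r = find_multiple_model({"abc", "def"}, {"a", "c", "e"});
  assert((r == std::vector<std::vector<int>>{{0, 2, -1}, {-1, -1, 1}}));
}
```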