Files
file | strings/contains.hpp | Strings APIs for regex contains, count, matches, like. |
file | findall.hpp | |
std::unique_ptr<column> cudf::strings::contains_re(
    strings_column_view const& input,
    regex_program const& prog,
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Returns a boolean column identifying rows which match the given regex_program object.
Any null string entries return corresponding null output column entries.
See the Regex Features page for details on patterns supported by this API.
input | Strings instance for this operation |
prog | Regex program instance |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
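The semantics above (a match anywhere in the row, null in gives null out) can be illustrated with a small host-side analogy. This is only a sketch, not libcudf code: the name `contains_re_host` is hypothetical, `std::optional` stands in for a nullable row, and `std::regex` (ECMAScript grammar) stands in for `regex_program`, whose supported syntax may differ.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical host-side analogy of contains_re (not part of libcudf).
// std::optional models a nullable row; std::regex stands in for regex_program.
std::vector<std::optional<bool>> contains_re_host(
  std::vector<std::optional<std::string>> const& input, std::string const& pattern)
{
  std::regex const re(pattern);
  std::vector<std::optional<bool>> output;
  output.reserve(input.size());
  for (auto const& row : input) {
    if (!row.has_value()) {
      output.push_back(std::nullopt);  // null input row gives a null output row
    } else {
      // regex_search succeeds if the pattern matches anywhere in the row
      output.push_back(std::regex_search(*row, re));
    }
  }
  return output;
}
```

Under these assumptions, the rows `"abc123"`, null, and `"xyz"` with the pattern `[0-9]+` give true, null, and false.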
std::unique_ptr<column> cudf::strings::count_re(
    strings_column_view const& input,
    regex_program const& prog,
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Returns the number of times the given regex_program's pattern matches in each string.
Any null string entries return corresponding null output column entries.
See the Regex Features page for details on patterns supported by this API.
input | Strings instance for this operation |
prog | Regex program instance |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
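A host-side analogy may help make the per-row counting concrete. The sketch below is not libcudf code: `count_re_host` is a hypothetical name, `std::regex` stands in for `regex_program`, and `int` stands in for the output column's element type. It counts successive non-overlapping matches, which is how `std::sregex_iterator` walks a string.

```cpp
#include <iterator>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical host-side analogy of count_re (not part of libcudf).
std::vector<std::optional<int>> count_re_host(
  std::vector<std::optional<std::string>> const& input, std::string const& pattern)
{
  std::regex const re(pattern);
  std::vector<std::optional<int>> output;
  output.reserve(input.size());
  for (auto const& row : input) {
    if (!row.has_value()) {
      output.push_back(std::nullopt);  // null input row gives a null output row
      continue;
    }
    // Each iterator step advances to the next non-overlapping match
    auto const first = std::sregex_iterator(row->begin(), row->end(), re);
    auto const last  = std::sregex_iterator();
    output.push_back(static_cast<int>(std::distance(first, last)));
  }
  return output;
}
```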
std::unique_ptr<column> cudf::strings::find_re(
    strings_column_view const& input,
    regex_program const& prog,
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Returns the starting character index of the first match for the given pattern in each row of the input column.
A null output row occurs if the corresponding input row is null. A -1 is returned for rows that do not contain a match.
See the Regex Features page for details on patterns supported by this API.
input | Strings instance for this operation |
prog | Regex program instance |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
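The three possible outcomes per row (an index, -1, or null) can be sketched on the host as follows. This is an analogy under stated assumptions, not libcudf code: `find_re_host` is a hypothetical name, `std::regex` stands in for `regex_program`, and the position reported here is a byte offset, which equals the character index only for ASCII input.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical host-side analogy of find_re (not part of libcudf).
// Positions are byte offsets; they match character indices only for ASCII.
std::vector<std::optional<int>> find_re_host(
  std::vector<std::optional<std::string>> const& input, std::string const& pattern)
{
  std::regex const re(pattern);
  std::vector<std::optional<int>> output;
  output.reserve(input.size());
  for (auto const& row : input) {
    if (!row.has_value()) {
      output.push_back(std::nullopt);  // null input row gives a null output row
      continue;
    }
    std::smatch match;
    if (std::regex_search(*row, match, re)) {
      output.push_back(static_cast<int>(match.position(0)));  // start of first match
    } else {
      output.push_back(-1);  // no match in this row
    }
  }
  return output;
}
```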
std::unique_ptr<column> cudf::strings::findall(
    strings_column_view const& input,
    regex_program const& prog,
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Returns a lists column of strings for each matching occurrence using the regex_program pattern within each string.
Each output row includes all the substrings within the corresponding input row that match the given pattern. If no matches are found, the output row is empty.
A null output row occurs if the corresponding input row is null.
See the Regex Features page for details on patterns supported by this API.
input | Strings instance for this operation |
prog | Regex program instance |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
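To make the shape of the lists output concrete, here is a host-side analogy in which each row maps to a vector of matched substrings. It is a sketch only, not libcudf code: `findall_host` is a hypothetical name, `std::regex` stands in for `regex_program`, and a nested `std::vector` stands in for the lists column.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side analogy of findall (not part of libcudf).
// Each non-null row yields the list of all substrings that match the pattern.
std::vector<std::optional<std::vector<std::string>>> findall_host(
  std::vector<std::optional<std::string>> const& input, std::string const& pattern)
{
  std::regex const re(pattern);
  std::vector<std::optional<std::vector<std::string>>> output;
  output.reserve(input.size());
  for (auto const& row : input) {
    if (!row.has_value()) {
      output.push_back(std::nullopt);  // null input row gives a null output row
      continue;
    }
    std::vector<std::string> matches;  // stays empty when nothing matches
    for (auto it = std::sregex_iterator(row->begin(), row->end(), re);
         it != std::sregex_iterator();
         ++it) {
      matches.push_back(it->str());
    }
    output.push_back(std::move(matches));
  }
  return output;
}
```

Note that a row with no matches produces an empty list, which is distinct from the null produced by a null input row.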
std::unique_ptr<column> cudf::strings::like(
    strings_column_view const& input,
    string_scalar const& pattern,
    string_scalar const& escape_character = string_scalar(""),
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Returns a boolean column identifying rows which match the given like pattern.
The like pattern expects only 2 wildcard special characters:
% matches zero or more of any character
_ matches any single character
Specify an escape character to include either % or _ in the search. The escape_character is expected to be either 0 or 1 characters. If more than one character is specified, only the first character is used.
Any null string entries return corresponding null output column entries.
cudf::logic_error | if pattern or escape_character is invalid |
input | Strings instance for this operation |
pattern | Like pattern to match within each string |
escape_character | Optional character that specifies the escape prefix. Default is no escape character. |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
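The wildcard and escape rules can be sketched as a small host-side matcher for a single string. This is an illustration under stated assumptions, not the libcudf implementation: `like_match_host` is a hypothetical name, it works on bytes (so `_` equals one character only for ASCII), and treating a trailing lone escape character as a literal is a choice made for this sketch.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical host-side helper (not part of libcudf): tests one string against
// one like pattern. Works on bytes, so "_" means one character only for ASCII.
bool like_match_host(std::string_view text,
                     std::string_view pattern,
                     std::string_view escape_character = "")
{
  // Only the first character of escape_character is used; empty means no escape.
  bool const has_escape = !escape_character.empty();
  char const esc        = has_escape ? escape_character[0] : '\0';
  // True when pattern[i] is an escape prefix followed by another character
  // (sketch choice: a trailing lone escape character is treated as a literal).
  auto const is_escaped = [&](std::size_t i) {
    return has_escape && pattern[i] == esc && i + 1 < pattern.size();
  };

  constexpr auto npos = std::string_view::npos;
  std::size_t t = 0, p = 0;
  std::size_t retry_p = npos;  // pattern position just after the most recent %
  std::size_t retry_t = 0;     // text position where that % currently stops consuming

  while (t < text.size()) {
    if (p < pattern.size()) {
      bool const escaped = is_escaped(p);
      char const c       = escaped ? pattern[p + 1] : pattern[p];
      if (!escaped && c == '%') {
        retry_p = ++p;  // remember where to resume if a later comparison fails
        retry_t = t;
        continue;
      }
      if ((!escaped && c == '_') || c == text[t]) {
        p += escaped ? 2 : 1;
        ++t;
        continue;
      }
    }
    if (retry_p == npos) { return false; }  // mismatch and no % to fall back on
    p = retry_p;                            // let the last % absorb one more character
    t = ++retry_t;
  }
  // Text is exhausted: any remaining pattern must consist of unescaped % only
  while (p < pattern.size() && !is_escaped(p) && pattern[p] == '%') { ++p; }
  return p == pattern.size();
}
```

For example, with `\` as the escape character, the pattern `50\%` matches only the literal text `50%`, whereas the unescaped pattern `50%` also matches `500`.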
std::unique_ptr<column> cudf::strings::like(
    strings_column_view const& input,
    strings_column_view const& patterns,
    string_scalar const& escape_character = string_scalar(""),
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Returns a boolean column identifying rows which match the corresponding like pattern in the given patterns.
The like pattern expects only 2 wildcard special characters:
% matches zero or more of any character
_ matches any single character
Specify an escape character to include either % or _ in the search. The escape_character is expected to be either 0 or 1 characters. If more than one character is specified, only the first character is used. The escape character is applied to all patterns.
Any null string entries return corresponding null output column entries.
cudf::logic_error | if patterns contains nulls or escape_character is invalid |
cudf::logic_error | if patterns.size() != input.size() |
input | Strings instance for this operation |
patterns | Like patterns to match within each corresponding string |
escape_character | Optional character that specifies the escape prefix. Default is no escape character. |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
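The row-by-row pairing of strings and patterns, together with the two error conditions listed above, can be sketched on the host as follows. This is an analogy, not libcudf code: `like_row` and `like_rows_host` are hypothetical names, `std::invalid_argument` stands in for `cudf::logic_error`, and the recursive matcher is a simple byte-based illustration that is not tuned for performance.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical recursive like matcher for one row (byte-based, ASCII-oriented).
bool like_row(std::string_view text, std::string_view pat, char esc)
{
  if (pat.empty()) { return text.empty(); }
  if (esc != '\0' && pat[0] == esc && pat.size() > 1) {
    // Escaped character: must match literally
    return !text.empty() && text[0] == pat[1] && like_row(text.substr(1), pat.substr(2), esc);
  }
  if (pat[0] == '%') {
    // % matches nothing, or absorbs one character and tries again
    return like_row(text, pat.substr(1), esc) ||
           (!text.empty() && like_row(text.substr(1), pat, esc));
  }
  return !text.empty() && (pat[0] == '_' || pat[0] == text[0]) &&
         like_row(text.substr(1), pat.substr(1), esc);
}

// Hypothetical host-side analogy of the per-row like overload (not libcudf).
std::vector<std::optional<bool>> like_rows_host(
  std::vector<std::optional<std::string>> const& input,
  std::vector<std::optional<std::string>> const& patterns,
  std::string const& escape_character = "")
{
  if (patterns.size() != input.size()) {
    throw std::invalid_argument("patterns.size() != input.size()");
  }
  for (auto const& pattern : patterns) {
    if (!pattern.has_value()) { throw std::invalid_argument("patterns contains nulls"); }
  }
  // One escape character (the first, if any) is applied to every pattern
  char const esc = escape_character.empty() ? '\0' : escape_character[0];
  std::vector<std::optional<bool>> output;
  output.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (!input[i].has_value()) {
      output.push_back(std::nullopt);  // null input row gives a null output row
    } else {
      output.push_back(like_row(*input[i], *patterns[i], esc));
    }
  }
  return output;
}
```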
std::unique_ptr<column> cudf::strings::matches_re(
    strings_column_view const& input,
    regex_program const& prog,
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Returns a boolean column identifying rows which match the given regex_program object, but only at the beginning of the string.
Any null string entries return corresponding null output column entries.
See the Regex Features page for details on patterns supported by this API.
input | Strings instance for this operation |
prog | Regex program instance |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
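The "only at the beginning" restriction can be illustrated with a host-side analogy that anchors the search at the first character. This is a sketch, not libcudf code: `matches_re_host` is a hypothetical name, and `std::regex` with the `match_continuous` flag stands in for `regex_program` and the anchored match.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical host-side analogy of matches_re (not part of libcudf).
std::vector<std::optional<bool>> matches_re_host(
  std::vector<std::optional<std::string>> const& input, std::string const& pattern)
{
  std::regex const re(pattern);
  std::vector<std::optional<bool>> output;
  output.reserve(input.size());
  for (auto const& row : input) {
    if (!row.has_value()) {
      output.push_back(std::nullopt);  // null input row gives a null output row
    } else {
      // match_continuous only accepts a match that starts at the first character
      output.push_back(
        std::regex_search(*row, re, std::regex_constants::match_continuous));
    }
  }
  return output;
}
```

With the pattern `[a-z]+`, the row `"abc123"` gives true while `"123abc"` gives false, even though the pattern does occur later in the second string.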