Strings Extract
- group strings_extract
Functions
-
std::unique_ptr<table> extract(strings_column_view const &input, regex_program const &prog, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns a table of strings columns where each column corresponds to the matching group specified in the given regex_program object.
All the strings for the first group go in the first output column, the strings for the second group go in the second column, and so on. A null entry is added to each column at row i if the string at row i does not match.
Any null input string produces a corresponding null entry in each output column.
Example:
s = ["a1", "b2", "c3"]
p = regex_program::create("([ab])(\\d)")
r = extract(s, p)
r is now [ ["a", "b", null],
           ["1", "2", null] ]
See the Regex Features page for details on patterns supported by this API.
- Parameters:
input – Strings instance for this operation
prog – Regex program instance
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned table’s device memory
- Returns:
Columns of strings extracted from the input column
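The column-per-group layout described above can be illustrated on the host with std::regex. This is only a sketch of the output shape, not libcudf code: extract_model, nullable_string and host_column are hypothetical names introduced here, std::regex (ECMAScript syntax) stands in for regex_program, and the handling of an optional group that does not participate in a match is an assumption of the model.

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical host-side stand-ins for a nullable strings column.
using nullable_string = std::optional<std::string>;
using host_column     = std::vector<nullable_string>;

// Hypothetical helper (not part of libcudf): builds one output column per
// capture group. Column g, row i holds group g of the first match in row i,
// or null when row i is null or does not match the pattern.
std::vector<host_column> extract_model(host_column const& input, std::string const& pattern)
{
  std::regex const re(pattern);
  std::size_t const num_groups = re.mark_count();

  // Every entry starts out null (std::nullopt).
  std::vector<host_column> output(num_groups, host_column(input.size()));

  for (std::size_t row = 0; row < input.size(); ++row) {
    if (!input[row].has_value()) { continue; }  // null input row stays null

    std::smatch match;
    if (!std::regex_search(*input[row], match, re)) { continue; }  // no match stays null

    for (std::size_t g = 0; g < num_groups; ++g) {
      // Assumption of this model: a group that did not participate stays null.
      if (match[g + 1].matched) { output[g][row] = match[g + 1].str(); }
    }
  }
  return output;
}
```

Calling extract_model on the rows "a1", "b2", "c3" with the pattern "([ab])(\\d)" yields two columns: the first holds "a", "b", null and the second holds "1", "2", null, which mirrors the example given for extract.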
-
std::unique_ptr<column> extract_all_record(strings_column_view const &input, regex_program const &prog, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns a lists column of strings where each output row holds the strings extracted from the corresponding input row by the groups specified in the given regex_program object.
All the group matches found in the first input row go into the first output row, the matches from the second input row go into the second output row, and so on.
A null output row will result if the corresponding input string row does not match or that input row is null.
Example:
s = ["a1 b4", "b2", "c3 a5", "b", null]
p = regex_program::create("([ab])(\\d)")
r = extract_all_record(s, p)
r is now [ ["a", "1", "b", "4"],
           ["b", "2"],
           ["a", "5"],
           null,
           null ]
See the Regex Features page for details on patterns supported by this API.
- Parameters:
input – Strings instance for this operation
prog – Regex program instance
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate any returned device memory
- Returns:
Lists column containing strings extracted from the input column
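The row-per-input layout described above can likewise be modelled on the host. Again, this is a sketch under stated assumptions rather than libcudf code: extract_all_record_model, host_column and host_lists_column are hypothetical names, std::regex stands in for regex_program, and the model stores an empty string for an optional group that does not participate in a match.

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side stand-ins for a nullable strings column and a
// nullable lists-of-strings column.
using host_column       = std::vector<std::optional<std::string>>;
using host_lists_column = std::vector<std::optional<std::vector<std::string>>>;

// Hypothetical helper (not part of libcudf): for each input row, collects the
// capture groups of every non-overlapping match, in order, into one list.
// A null input row or a row with no match produces a null output row.
host_lists_column extract_all_record_model(host_column const& input, std::string const& pattern)
{
  std::regex const re(pattern);
  host_lists_column output(input.size());  // every row starts out null

  for (std::size_t row = 0; row < input.size(); ++row) {
    if (!input[row].has_value()) { continue; }  // null input row stays null

    std::vector<std::string> groups;
    std::sregex_iterator const end;
    for (std::sregex_iterator it(input[row]->begin(), input[row]->end(), re); it != end; ++it) {
      // Index 0 is the whole match, so the capture groups start at index 1.
      // Assumption of this model: a non-participating group yields "".
      for (std::size_t g = 1; g < it->size(); ++g) { groups.push_back((*it)[g].str()); }
    }

    if (!groups.empty()) { output[row] = std::move(groups); }  // no match stays null
  }
  return output;
}
```

With the rows "a1 b4", "b2", "c3 a5", "b" and a null row, and the pattern "([ab])(\\d)", the model returns the lists ["a", "1", "b", "4"], ["b", "2"], ["a", "5"], followed by two null rows, matching the example given for extract_all_record.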
-
std::unique_ptr<table> extract(strings_column_view const &input, regex_program const &prog, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())