Extracting

Files

file  strings/extract.hpp
 

Functions

std::unique_ptr< table > cudf::strings::extract (strings_column_view const &input, regex_program const &prog, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns a table of strings columns where each column corresponds to the matching group specified in the given regex_program object.
 
std::unique_ptr< column > cudf::strings::extract_all_record (strings_column_view const &input, regex_program const &prog, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns a lists column of strings where each output row holds the strings extracted from the corresponding input row by the matching groups specified in the given regex_program object.
 

Detailed Description

Function Documentation

◆ extract()

std::unique_ptr<table> cudf::strings::extract ( strings_column_view const &  input,
regex_program const &  prog,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns a table of strings columns in which each column holds the matches for one capture group of the given regex_program object.

All strings matched by the first group go in the first output column, those matched by the second group go in the second column, and so on. If the string at row i does not match the pattern, row i is null in every output column.

A null input string likewise produces null entries in the corresponding row of every output column.

Example:
s = ["a1", "b2", "c3"]
p = regex_program::create("([ab])(\\d)")
r = extract(s, p)
r is now [ ["a", "b", null],
           ["1", "2", null] ]
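
The row and column layout described above can be illustrated on the host without a GPU. The following is only a sketch of the documented semantics using std::regex, not the libcudf implementation. The names HostColumn, HostTable, and extract_host are hypothetical, and the sketch assumes that a match may start anywhere in the string and that only the first match in each row is used.

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical host-only stand-ins for a nullable strings column and a table.
using HostColumn = std::vector<std::optional<std::string>>;
using HostTable  = std::vector<HostColumn>;

// Output column g holds capture group g+1 of the first match found in each row.
// Null rows and rows without a match stay null in every output column.
HostTable extract_host(HostColumn const& input, std::string const& pattern)
{
  std::regex const re(pattern);
  std::size_t const num_groups = re.mark_count();

  // One output column per capture group; every entry starts out as null.
  HostTable result(num_groups, HostColumn(input.size()));

  for (std::size_t row = 0; row < input.size(); ++row) {
    if (!input[row].has_value()) { continue; }  // null input -> null outputs
    std::smatch m;
    if (!std::regex_search(*input[row], m, re)) { continue; }  // no match -> null outputs
    for (std::size_t g = 0; g < num_groups; ++g) {
      // m[0] is the whole match, so group numbering starts at m[1].
      if (m[g + 1].matched) { result[g][row] = m[g + 1].str(); }
    }
  }
  return result;
}
```

Calling extract_host on the rows "a1", "b2", "c3" with the pattern "([ab])(\\d)" yields two columns of three rows each, with the third row null in both because "c3" does not match.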

See the Regex Features page for details on patterns supported by this API.

Parameters
input   Strings instance for this operation
prog    Regex program instance
stream  CUDA stream used for device memory operations and kernel launches
mr      Device memory resource used to allocate the returned table's device memory
Returns
Columns of strings extracted from the input column

◆ extract_all_record()

std::unique_ptr<column> cudf::strings::extract_all_record ( strings_column_view const &  input,
regex_program const &  prog,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns a lists column of strings in which each output row contains the strings extracted from the corresponding input row by the capture groups of the given regex_program object.

All groups matched in the first input row go in the first output row, the groups matched in the second input row go in the second output row, and so on. When a row matches more than once, the groups of each match are appended in order.

An output row is null if the corresponding input row is null or does not match.

Example:
s = ["a1 b4", "b2", "c3 a5", "b", null]
p = regex_program::create("([ab])(\\d)")
r = extract_all_record(s, p)
r is now [ ["a", "1", "b", "4"],
           ["b", "2"],
           ["a", "5"],
           null,
           null ]
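
The per-row list layout can likewise be sketched on the host. This is an illustration of the documented behavior with std::regex, not the libcudf implementation. The names HostStrings, HostLists, and extract_all_record_host are hypothetical. As a simplification, a group that does not participate in a match is stored as an empty string here, which may differ from how the library represents it.

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-only stand-ins for a nullable strings column and a
// nullable lists-of-strings column.
using HostStrings = std::vector<std::optional<std::string>>;
using HostLists   = std::vector<std::optional<std::vector<std::string>>>;

// For every row, visit all non-overlapping matches from left to right and
// append each match's capture groups to that row's list.
HostLists extract_all_record_host(HostStrings const& input, std::string const& pattern)
{
  std::regex const re(pattern);
  HostLists result(input.size());  // every output row starts out as null

  for (std::size_t row = 0; row < input.size(); ++row) {
    if (!input[row].has_value()) { continue; }  // null input -> null output row
    std::string const& s = *input[row];
    std::vector<std::string> groups;
    for (std::sregex_iterator it(s.begin(), s.end(), re), end; it != end; ++it) {
      // Index 0 is the whole match; capture groups start at index 1.
      for (std::size_t g = 1; g < it->size(); ++g) { groups.push_back((*it)[g].str()); }
    }
    // A row without any match keeps its null entry.
    if (!groups.empty()) { result[row] = std::move(groups); }
  }
  return result;
}
```

With the input rows "a1 b4", "b2", "c3 a5", "b", and null, and the pattern "([ab])(\\d)", the sketch produces the same five rows as the example above: the first row collects the groups of both matches, and the last two rows are null.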

See the Regex Features page for details on patterns supported by this API.

Parameters
input   Strings instance for this operation
prog    Regex program instance
stream  CUDA stream used for device memory operations and kernel launches
mr      Device memory resource used to allocate any returned device memory
Returns
Lists column containing strings extracted from the input column