Files | |
file | strings/combine.hpp |
Strings APIs for concatenate and join. | |
Enumerations | |
enum class | cudf::strings::separator_on_nulls { cudf::strings::YES , cudf::strings::NO } |
Setting for specifying how separators are added with null strings elements. | |
enum class | cudf::strings::output_if_empty_list { cudf::strings::EMPTY_STRING , cudf::strings::NULL_ELEMENT } |
Setting for specifying what will be output from join_list_elements when an input list is empty. | |
enum class cudf::strings::output_if_empty_list
Setting for specifying what will be output from join_list_elements when an input list is empty.
Enumerator | |
---|---|
EMPTY_STRING | Empty list will result in empty string. |
NULL_ELEMENT | Empty list will result in a null. |
Definition at line 47 of file strings/combine.hpp.
enum class cudf::strings::separator_on_nulls
Setting for specifying how separators are added with null strings elements.
Enumerator | |
---|---|
YES | Always add separators between elements. |
NO | Do not add separators if an element is null. |
Definition at line 38 of file strings/combine.hpp.
std::unique_ptr<column> cudf::strings::concatenate (
    table_view const & strings_columns,
    string_scalar const & separator = string_scalar(""),
    string_scalar const & narep = string_scalar("", false),
    separator_on_nulls separate_nulls = separator_on_nulls::YES,
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()
)
Row-wise concatenates the given list of strings columns and returns a single strings column result.
Each new string is created by concatenating the strings from the same row delimited by the separator provided.
Any row with a null entry results in a null output row unless a valid narep string is specified to be used in its place.
If separate_nulls is set to NO and narep is valid, then separators are not added to the output between null elements. Otherwise, separators are always added if narep is valid.
More than one column must be specified in the input strings_columns table.
cudf::logic_error | if input columns are not all strings columns. |
cudf::logic_error | if separator is not valid. |
cudf::logic_error | if only one column is specified. |
strings_columns | List of strings columns to concatenate |
separator | String that should be inserted between each string from each row. Default is an empty string. |
narep | String to replace any null strings found in any column. Default of invalid-scalar means any null entry in any column will produce a null result for that row. |
separate_nulls | If YES, then the separator is included for null rows if narep is valid |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
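The null-handling rules above can be sketched on the host in plain C++. This is a minimal reference model, not the cudf implementation and not a call into the library: `opt_string`, `host_column`, and `concatenate_model` are hypothetical names, `std::nullopt` stands in for a null string or an invalid scalar, and `separate_nulls` is a bool mirroring separator_on_nulls (true for YES). How separators are placed around a replaced null under NO is this sketch's reading of the rule.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side stand-ins (not cudf types): std::nullopt marks a
// null string or an invalid scalar.
using opt_string  = std::optional<std::string>;
using host_column = std::vector<opt_string>;

// Reference model of row-wise concatenation with one scalar separator.
// separate_nulls == true mirrors separator_on_nulls::YES, false mirrors NO.
host_column concatenate_model(std::vector<host_column> const& columns,
                              std::string const& separator,
                              opt_string const& narep,
                              bool separate_nulls)
{
  assert(columns.size() > 1);  // stand-in for the documented logic_error
  std::size_t const num_rows = columns.front().size();
  host_column result(num_rows);  // every row starts out null
  for (std::size_t row = 0; row < num_rows; ++row) {
    std::string out;
    bool is_null_row       = false;
    bool pending_separator = false;
    for (auto const& col : columns) {
      opt_string const& element = col[row];
      // A null entry without a valid narep makes the whole output row null.
      if (!element && !narep) {
        is_null_row = true;
        break;
      }
      // Assumption: under NO, no separator is written in front of a null entry.
      if (pending_separator && (separate_nulls || element.has_value())) { out += separator; }
      out += element ? *element : *narep;
      pending_separator = pending_separator || separate_nulls || element.has_value();
    }
    if (!is_null_row) { result[row] = out; }
  }
  return result;
}
```

In this model, the row ("b", null, "2") with separator "-" and narep "_" becomes "b-_-2" when separate_nulls is YES and "b_-2" when it is NO. With an invalid narep the same row is null.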
std::unique_ptr<column> cudf::strings::concatenate (
    table_view const & strings_columns,
    strings_column_view const & separators,
    string_scalar const & separator_narep = string_scalar("", false),
    string_scalar const & col_narep = string_scalar("", false),
    separator_on_nulls separate_nulls = separator_on_nulls::YES,
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()
)
Concatenates a list of strings columns using separators for each row and returns the result as a strings column.
Each new string is created by concatenating the strings from the same row, delimited by the row separator provided for that row. The following rules are applicable:
- If the row separator for a given row is null, the output for that row is null unless separator_narep is valid.
- The separator is applied between two output row values if separate_nulls is YES, or only between valid rows if separate_nulls is NO.
- If separator_narep and col_narep are both valid, the output column is always non-nullable.
cudf::logic_error | if no input columns are specified (the table view is empty). |
cudf::logic_error | if input columns are not all strings columns. |
cudf::logic_error | if the number of rows from separators and strings_columns do not match |
strings_columns | List of strings columns to concatenate |
separators | Strings column that provides the separator for a given row |
separator_narep | String to replace a null separator for a given row. Default of invalid-scalar means no row separator value replacements. |
col_narep | String that should be used in place of any null strings found in any column. Default of invalid-scalar means no null column value replacements. |
separate_nulls | If YES, then the separator is included for null rows if col_narep is valid. |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Resource for allocating device memory |
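A host-side sketch may help show how the per-row separator and the two replacement scalars interact. It is a plain C++ model, not the cudf implementation: `opt_string`, `host_column`, and `concatenate_with_row_separators` are hypothetical names, and `std::nullopt` stands in for a null string or an invalid scalar. It assumes that a null column value with an invalid col_narep makes the output row null, by analogy with the scalar-separator overload, and that under NO no separator is written in front of a null value.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side stand-ins (not cudf types).
using opt_string  = std::optional<std::string>;
using host_column = std::vector<opt_string>;

// Reference model of row-wise concatenation with one separator per row.
host_column concatenate_with_row_separators(std::vector<host_column> const& columns,
                                            host_column const& separators,
                                            opt_string const& separator_narep,
                                            opt_string const& col_narep,
                                            bool separate_nulls)
{
  assert(!columns.empty());  // stand-in for the documented logic_error
  std::size_t const num_rows = separators.size();
  host_column result(num_rows);  // every row starts out null
  for (std::size_t row = 0; row < num_rows; ++row) {
    // A null row separator falls back to separator_narep; if that is invalid
    // too, the output row stays null.
    opt_string const& separator = separators[row] ? separators[row] : separator_narep;
    if (!separator) { continue; }
    std::string out;
    bool is_null_row = false;
    bool pending     = false;
    for (auto const& col : columns) {
      assert(col.size() == num_rows);  // row counts must match
      opt_string const& element = col[row];
      // Assumption: a null value without a valid col_narep nulls the row.
      if (!element && !col_narep) {
        is_null_row = true;
        break;
      }
      if (pending && (separate_nulls || element.has_value())) { out += *separator; }
      out += element ? *element : *col_narep;
      pending = pending || separate_nulls || element.has_value();
    }
    if (!is_null_row) { result[row] = out; }
  }
  return result;
}
```

When both separator_narep and col_narep are valid, no branch in this model can leave a row null, which matches the last rule in the list above.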
std::unique_ptr<column> cudf::strings::join_list_elements (
    lists_column_view const & lists_strings_column,
    string_scalar const & separator = string_scalar(""),
    string_scalar const & narep = string_scalar("", false),
    separator_on_nulls separate_nulls = separator_on_nulls::YES,
    output_if_empty_list empty_list_policy = output_if_empty_list::EMPTY_STRING,
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()
)
Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result.
Each new string is created by concatenating the strings from the same row (same list element), delimited by the separator provided.
A null list row always results in a null string in the output row. Any non-null list row that has a null element results in a null output row unless a valid narep string is specified to be used in its place.
If separate_nulls is set to NO and narep is valid, then separators are not added to the output between null elements. Otherwise, separators are always added if narep is valid.
If empty_list_policy is set to EMPTY_STRING, any row that is an empty list results in an empty output string. Otherwise, the output is a null.
In the special case where the input list row contains only null elements, the output is the same as for an empty input list, regardless of the narep and separate_nulls values.
cudf::logic_error | if the input column is not a lists column of strings. |
cudf::logic_error | if separator is not valid. |
lists_strings_column | Column containing lists of strings to concatenate |
separator | String to insert between strings of each list row. Default is an empty string. |
narep | String to replace null strings in any non-null list row. Default is an invalid-scalar denoting that list rows containing null strings will result in a null string in the corresponding output rows. |
separate_nulls | If YES, then the separator is included for null rows if narep is valid |
empty_list_policy | If set to EMPTY_STRING, any input row that is an empty list will result in an empty string. Otherwise, it will result in a null. |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
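The per-row decision order described above can be sketched as a small host-side function. This is a plain C++ model of the documented rules, not the cudf implementation: `opt_string`, `list_row`, and `join_list_row` are hypothetical names, `std::nullopt` stands in for a null, and the two bools mirror separator_on_nulls (true for YES) and output_if_empty_list (true for EMPTY_STRING). The text says the all-null case is handled "regardless of narep", so the sketch checks it before the null-element rule; that ordering is an assumption.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side stand-ins (not cudf types).
using opt_string = std::optional<std::string>;
using list_row   = std::optional<std::vector<opt_string>>;  // nullopt == null list row

// Reference model of joining one list row with a scalar separator.
opt_string join_list_row(list_row const& row,
                         std::string const& separator,
                         opt_string const& narep,
                         bool separate_nulls,   // true mirrors separator_on_nulls::YES
                         bool empty_as_string)  // true mirrors EMPTY_STRING
{
  // A null list row always produces a null output row.
  if (!row) { return std::nullopt; }

  // Empty lists and lists holding only nulls follow empty_list_policy.
  // Assumption: this check comes before the null-element rule below.
  bool const has_valid = std::any_of(
    row->begin(), row->end(), [](opt_string const& s) { return s.has_value(); });
  if (!has_valid) {
    if (empty_as_string) { return std::string{}; }
    return std::nullopt;
  }

  std::string out;
  bool pending = false;
  for (opt_string const& element : *row) {
    // A null element without a valid narep makes the output row null.
    if (!element && !narep) { return std::nullopt; }
    // Assumption: under NO, no separator is written in front of a null element.
    if (pending && (separate_nulls || element.has_value())) { out += separator; }
    out += element ? *element : *narep;
    pending = pending || separate_nulls || element.has_value();
  }
  return out;
}
```

In this model, the list ("a", null, "b") with separator "-" and narep "_" yields "a-_-b" under YES and "a_-b" under NO, while a list of two nulls yields an empty string under EMPTY_STRING even though narep is valid.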
std::unique_ptr<column> cudf::strings::join_list_elements (
    lists_column_view const & lists_strings_column,
    strings_column_view const & separators,
    string_scalar const & separator_narep = string_scalar("", false),
    string_scalar const & string_narep = string_scalar("", false),
    separator_on_nulls separate_nulls = separator_on_nulls::YES,
    output_if_empty_list empty_list_policy = output_if_empty_list::EMPTY_STRING,
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()
)
Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result.
Each new string is created by concatenating the strings from the same row (same list element), delimited by the row separator provided in the separators strings column.
A null list row always results in a null string in the output row. Any non-null list row that has a null element results in a null output row unless a valid string_narep scalar is provided to be used in its place. Any null row in the separators column also results in a null output row unless a valid separator_narep scalar is provided to be used in place of the null separators.
If separate_nulls is set to NO and string_narep is valid, then separators are not added to the output between null elements. Otherwise, separators are always added if string_narep is valid.
If empty_list_policy is set to EMPTY_STRING, any row that is an empty list results in an empty output string. Otherwise, the output is a null.
In the special case where the input list row contains only null elements, the output is the same as for an empty input list, regardless of the string_narep and separate_nulls values.
cudf::logic_error | if the input column is not a lists column of strings. |
cudf::logic_error | if the number of rows from separators and lists_strings_column do not match |
lists_strings_column | Column containing lists of strings to concatenate |
separators | Strings column that provides separators for concatenation |
separator_narep | String that should be used to replace a null separator. Default is an invalid-scalar denoting that rows containing a null separator will result in a null string in the corresponding output rows. |
string_narep | String to replace null strings in any non-null list row. Default is an invalid-scalar denoting that list rows containing null strings will result in a null string in the corresponding output rows. |
separate_nulls | If YES, then the separator is included for null rows if string_narep is valid |
empty_list_policy | If set to EMPTY_STRING, any input row that is an empty list will result in an empty string. Otherwise, it will result in a null. |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
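Compared with the scalar-separator overload, this one adds a separator lookup per row. The sketch below models a whole column on the host in plain C++. It is not the cudf implementation: `opt_string`, `host_column`, `list_row`, and `join_lists_with_row_separators` are hypothetical names, and `std::nullopt` stands in for a null or an invalid scalar. It assumes the null-separator check is applied before empty_list_policy, and that the all-null check precedes the null-element rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side stand-ins (not cudf types).
using opt_string  = std::optional<std::string>;
using host_column = std::vector<opt_string>;
using list_row    = std::optional<std::vector<opt_string>>;  // nullopt == null list row

// Reference model of joining each list row with its own row separator.
host_column join_lists_with_row_separators(std::vector<list_row> const& lists,
                                           host_column const& separators,
                                           opt_string const& separator_narep,
                                           opt_string const& string_narep,
                                           bool separate_nulls,   // true mirrors YES
                                           bool empty_as_string)  // true mirrors EMPTY_STRING
{
  assert(lists.size() == separators.size());  // stand-in for the documented logic_error
  host_column result(lists.size());           // every row starts out null
  for (std::size_t row = 0; row < lists.size(); ++row) {
    // Resolve the separator: a null row separator falls back to separator_narep.
    opt_string const& separator = separators[row] ? separators[row] : separator_narep;
    // Null list rows and unresolved separators leave the output row null.
    if (!lists[row] || !separator) { continue; }

    auto const& elements = *lists[row];
    bool const has_valid = std::any_of(
      elements.begin(), elements.end(), [](opt_string const& s) { return s.has_value(); });
    if (!has_valid) {  // empty list, or only null elements
      if (empty_as_string) { result[row] = std::string{}; }
      continue;
    }

    std::string out;
    bool is_null_row = false;
    bool pending     = false;
    for (opt_string const& element : elements) {
      if (!element && !string_narep) {
        is_null_row = true;
        break;
      }
      if (pending && (separate_nulls || element.has_value())) { out += *separator; }
      out += element ? *element : *string_narep;
      pending = pending || separate_nulls || element.has_value();
    }
    if (!is_null_row) { result[row] = out; }
  }
  return result;
}
```

Resolving the separator first keeps the rest of the loop identical to the scalar-separator case, which is why the two overloads share the same separate_nulls and empty_list_policy behaviour in this model.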
std::unique_ptr<column> cudf::strings::join_strings (
    strings_column_view const & input,
    string_scalar const & separator = string_scalar(""),
    string_scalar const & narep = string_scalar("", false),
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()
)
Concatenates all strings in the column into one new string delimited by an optional separator string.
This returns a column with one string. Any null entries are ignored unless the narep parameter specifies a replacement string.
cudf::logic_error | if separator is not valid. |
input | Strings for this operation |
separator | String that should be inserted between each string. Default is an empty string. |
narep | String to replace any null strings found. Default of invalid-scalar will ignore any null entries. |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory. |
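As a rough illustration of "ignored unless narep specifies a replacement", the following host-side C++ sketch joins a column of optional strings into one string. It is a model of the description, not the cudf implementation: `opt_string` and `join_strings_model` are hypothetical names, and `std::nullopt` stands in for a null string or an invalid narep. The sketch treats an ignored null as if it were absent from the column. How the library places separators next to ignored nulls, and whether the single output row can itself be null, is not covered by the text above and is not modelled here.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side stand-in (not a cudf type).
using opt_string = std::optional<std::string>;

// Reference model: join every string of a column into one string.
std::string join_strings_model(std::vector<opt_string> const& input,
                               std::string const& separator,
                               opt_string const& narep)
{
  std::string out;
  bool first = true;
  for (opt_string const& element : input) {
    // Assumption: a null entry with an invalid narep is skipped entirely.
    if (!element && !narep) { continue; }
    if (!first) { out += separator; }
    out += element ? *element : *narep;
    first = false;
  }
  return out;
}
```

In this model, the column ("a", null, "b") with separator "-" joins to "a-b" when narep is invalid and to "a-_-b" when narep is "_".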