Strings Combine
- group strings_combine
Enums
Functions
-
std::unique_ptr<column> join_strings(strings_column_view const &input, string_scalar const &separator = string_scalar(""), string_scalar const &narep = string_scalar("", false), rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Concatenates all strings in the column into one new string delimited by an optional separator string.
This returns a column with one string. Any null entries are ignored unless the narep parameter specifies a replacement string.
Example:
    s = ['aa', null, '', 'zz']
    r = join_strings(s, ':', '_')
    r is ['aa:_::zz']
- Throws:
cudf::logic_error – if separator is not valid.
- Parameters:
input – Strings for this operation
separator – String that should be inserted between each string. Default is an empty string.
narep – String to replace any null strings found. The default invalid scalar causes null entries to be ignored.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory.
- Returns:
New column containing one string.
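The null-handling rule above can be illustrated with a small host-side C++ model. This is a sketch under stated assumptions, not libcudf code and not its API: join_strings_model and nullable_string are hypothetical names, a std::vector of std::optional<std::string> stands in for the strings column, std::nullopt stands in for a null row or an invalid string_scalar, and the result is a plain std::string rather than a one-row column. It reproduces the documented example; it does not try to say what libcudf returns when every entry is null.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side model of the documented join_strings rules; not libcudf
// code. std::nullopt marks a null row or an invalid string_scalar.
using nullable_string = std::optional<std::string>;

std::string join_strings_model(std::vector<nullable_string> const& input,
                               nullable_string const& separator = std::string{},
                               nullable_string const& narep     = std::nullopt)
{
  // libcudf throws cudf::logic_error for an invalid separator; the sketch asserts.
  assert(separator.has_value());
  std::string result;
  bool first = true;
  for (nullable_string const& str : input) {
    if (!str && !narep) { continue; }  // null entry with no replacement: skipped
    if (!first) { result += *separator; }
    result += str ? *str : *narep;
    first = false;
  }
  return result;
}
```

Called as join_strings_model(s, ":", "_") on s = ['aa', null, '', 'zz'], the sketch returns "aa:_::zz", matching the example.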
-
std::unique_ptr<column> concatenate(table_view const &strings_columns, strings_column_view const &separators, string_scalar const &separator_narep = string_scalar("", false), string_scalar const &col_narep = string_scalar("", false), separator_on_nulls separate_nulls = separator_on_nulls::YES, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Concatenates a list of strings columns using separators for each row and returns the result as a strings column.
Each new string is created by concatenating the strings from the same row, delimited by the row separator provided for that row. The following rules apply:
- If the row separator for a given row is null, the output for that row is null, unless a valid separator_narep is provided.
- The separator is applied between two output row values if separate_nulls is YES, or only between valid (non-null) values if separate_nulls is NO.
- If separator_narep and col_narep are both valid, the output column is always non-nullable.
Example:
    c0 = ['aa', null, '', 'ee', null, 'ff']
    c1 = [null, 'cc', 'dd', null, null, 'gg']
    c2 = ['bb', '', null, null, null, 'hh']
    sep = ['::', '%%', '^^', '!', '*', null]
    out = concatenate({c0, c1, c2}, sep)
    // all rows have at least one null or sep[i]==null
    out is [null, null, null, null, null, null]
    sep_rep = '+'
    out = concatenate({c0, c1, c2}, sep, sep_rep)
    // all rows with at least one null output as null
    out is [null, null, null, null, null, 'ff+gg+hh']
    col_narep = '-'
    sep_na = non-valid scalar
    out = concatenate({c0, c1, c2}, sep, sep_na, col_narep)
    // only the null entry in the sep column produces a null row
    out is ['aa::-::bb', '-%%cc%%', '^^dd^^-', 'ee!-!-', '-*-*-', null]
    col_narep = ''
    out = concatenate({c0, c1, c2}, sep, sep_rep, col_narep, separator_on_nulls::NO)
    // parameter suppresses separator for null rows
    out is ['aa::bb', 'cc%%', '^^dd', 'ee', '', 'ff+gg+hh']
- Throws:
cudf::logic_error – if no input columns are specified (the table view is empty)
cudf::logic_error – if input columns are not all strings columns.
cudf::logic_error – if the number of rows from separators and strings_columns do not match
- Parameters:
strings_columns – List of strings columns to concatenate
separators – Strings column that provides the separator for a given row
separator_narep – String to replace a null separator for a given row. The default invalid scalar means null row separators are not replaced.
col_narep – String that should be used in place of any null strings found in any column. The default invalid scalar means null column values are not replaced.
separate_nulls – If YES, then the separator is included for null rows if col_narep is valid.
stream – CUDA stream used for device memory operations and kernel launches
mr – Resource for allocating device memory
- Returns:
New column with concatenated results
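A host-side C++ model makes the three rules and the example above easier to check. It is a sketch, not libcudf's implementation or API: concatenate_model, nullable_string and strings_col are hypothetical names, std::optional<std::string> with std::nullopt stands in for nullable rows and invalid scalars, and only the enum names YES and NO come from this page. One assumption matters: with separator_on_nulls::NO the sketch drops null entries entirely, which matches the documented example (whose col_narep is empty) but may not match how libcudf places a non-empty replacement in that mode.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side model of the documented rules; not libcudf code.
// std::nullopt marks a null row or an invalid scalar.
using nullable_string = std::optional<std::string>;
using strings_col     = std::vector<nullable_string>;

enum class separator_on_nulls { YES, NO };  // enum names as documented on this page

strings_col concatenate_model(std::vector<strings_col> const& columns,
                              strings_col const& separators,
                              nullable_string const& separator_narep = std::nullopt,
                              nullable_string const& col_narep       = std::nullopt,
                              separator_on_nulls separate_nulls      = separator_on_nulls::YES)
{
  // libcudf throws cudf::logic_error for these cases; the sketch asserts instead.
  assert(!columns.empty());
  for (auto const& column : columns) { assert(column.size() == separators.size()); }

  strings_col out;
  for (std::size_t row = 0; row < separators.size(); ++row) {
    // Rule 1: a null row separator gives a null row unless separator_narep is valid.
    nullable_string const sep = separators[row] ? separators[row] : separator_narep;
    if (!sep) {
      out.push_back(std::nullopt);
      continue;
    }
    std::string joined;
    bool row_is_null = false;
    bool first       = true;
    for (auto const& column : columns) {
      nullable_string const& value = column[row];
      if (!value) {
        if (!col_narep) {  // null entry and no replacement: the row is null
          row_is_null = true;
          break;
        }
        // Rule 2 (simplified): with NO, a null entry adds neither text nor separator.
        if (separate_nulls == separator_on_nulls::NO) { continue; }
      }
      if (!first) { joined += *sep; }
      joined += value ? *value : *col_narep;
      first = false;
    }
    out.push_back(row_is_null ? nullable_string{} : nullable_string{joined});
  }
  return out;
}
```

Fed the c0, c1, c2 and sep columns from the example, the sketch reproduces all four documented outputs.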
-
std::unique_ptr<column> concatenate(table_view const &strings_columns, string_scalar const &separator = string_scalar(""), string_scalar const &narep = string_scalar("", false), separator_on_nulls separate_nulls = separator_on_nulls::YES, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Row-wise concatenates the given list of strings columns and returns a single strings column result.
Each new string is created by concatenating the strings from the same row, delimited by the separator provided.
Any row with a null entry results in a null output row unless a narep string is specified to be used in its place.
If separate_nulls is set to NO and narep is valid, then separators are not added to the output between null elements. Otherwise, separators are always added if narep is valid.
More than one column must be specified in the input strings_columns table.
Example:
    s1 = ['aa', null, '', 'dd']
    s2 = ['', 'bb', 'cc', null]
    out = concatenate({s1, s2})
    out is ['aa', null, 'cc', null]
    out = concatenate({s1, s2}, ':', '_')
    out is ['aa:', '_:bb', ':cc', 'dd:_']
    out = concatenate({s1, s2}, ':', '', separator_on_nulls::NO)
    out is ['aa:', 'bb', ':cc', 'dd']
- Throws:
cudf::logic_error – if input columns are not all strings columns.
cudf::logic_error – if separator is not valid.
cudf::logic_error – if only one column is specified
- Parameters:
strings_columns – List of string columns to concatenate
separator – String that should be inserted between each string from each row. Default is an empty string.
narep – String to replace any null strings found in any column. The default invalid scalar means any null entry in any column produces a null result for that row.
separate_nulls – If YES, then the separator is included for null rows if narep is valid.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New column with concatenated results
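The scalar-separator overload can be checked against its documented example with a similar host-side C++ sketch. It is hypothetical and not libcudf code: concatenate_model, nullable_string and strings_col are illustrative names, std::nullopt stands for a null entry or an invalid scalar, the documented cudf::logic_error conditions become assert calls, and with separator_on_nulls::NO null entries are simply dropped, an assumption that agrees with the documented example (which uses an empty narep).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side model of the scalar-separator overload; not libcudf
// code. std::nullopt marks a null row or an invalid scalar.
using nullable_string = std::optional<std::string>;
using strings_col     = std::vector<nullable_string>;

enum class separator_on_nulls { YES, NO };  // enum names as documented on this page

strings_col concatenate_model(std::vector<strings_col> const& columns,
                              nullable_string const& separator  = std::string{},
                              nullable_string const& narep      = std::nullopt,
                              separator_on_nulls separate_nulls = separator_on_nulls::YES)
{
  // libcudf throws cudf::logic_error in these cases; the sketch asserts instead.
  assert(columns.size() > 1 && "more than one column is required");
  assert(separator.has_value() && "separator must be valid");

  strings_col out;
  for (std::size_t row = 0; row < columns.front().size(); ++row) {
    std::string joined;
    bool row_is_null = false;
    bool first       = true;
    for (auto const& column : columns) {
      nullable_string const& value = column[row];
      if (!value) {
        if (!narep) {  // null entry and no replacement: the row is null
          row_is_null = true;
          break;
        }
        // Simplification: with NO, a null entry adds neither text nor separator.
        if (separate_nulls == separator_on_nulls::NO) { continue; }
      }
      if (!first) { joined += *separator; }
      joined += value ? *value : *narep;
      first = false;
    }
    out.push_back(row_is_null ? nullable_string{} : nullable_string{joined});
  }
  return out;
}
```

For instance, with s1 and s2 from the example, concatenate_model({s1, s2}, ":", "_") yields ['aa:', '_:bb', ':cc', 'dd:_'].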
-
std::unique_ptr<column> join_list_elements(lists_column_view const &lists_strings_column, strings_column_view const &separators, string_scalar const &separator_narep = string_scalar("", false), string_scalar const &string_narep = string_scalar("", false), separator_on_nulls separate_nulls = separator_on_nulls::YES, output_if_empty_list empty_list_policy = output_if_empty_list::EMPTY_STRING, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result.
Each new string is created by concatenating the strings from the same row (same list element), delimited by the row separator provided in the separators strings column.
A null list row always results in a null string in the output row. Any non-null list row having a null element results in a null output row unless a valid string_narep scalar is provided to be used in its place. Any null row in the separators column also results in a null output row unless a valid separator_narep scalar is provided to be used in place of the null separators.
If separate_nulls is set to NO and string_narep is valid, then separators are not added to the output between null elements. Otherwise, separators are always added if string_narep is valid.
If empty_list_policy is set to EMPTY_STRING, any row that is an empty list results in an empty output string. Otherwise, the output is null.
In the special case when the input list row contains only null elements, the output is the same as for an empty input list, regardless of the string_narep and separate_nulls values.
Example:
    s = [ ['aa', 'bb', 'cc'], null, ['', 'dd'], ['ee', null], ['ff', 'gg'] ]
    sep = ['::', '%%', '!', '*', null]
    out = join_list_elements(s, sep)
    out is ['aa::bb::cc', null, '!dd', null, null]
    out = join_list_elements(s, sep, ':', '_')
    out is ['aa::bb::cc', null, '!dd', 'ee*_', 'ff:gg']
    out = join_list_elements(s, sep, ':', '', separator_on_nulls::NO)
    out is ['aa::bb::cc', null, '!dd', 'ee', 'ff:gg']
- Throws:
cudf::logic_error – if input column is not lists of strings column.
cudf::logic_error – if the number of rows from separators and lists_strings_column do not match
- Parameters:
lists_strings_column – Column containing lists of strings to concatenate
separators – Strings column that provides separators for concatenation
separator_narep – String that should be used to replace a null separator. Default is an invalid-scalar denoting that rows containing null separator will result in a null string in the corresponding output rows.
string_narep – String to replace null strings in any non-null list row. Default is an invalid-scalar denoting that list rows containing null strings will result in a null string in the corresponding output rows.
separate_nulls – If YES, then the separator is included for null rows if string_narep is valid.
empty_list_policy – If set to EMPTY_STRING, any input row that is an empty list will result in an empty string. Otherwise, it will result in a null.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New strings column with concatenated results
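The per-row decisions described above can be sketched as a host-side C++ model. This is hypothetical and not libcudf code: join_list_elements_model and the aliases are illustrative names, a std::optional<std::vector<...>> stands in for a nullable list row, and only EMPTY_STRING, YES and NO are enum names taken from this page (NULL_ELEMENT is an assumed name for the other policy). Two further assumptions: with separator_on_nulls::NO null elements are dropped entirely, which matches the documented example, and a null separator is checked before the empty-list policy, an ordering this page does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side model of the documented rules; not libcudf code.
// std::nullopt marks a null string, a null list row, or an invalid scalar.
using nullable_string = std::optional<std::string>;
using strings_col     = std::vector<nullable_string>;
using list_row        = std::optional<std::vector<nullable_string>>;
using lists_col       = std::vector<list_row>;

enum class separator_on_nulls { YES, NO };
// Only EMPTY_STRING is named on this page; NULL_ELEMENT is an assumed name.
enum class output_if_empty_list { EMPTY_STRING, NULL_ELEMENT };

strings_col join_list_elements_model(
  lists_col const& lists,
  strings_col const& separators,
  nullable_string const& separator_narep = std::nullopt,
  nullable_string const& string_narep    = std::nullopt,
  separator_on_nulls separate_nulls      = separator_on_nulls::YES,
  output_if_empty_list empty_list_policy = output_if_empty_list::EMPTY_STRING)
{
  // libcudf throws cudf::logic_error on a row-count mismatch; the sketch asserts.
  assert(lists.size() == separators.size());
  strings_col out;
  for (std::size_t row = 0; row < lists.size(); ++row) {
    list_row const& list      = lists[row];
    nullable_string const sep = separators[row] ? separators[row] : separator_narep;
    if (!list || !sep) {  // null list row, or null separator with no replacement
      out.push_back(std::nullopt);
      continue;
    }
    // Empty lists, and lists holding only nulls, follow empty_list_policy.
    bool const all_null = std::none_of(
      list->begin(), list->end(), [](nullable_string const& s) { return s.has_value(); });
    if (all_null) {
      out.push_back(empty_list_policy == output_if_empty_list::EMPTY_STRING
                      ? nullable_string{std::string{}}
                      : nullable_string{});
      continue;
    }
    std::string joined;
    bool row_is_null = false;
    bool first       = true;
    for (nullable_string const& element : *list) {
      if (!element) {
        if (!string_narep) {  // null element and no replacement: the row is null
          row_is_null = true;
          break;
        }
        // Simplification: with NO, a null element adds neither text nor separator.
        if (separate_nulls == separator_on_nulls::NO) { continue; }
      }
      if (!first) { joined += *sep; }
      joined += element ? *element : *string_narep;
      first = false;
    }
    out.push_back(row_is_null ? nullable_string{} : nullable_string{joined});
  }
  return out;
}
```

Because std::none_of is vacuously true for an empty range, empty lists and all-null lists share one branch, mirroring the special case stated above.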
-
std::unique_ptr<column> join_list_elements(lists_column_view const &lists_strings_column, string_scalar const &separator = string_scalar(""), string_scalar const &narep = string_scalar("", false), separator_on_nulls separate_nulls = separator_on_nulls::YES, output_if_empty_list empty_list_policy = output_if_empty_list::EMPTY_STRING, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result.
Each new string is created by concatenating the strings from the same row (same list element), delimited by the separator provided.
A null list row always results in a null string in the output row. Any non-null list row having a null element results in a null output row unless a narep string is specified to be used in its place.
If separate_nulls is set to NO and narep is valid, then separators are not added to the output between null elements. Otherwise, separators are always added if narep is valid.
If empty_list_policy is set to EMPTY_STRING, any row that is an empty list results in an empty output string. Otherwise, the output is null.
In the special case when the input list row contains only null elements, the output is the same as for an empty input list, regardless of the narep and separate_nulls values.
Example:
    s = [ ['aa', 'bb', 'cc'], null, ['', 'dd'], ['ee', null], ['ff'] ]
    out = join_list_elements(s)
    out is ['aabbcc', null, 'dd', null, 'ff']
    out = join_list_elements(s, ':', '_')
    out is ['aa:bb:cc', null, ':dd', 'ee:_', 'ff']
    out = join_list_elements(s, ':', '', separator_on_nulls::NO)
    out is ['aa:bb:cc', null, ':dd', 'ee', 'ff']
- Throws:
cudf::logic_error – if input column is not lists of strings column.
cudf::logic_error – if separator is not valid.
- Parameters:
lists_strings_column – Column containing lists of strings to concatenate
separator – String to insert between strings of each list row. Default is an empty string.
narep – String to replace null strings in any non-null list row. Default is an invalid-scalar denoting that list rows containing null strings will result in a null string in the corresponding output rows.
separate_nulls – If YES, then the separator is included for null rows if narep is valid.
empty_list_policy – If set to EMPTY_STRING, any input row that is an empty list will result in an empty string. Otherwise, it will result in a null.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New strings column with concatenated results
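A host-side C++ sketch of the scalar-separator overload follows the same pattern. It is hypothetical and not libcudf code: join_list_elements_model and the aliases are illustrative names, std::nullopt marks null strings, null list rows and invalid scalars, NULL_ELEMENT is an assumed enum name (only EMPTY_STRING, YES and NO appear on this page), the invalid-separator cudf::logic_error becomes an assert, and with separator_on_nulls::NO null elements are dropped entirely, which agrees with the documented example.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side model of the documented rules; not libcudf code.
// std::nullopt marks a null string, a null list row, or an invalid scalar.
using nullable_string = std::optional<std::string>;
using strings_col     = std::vector<nullable_string>;
using list_row        = std::optional<std::vector<nullable_string>>;
using lists_col       = std::vector<list_row>;

enum class separator_on_nulls { YES, NO };
// Only EMPTY_STRING is named on this page; NULL_ELEMENT is an assumed name.
enum class output_if_empty_list { EMPTY_STRING, NULL_ELEMENT };

strings_col join_list_elements_model(
  lists_col const& lists,
  nullable_string const& separator       = std::string{},
  nullable_string const& narep           = std::nullopt,
  separator_on_nulls separate_nulls      = separator_on_nulls::YES,
  output_if_empty_list empty_list_policy = output_if_empty_list::EMPTY_STRING)
{
  // libcudf throws cudf::logic_error for an invalid separator; the sketch asserts.
  assert(separator.has_value());
  strings_col out;
  for (list_row const& list : lists) {
    if (!list) {  // a null list row is always a null output row
      out.push_back(std::nullopt);
      continue;
    }
    // Empty lists, and lists holding only nulls, follow empty_list_policy.
    bool const all_null = std::none_of(
      list->begin(), list->end(), [](nullable_string const& s) { return s.has_value(); });
    if (all_null) {
      out.push_back(empty_list_policy == output_if_empty_list::EMPTY_STRING
                      ? nullable_string{std::string{}}
                      : nullable_string{});
      continue;
    }
    std::string joined;
    bool row_is_null = false;
    bool first       = true;
    for (nullable_string const& element : *list) {
      if (!element) {
        if (!narep) {  // null element and no replacement: the row is null
          row_is_null = true;
          break;
        }
        // Simplification: with NO, a null element adds neither text nor separator.
        if (separate_nulls == separator_on_nulls::NO) { continue; }
      }
      if (!first) { joined += *separator; }
      joined += element ? *element : *narep;
      first = false;
    }
    out.push_back(row_is_null ? nullable_string{} : nullable_string{joined});
  }
  return out;
}
```

With the example input s, join_list_elements_model(s, ":", "_") yields ['aa:bb:cc', null, ':dd', 'ee:_', 'ff'].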