Strings Combine#
- group Combining
Enums
Functions
-
std::unique_ptr<column> join_strings(strings_column_view const &input, string_scalar const &separator = string_scalar(""), string_scalar const &narep = string_scalar("", false), rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#
Concatenates all strings in the column into one new string delimited by an optional separator string.
This returns a column with one string. Any null entries are ignored unless the
narepparameter specifies a replacement string.Example: s = ['aa', null, '', 'zz' ] r = join_strings(s,':','_') r is ['aa:_::zz']
- Throws:
cudf::logic_error – if separator is not valid.
- Parameters:
input – Strings for this operation
separator – String that should inserted between each string. Default is an empty string.
narep – String to replace any null strings found. Default of invalid-scalar will ignore any null entries.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory.
- Returns:
New column containing one string.
-
std::unique_ptr<column> concatenate(table_view const &strings_columns, strings_column_view const &separators, string_scalar const &separator_narep = string_scalar("", false), string_scalar const &col_narep = string_scalar("", false), separator_on_nulls separate_nulls = separator_on_nulls::YES, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#
Concatenates a list of strings columns using separators for each row and returns the result as a strings column.
Each new string is created by concatenating the strings from the same row delimited by the row separator provided for that row. The following rules are applicable:
If row separator for a given row is null, output column for that row is null, unless there is a valid
separator_narepThe separator is applied between two output row values if the
separate_nullsisYESor only between valid rows ifseparate_nullsisNO.If
separator_narepandcol_narepare both valid, the output column is always non nullable
Example: c0 = ['aa', null, '', 'ee', null, 'ff'] c1 = [null, 'cc', 'dd', null, null, 'gg'] c2 = ['bb', '', null, null, null, 'hh'] sep = ['::', '%%', '^^', '!', '*', null] out = concatenate({c0, c1, c2}, sep) // all rows have at least one null or sep[i]==null out is [null, null, null, null, null, null] sep_rep = '+' out = concatenate({c0, c1, c2}, sep, sep_rep) // all rows with at least one null output as null out is [null, null, null, null, null, 'ff+gg+hh'] col_narep = '-' sep_na = non-valid scalar out = concatenate({c0, c1, c2}, sep, sep_na, col_narep) // only the null entry in the sep column produces a null row out is ['aa::-::bb', '-%%cc%%', '^^dd^^-', 'ee!-!-', '-*-*-', null] col_narep = '' out = concatenate({c0, c1, c2}, sep, sep_rep, col_narep, separator_on_nulls:NO) // parameter suppresses separator for null rows out is ['aa::bb', 'cc%%', '^^dd', 'ee', '', 'ff+gg+hh']- Throws:
cudf::logic_error – if no input columns are specified - table view is empty
cudf::logic_error – if input columns are not all strings columns.
cudf::logic_error – if the number of rows from
separatorsandstrings_columnsdo not match
- Parameters:
strings_columns – List of strings columns to concatenate
separators – Strings column that provides the separator for a given row
separator_narep – String to replace a null separator for a given row. Default of invalid-scalar means no row separator value replacements.
col_narep – String that should be used in place of any null strings found in any column. Default of invalid-scalar means no null column value replacements.
separate_nulls – If YES, then the separator is included for null rows if
col_narepis valid.stream – CUDA stream used for device memory operations and kernel launches
mr – Resource for allocating device memory
- Returns:
New column with concatenated results
-
std::unique_ptr<column> concatenate(table_view const &strings_columns, string_scalar const &separator = string_scalar(""), string_scalar const &narep = string_scalar("", false), separator_on_nulls separate_nulls = separator_on_nulls::YES, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#
Row-wise concatenates the given list of strings columns and returns a single strings column result.
Each new string is created by concatenating the strings from the same row delimited by the separator provided.
Any row with a null entry will result in the corresponding output row to be null entry unless a narep string is specified to be used in its place.
If
separate_nullsis set toNOandnarepis valid then separators are not added to the output between null elements. Otherwise, separators are always added ifnarepis valid.More than one column must be specified in the input
strings_columnstable.Example: s1 = ['aa', null, '', 'dd'] s2 = ['', 'bb', 'cc', null] out = concatenate({s1, s2}) out is ['aa', null, 'cc', null] out = concatenate({s1, s2}, ':', '_') out is ['aa:', '_:bb', ':cc', 'dd:_'] out = concatenate({s1, s2}, ':', '', separator_on_nulls::NO) out is ['aa:', 'bb', ':cc', 'dd']- Throws:
cudf::logic_error – if input columns are not all strings columns.
cudf::logic_error – if separator is not valid.
cudf::logic_error – if only one column is specified
- Parameters:
strings_columns – List of string columns to concatenate
separator – String that should inserted between each string from each row. Default is an empty string.
narep – String to replace any null strings found in any column. Default of invalid-scalar means any null entry in any column will produces a null result for that row.
separate_nulls – If YES, then the separator is included for null rows if
narepis validstream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New column with concatenated results
-
std::unique_ptr<column> join_list_elements(lists_column_view const &lists_strings_column, strings_column_view const &separators, string_scalar const &separator_narep = string_scalar("", false), string_scalar const &string_narep = string_scalar("", false), separator_on_nulls separate_nulls = separator_on_nulls::YES, output_if_empty_list empty_list_policy = output_if_empty_list::EMPTY_STRING, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#
Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result.
Each new string is created by concatenating the strings from the same row (same list element) delimited by the row separator provided in the
separatorsstrings column.A null list row will always result in a null string in the output row. Any non-null list row having a null element will result in the corresponding output row to be null unless a valid
string_narepscalar is provided to be used in its place. Any null row in theseparatorscolumn will also result in a null output row unless a validseparator_narepscalar is provided to be used in place of the null separators.If
separate_nullsis set toNOandstring_narepis valid then separators are not added to the output between null elements. Otherwise, separators are always added ifstring_narepis valid.If
empty_list_policyis set toEMPTY_STRING, any row that is an empty list will result in an empty output string. Otherwise, the output will be a null.In the special case when the input list row contains all null elements, the output will be the same as in case of empty input list regardless of
string_narepandseparate_nullsvalues.Example: s = [ ['aa', 'bb', 'cc'], null, ['', 'dd'], ['ee', null], ['ff', 'gg'] ] sep = ['::', '%%', '!', '*', null] out = join_list_elements(s, sep) out is ['aa::bb::cc', null, '!dd', null, null] out = join_list_elements(s, sep, ':', '_') out is ['aa::bb::cc', null, '!dd', 'ee*_', 'ff:gg'] out = join_list_elements(s, sep, ':', '', separator_on_nulls::NO) out is ['aa::bb::cc', null, '!dd', 'ee', 'ff:gg']
- Throws:
cudf::logic_error – if input column is not lists of strings column.
cudf::logic_error – if the number of rows from
separatorsandlists_strings_columndo not match
- Parameters:
lists_strings_column – Column containing lists of strings to concatenate
separators – Strings column that provides separators for concatenation
separator_narep – String that should be used to replace a null separator. Default is an invalid-scalar denoting that rows containing null separator will result in a null string in the corresponding output rows.
string_narep – String to replace null strings in any non-null list row. Default is an invalid-scalar denoting that list rows containing null strings will result in a null string in the corresponding output rows.
separate_nulls – If YES, then the separator is included for null rows if
narepis validempty_list_policy – If set to EMPTY_STRING, any input row that is an empty list will result in an empty string. Otherwise, it will result in a null.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New strings column with concatenated results
-
std::unique_ptr<column> join_list_elements(lists_column_view const &lists_strings_column, string_scalar const &separator = string_scalar(""), string_scalar const &narep = string_scalar("", false), separator_on_nulls separate_nulls = separator_on_nulls::YES, output_if_empty_list empty_list_policy = output_if_empty_list::EMPTY_STRING, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#
Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result.
Each new string is created by concatenating the strings from the same row (same list element) delimited by the
separatorprovided.A null list row will always result in a null string in the output row. Any non-null list row having a null element will result in the corresponding output row to be null unless a
narepstring is specified to be used in its place.If
separate_nullsis set toNOandnarepis valid then separators are not added to the output between null elements. Otherwise, separators are always added ifnarepis valid.If
empty_list_policyis set toEMPTY_STRING, any row that is an empty list will result in an empty output string. Otherwise, the output will be a null.In the special case when the input list row contains all null elements, the output will be the same as in case of empty input list regardless of
narepandseparate_nullsvalues.Example: s = [ ['aa', 'bb', 'cc'], null, ['', 'dd'], ['ee', null], ['ff'] ] out = join_list_elements(s) out is ['aabbcc', null, 'dd', null, 'ff'] out = join_list_elements(s, ':', '_') out is ['aa:bb:cc', null, ':dd', 'ee:_', 'ff'] out = join_list_elements(s, ':', '', separator_on_nulls::NO) out is ['aa:bb:cc', null, ':dd', 'ee', 'ff']
- Throws:
cudf::logic_error – if input column is not lists of strings column.
cudf::logic_error – if separator is not valid.
- Parameters:
lists_strings_column – Column containing lists of strings to concatenate
separator – String to insert between strings of each list row. Default is an empty string.
narep – String to replace null strings in any non-null list row. Default is an invalid-scalar denoting that list rows containing null strings will result in a null string in the corresponding output rows.
separate_nulls – If YES, then the separator is included for null rows if
narepis validempty_list_policy – If set to EMPTY_STRING, any input row that is an empty list will result in an empty string. Otherwise, it will result in a null.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New strings column with concatenated results
-
std::unique_ptr<column> join_strings(strings_column_view const &input, string_scalar const &separator = string_scalar(""), string_scalar const &narep = string_scalar("", false), rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#