Strings Case
- group strings_case
Functions
-
std::unique_ptr<column> capitalize(strings_column_view const &input, string_scalar const &delimiters = string_scalar("", true, cudf::get_default_stream()), rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns a column of capitalized strings.
If the delimiters string is empty, then only the first character of each row is capitalized. Otherwise, a non-delimiter character is capitalized after any delimiter character is found. As the example shows, the remaining characters are lower-cased.
Example:
input = ["tesT1", "a Test", "Another Test", "a\tb"];
output = capitalize(input)
output is ["Test1", "A test", "Another test", "A\tb"]
output = capitalize(input, " ")
output is ["Test1", "A Test", "Another Test", "A\tb"]
output = capitalize(input, " \t")
output is ["Test1", "A Test", "Another Test", "A\tB"]
Any null string entries return corresponding null output column entries.
- Throws:
cudf::logic_error – if delimiters.is_valid() is false
- Parameters:
input – String column
delimiters – Characters for identifying words to capitalize
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
Column of strings capitalized from the input column
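The delimiter rule described for capitalize can be illustrated with a host-side sketch. This is not libcudf's implementation: capitalize_ascii and check_capitalize_examples are hypothetical names, the code runs on the CPU over one std::string at a time, handles ASCII only, and does not model null rows. It assumes, as the documented outputs suggest, that every character that is not capitalized is lower-cased.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical ASCII-only model of the documented capitalize() rule.
// The first character of the row, and any non-delimiter character that
// follows a delimiter, is upper-cased; everything else is lower-cased.
std::string capitalize_ascii(std::string const& s, std::string const& delimiters = "")
{
  std::string out;
  out.reserve(s.size());
  bool capitalize_next = true;  // the start of the row counts as a word start
  for (unsigned char ch : s) {
    bool const is_delim = delimiters.find(static_cast<char>(ch)) != std::string::npos;
    if (is_delim) {
      out.push_back(static_cast<char>(ch));  // delimiters are copied unchanged
      capitalize_next = true;
    } else {
      out.push_back(static_cast<char>(capitalize_next ? std::toupper(ch) : std::tolower(ch)));
      capitalize_next = false;
    }
  }
  return out;
}

// Replays the documented examples; aborts if the sketch disagrees with them.
void check_capitalize_examples()
{
  std::vector<std::string> const input{"tesT1", "a Test", "Another Test", "a\tb"};
  std::vector<std::string> const no_delims{"Test1", "A test", "Another test", "A\tb"};
  std::vector<std::string> const space{"Test1", "A Test", "Another Test", "A\tb"};
  std::vector<std::string> const space_tab{"Test1", "A Test", "Another Test", "A\tB"};
  for (std::size_t i = 0; i < input.size(); ++i) {
    assert(capitalize_ascii(input[i]) == no_delims[i]);
    assert(capitalize_ascii(input[i], " ") == space[i]);
    assert(capitalize_ascii(input[i], " \t") == space_tab[i]);
  }
}
```

With an empty delimiter string no character is ever treated as a delimiter, so only the first character of the row is capitalized, which matches the first documented output.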
-
std::unique_ptr<column> title(strings_column_view const &input, string_character_types sequence_type = string_character_types::ALPHA, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Modifies first character of each word to upper-case and lower-cases the rest.
A word here is a sequence of characters of sequence_type delimited by any characters not part of the sequence_type character set.
This function returns a column of strings where, for each string row in the input, the first character of each word is converted to upper-case, while all the remaining characters in a word are converted to lower-case.
Example:
input = [" teST1", "a Test", " Another test ", "n2vidia"];
output = title(input)
output is [" Test1", "A Test", " Another Test ", "N2Vidia"]
output = title(input, ALPHANUM)
output is [" Test1", "A Test", " Another Test ", "N2vidia"]
Any null string entries return corresponding null output column entries.
- Parameters:
input – String column
sequence_type – The character type that is used when identifying words
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
Column of titled strings
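The effect of sequence_type on word boundaries can be seen in a small host-side sketch. It is an illustration under assumptions, not libcudf code: char_types is a hypothetical stand-in for string_character_types with only two values, title_ascii and check_title_examples are hypothetical names, and the code is ASCII-only, works on one std::string at a time, and does not model null rows.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for string_character_types (two values only).
enum class char_types { ALPHA, ALPHANUM };

// True if ch belongs to the character set that forms words.
bool in_word(unsigned char ch, char_types type)
{
  return type == char_types::ALPHA ? std::isalpha(ch) != 0 : std::isalnum(ch) != 0;
}

// Hypothetical ASCII-only model of the documented title() rule: the first
// character of each word is upper-cased, the rest of the word is lower-cased,
// and characters outside the word set are copied and end the current word.
std::string title_ascii(std::string const& s, char_types type = char_types::ALPHA)
{
  std::string out;
  out.reserve(s.size());
  bool at_word_start = true;
  for (unsigned char ch : s) {
    if (in_word(ch, type)) {
      out.push_back(static_cast<char>(at_word_start ? std::toupper(ch) : std::tolower(ch)));
      at_word_start = false;
    } else {
      out.push_back(static_cast<char>(ch));
      at_word_start = true;
    }
  }
  return out;
}

// Replays the documented examples; aborts if the sketch disagrees with them.
void check_title_examples()
{
  std::vector<std::string> const input{" teST1", "a Test", " Another test ", "n2vidia"};
  std::vector<std::string> const alpha{" Test1", "A Test", " Another Test ", "N2Vidia"};
  std::vector<std::string> const alnum{" Test1", "A Test", " Another Test ", "N2vidia"};
  for (std::size_t i = 0; i < input.size(); ++i) {
    assert(title_ascii(input[i]) == alpha[i]);
    assert(title_ascii(input[i], char_types::ALPHANUM) == alnum[i]);
  }
}
```

The "n2vidia" row shows the difference: with ALPHA the digit ends the word, so "v" starts a new one, while with ALPHANUM the digit is part of the word and "v" stays lower-case.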
-
std::unique_ptr<column> is_title(strings_column_view const &input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Checks if the strings in the input column are title formatted.
The first character of each word should be upper-case while all other characters should be lower-case. A word is a sequence of upper-case and lower-case characters.
This function returns a column of booleans indicating true if the string in the input row is in title format and false if not.
Example:
input = [" Test1", "A Test", " Another test ", "N2Vidia Corp", "!Abc"];
output = is_title(input)
output is [true, true, false, true, true]
Any null string entries result in corresponding null output column entries.
- Parameters:
input – String column
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
Column of type BOOL8
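The title-format check can be sketched on the host as a single pass that tracks whether the next letter is expected to be upper-case. This is an ASCII-only illustration with hypothetical names (is_title_ascii, check_is_title_examples), not libcudf's implementation, and it does not model null rows. The description above does not say what is returned for a string with no alphabetic characters; returning false in that case is an assumption of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical ASCII-only model of the documented is_title() rule.
// A word is a run of letters; its first letter must be upper-case and
// the remaining letters lower-case. Non-letters end the current word.
bool is_title_ascii(std::string const& s)
{
  bool saw_letter   = false;  // assumption: a row without letters is not a title
  bool expect_upper = true;   // the next letter starts a word
  for (unsigned char ch : s) {
    if (std::isalpha(ch)) {
      bool const is_upper = std::isupper(ch) != 0;
      if (is_upper != expect_upper) { return false; }
      saw_letter   = true;
      expect_upper = false;
    } else {
      expect_upper = true;
    }
  }
  return saw_letter;
}

// Replays the documented examples; aborts if the sketch disagrees with them.
void check_is_title_examples()
{
  std::vector<std::string> const input{
    " Test1", "A Test", " Another test ", "N2Vidia Corp", "!Abc"};
  std::vector<bool> const expected{true, true, false, true, true};
  for (std::size_t i = 0; i < input.size(); ++i) {
    assert(is_title_ascii(input[i]) == expected[i]);
  }
}
```

" Another test " fails because "test" begins with a lower-case letter, while "N2Vidia Corp" passes because the digit splits "N" and "Vidia" into separate words.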
-
std::unique_ptr<column> to_lower(strings_column_view const &strings, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Converts a column of strings to lower case.
Only upper case alphabetical characters are converted. All other characters are copied. Case conversion may result in strings that are longer or shorter than the original string in bytes.
Any null entries create null entries in the output column.
- Parameters:
strings – Strings instance for this operation.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory.
- Returns:
New column of strings with characters converted.
-
std::unique_ptr<column> to_upper(strings_column_view const &strings, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Converts a column of strings to upper case.
Only lower case alphabetical characters are converted. All other characters are copied. Case conversion may result in strings that are longer or shorter than the original string in bytes.
Any null entries create null entries in the output column.
- Parameters:
strings – Strings instance for this operation.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory.
- Returns:
New column of strings with characters converted.
-
std::unique_ptr<column> swapcase(strings_column_view const &strings, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns a column of strings converting lower case characters to upper case and vice versa.
Only upper or lower case alphabetical characters are converted. All other characters are copied. Case conversion may result in strings that are longer or shorter than the original string in bytes.
Any null entries create null entries in the output column.
- Parameters:
strings – Strings instance for this operation.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory.
- Returns:
New column of strings with characters converted.
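The per-character rule and the null handling shared by to_lower, to_upper, and swapcase can be sketched on the host, here for the swapcase case. This is a model under assumptions rather than libcudf code: row, swap_ascii, swapcase_ascii, and check_swapcase_example are hypothetical names, std::nullopt stands in for a null row, and only ASCII letters are converted. Because ASCII case mappings are one byte each, this sketch never changes the byte length of a string, unlike the UTF-8 behavior described above.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side model of a nullable strings row.
using row = std::optional<std::string>;  // std::nullopt models a null row

// Swaps the case of an ASCII letter; any other byte is returned unchanged.
char swap_ascii(unsigned char ch)
{
  if (std::islower(ch)) { return static_cast<char>(std::toupper(ch)); }
  if (std::isupper(ch)) { return static_cast<char>(std::tolower(ch)); }
  return static_cast<char>(ch);
}

// Applies swap_ascii to every non-null row; null rows stay null.
std::vector<row> swapcase_ascii(std::vector<row> const& input)
{
  std::vector<row> output;
  output.reserve(input.size());
  for (auto const& r : input) {
    if (!r) {
      output.push_back(std::nullopt);
      continue;
    }
    std::string s;
    s.reserve(r->size());
    for (unsigned char ch : *r) { s.push_back(swap_ascii(ch)); }
    output.push_back(std::move(s));
  }
  return output;
}

// Small usage check; aborts if the sketch misbehaves.
void check_swapcase_example()
{
  std::vector<row> const input{row{"Hello World 1"}, std::nullopt, row{"aBc"}};
  auto const output = swapcase_ascii(input);
  assert(output.size() == 3);
  assert(output[0].has_value() && *output[0] == "hELLO wORLD 1");
  assert(!output[1].has_value());
  assert(output[2].has_value() && *output[2] == "AbC");
}
```

Replacing swap_ascii with a function that only lower-cases or only upper-cases letters gives the corresponding sketch for to_lower or to_upper.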