Strings Case

group strings_case

Functions

std::unique_ptr<column> capitalize(strings_column_view const &input, string_scalar const &delimiters = string_scalar("", true, cudf::get_default_stream()), rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Returns a column of capitalized strings.

If delimiters is an empty string, only the first character of each row is capitalized. Otherwise, the first non-delimiter character after any delimiter character is capitalized as well. In both cases, all other characters are converted to lower-case, as the example shows.

Example:
input = ["tesT1", "a Test", "Another Test", "a\tb"];
output = capitalize(input)
output is ["Test1", "A test", "Another test", "A\tb"]
output = capitalize(input, " ")
output is ["Test1", "A Test", "Another Test", "A\tb"]
output = capitalize(input, " \t")
output is ["Test1", "A Test", "Another Test", "A\tB"]

Any null string entries result in corresponding null output column entries.

Throws:

cudf::logic_error – if delimiters.is_valid() is false.

Parameters:
  • input – String column

  • delimiters – Characters for identifying words to capitalize

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

Column of strings capitalized from the input column
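The capitalization rules above can be modeled on the host for a single row. The following is a minimal sketch, not libcudf code: it assumes ASCII-only input, works on std::string rather than a device column, and capitalize_ascii is a hypothetical name chosen for illustration.

```cpp
#include <cctype>
#include <string>

// Host-side, ASCII-only model of the capitalize() rules described above.
// Hypothetical helper for illustration; it does not call into libcudf.
std::string capitalize_ascii(std::string const& input, std::string const& delimiters = "")
{
  std::string output;
  output.reserve(input.size());
  bool capitalize_next = true;  // the first character of the row is always a candidate
  for (char ch : input) {
    auto const uch = static_cast<unsigned char>(ch);
    if (delimiters.find(ch) != std::string::npos) {
      // Delimiters are copied unchanged and arm capitalization of the next character.
      output.push_back(ch);
      capitalize_next = true;
    } else {
      // Non-delimiters are upper-cased right after a delimiter (or at row start),
      // and lower-cased everywhere else.
      output.push_back(static_cast<char>(capitalize_next ? std::toupper(uch) : std::tolower(uch)));
      capitalize_next = false;
    }
  }
  return output;
}
```

With the rows from the example, capitalize_ascii("a Test") yields "A test", while capitalize_ascii("a\tb", " \t") yields "A\tB".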

std::unique_ptr<column> title(strings_column_view const &input, string_character_types sequence_type = string_character_types::ALPHA, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Converts the first character of each word to upper-case and the remaining characters of the word to lower-case.

A word here is a sequence of characters of sequence_type delimited by any characters not part of the sequence_type character set.

This function returns a column of strings where, for each string row in the input, the first character of each word is converted to upper-case, while all the remaining characters in a word are converted to lower-case.

Example:
input = ["   teST1", "a Test", " Another test ", "n2vidia"];
output = title(input)
output is ["   Test1", "A Test", " Another Test ", "N2Vidia"]
output = title(input, ALPHANUM)
output is ["   Test1", "A Test", " Another Test ", "N2vidia"]

Any null string entries result in corresponding null output column entries.

Parameters:
  • input – String column

  • sequence_type – The character type that is used when identifying words

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

Column of titled strings
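The effect of sequence_type on word boundaries can be illustrated with a small host-side model. This is a sketch under stated assumptions rather than the library's implementation: it handles ASCII only, and both the word_chars enum and title_ascii are hypothetical stand-ins for string_character_types and title().

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for the two character types used in the example above.
enum class word_chars { ALPHA, ALPHANUM };

// Host-side, ASCII-only model of title(): characters outside the chosen
// character set delimit words and are copied unchanged.
std::string title_ascii(std::string const& input, word_chars type = word_chars::ALPHA)
{
  std::string output;
  output.reserve(input.size());
  bool in_word = false;
  for (char ch : input) {
    auto const uch = static_cast<unsigned char>(ch);
    bool const is_word_char =
      (type == word_chars::ALPHA) ? std::isalpha(uch) != 0 : std::isalnum(uch) != 0;
    if (!is_word_char) {
      output.push_back(ch);
      in_word = false;
    } else {
      // First character of a word is upper-cased, the rest are lower-cased.
      output.push_back(static_cast<char>(in_word ? std::tolower(uch) : std::toupper(uch)));
      in_word = true;
    }
  }
  return output;
}
```

For "n2vidia", the ALPHA model treats '2' as a delimiter and produces "N2Vidia"; with ALPHANUM the digit is part of the word, so the result is "N2vidia".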

std::unique_ptr<column> is_title(strings_column_view const &input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Checks if the strings in the input column are title formatted.

A string is title formatted if the first character of each word is upper-case and all other characters of the word are lower-case. A word is a sequence of upper-case and lower-case characters.

This function returns a column of booleans indicating true if the string in the input row is in title format and false if not.

Example:
input = ["   Test1", "A Test", " Another test ", "N2Vidia Corp", "!Abc"];
output = is_title(input)
output is [true, true, false, true, true]

Any null string entries result in corresponding null output column entries.

Parameters:
  • input – String column

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

Column of type BOOL8
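The title-format check can be sketched for a single row on the host. This is an ASCII-only illustration with a hypothetical name, is_title_ascii. One behavior is an assumption not stated in the text above: a row that contains no alphabetic characters at all is reported as false here.

```cpp
#include <cctype>
#include <string>

// Host-side, ASCII-only model of the is_title() rule described above.
// Assumption of this sketch: a row with no alphabetic characters returns false.
bool is_title_ascii(std::string const& input)
{
  bool at_word_start = true;  // non-alphabetic characters end the current word
  bool found_word    = false;
  for (char ch : input) {
    auto const uch = static_cast<unsigned char>(ch);
    if (!std::isalpha(uch)) {
      at_word_start = true;
      continue;
    }
    // A letter must be upper-case exactly when it starts a word.
    bool const is_upper = std::isupper(uch) != 0;
    if (is_upper != at_word_start) { return false; }
    at_word_start = false;
    found_word    = true;
  }
  return found_word;
}
```

Applied to the example rows, " Another test " fails because "test" starts with a lower-case letter, while "N2Vidia Corp" passes because the digit splits "N" and "Vidia" into separate words.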

std::unique_ptr<column> to_lower(strings_column_view const &strings, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Converts a column of strings to lower case.

Only upper case alphabetical characters are converted. All other characters are copied. Case conversion may result in strings that are longer or shorter than the original string in bytes.

Any null entries create null entries in the output column.

Parameters:
  • strings – Strings instance for this operation.

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory.

Returns:

New column of strings with characters converted.
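The remark that converted strings may be longer or shorter in bytes follows from UTF-8 being a variable-width encoding: a character and its case counterpart can fall into different encoding-length ranges. The sketch below shows two lower-case mappings from the Unicode character database; whether libcudf applies these exact mappings is an assumption of this example, and utf8_bytes and case_pair are hypothetical names.

```cpp
#include <cstddef>

// Number of bytes needed to encode one Unicode code point in UTF-8.
inline std::size_t utf8_bytes(char32_t code_point)
{
  if (code_point < 0x80) { return 1; }
  if (code_point < 0x800) { return 2; }
  if (code_point < 0x10000) { return 3; }
  return 4;
}

// An upper-case code point and its simple lower-case mapping in Unicode.
struct case_pair {
  char32_t upper;
  char32_t lower;
};

// U+023A (capital A with stroke) -> U+2C65: grows from 2 bytes to 3 bytes.
const case_pair stroke_a{0x023A, 0x2C65};
// U+212A (KELVIN SIGN) -> U+006B ('k'): shrinks from 3 bytes to 1 byte.
const case_pair kelvin_sign{0x212A, 0x006B};
```

Comparing utf8_bytes(pair.upper) with utf8_bytes(pair.lower) for each pair shows one conversion that grows the string and one that shrinks it, so the output buffer size cannot be assumed equal to the input size.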

std::unique_ptr<column> to_upper(strings_column_view const &strings, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Converts a column of strings to upper case.

Only lower case alphabetical characters are converted. All other characters are copied. Case conversion may result in strings that are longer or shorter than the original string in bytes.

Any null entries create null entries in the output column.

Parameters:
  • strings – Strings instance for this operation.

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory.

Returns:

New column of strings with characters converted.
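Null propagation, as described above, can be modeled with a per-row validity flag. The following host-side sketch is an illustration only: host_strings_column and to_upper_ascii are hypothetical, the conversion is ASCII-only, and the real column stores validity differently on the device.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a strings column:
// one value and one validity flag per row.
struct host_strings_column {
  std::vector<std::string> values;
  std::vector<bool> valid;  // false marks a null row
};

// ASCII-only model of to_upper(): null rows stay null, all other rows
// have their lower-case letters converted and everything else copied.
host_strings_column to_upper_ascii(host_strings_column const& input)
{
  host_strings_column output = input;
  for (std::size_t row = 0; row < output.values.size(); ++row) {
    if (!output.valid[row]) { continue; }  // null in, null out
    for (char& ch : output.values[row]) {
      ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    }
  }
  return output;
}
```

The output always has the same number of rows and the same validity flags as the input; only the characters of valid rows change.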

std::unique_ptr<column> swapcase(strings_column_view const &strings, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Returns a column of strings converting lower case characters to upper case and vice versa.

Only upper or lower case alphabetical characters are converted. All other characters are copied. Case conversion may result in strings that are longer or shorter than the original string in bytes.

Any null entries create null entries in the output column.

Parameters:
  • strings – Strings instance for this operation.

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory.

Returns:

New column of strings with characters converted.
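The per-character rule for swapcase can be sketched on the host as follows. This is an ASCII-only model with a hypothetical name, swapcase_ascii, intended to show the three cases (upper, lower, other) rather than to reproduce the library's Unicode handling.

```cpp
#include <cctype>
#include <string>

// Host-side, ASCII-only model of swapcase(): upper-case letters become
// lower-case, lower-case letters become upper-case, everything else is copied.
std::string swapcase_ascii(std::string const& input)
{
  std::string output;
  output.reserve(input.size());
  for (char ch : input) {
    auto const uch = static_cast<unsigned char>(ch);
    if (std::isupper(uch)) {
      output.push_back(static_cast<char>(std::tolower(uch)));
    } else if (std::islower(uch)) {
      output.push_back(static_cast<char>(std::toupper(uch)));
    } else {
      output.push_back(ch);
    }
  }
  return output;
}
```

For ASCII input the operation is its own inverse: applying swapcase_ascii twice returns the original string.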