Strings APIs

group strings_apis

Functions

std::unique_ptr<column> count_characters(strings_column_view const &input, rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Returns a column containing the character length of each string in the given column.

The output column will have the same number of rows as the specified strings column. Each row value will be the number of characters in the corresponding string.

Any null string will result in a null entry for that row in the output column.

Parameters:
  • input – Strings instance for this operation

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

New column with lengths for each string
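The row-by-row behavior described above can be illustrated with a small host-side sketch. This is not the library's implementation, which operates on device memory. The names `host_strings`, `utf8_length`, and `count_characters_sketch` are hypothetical, a null row is modeled with `std::optional`, and the input is assumed to be valid UTF-8.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a nullable strings column.
using host_strings = std::vector<std::optional<std::string>>;

// Counts UTF-8 characters by counting every byte that is not a
// continuation byte (bit pattern 10xxxxxx). Assumes valid UTF-8 input.
inline std::int32_t utf8_length(std::string const& s)
{
  std::int32_t count = 0;
  for (unsigned char byte : s) {
    if ((byte & 0xC0) != 0x80) { ++count; }
  }
  return count;
}

// Produces one output row per input row.
// A null input row yields a null output row.
inline std::vector<std::optional<std::int32_t>> count_characters_sketch(
  host_strings const& input)
{
  std::vector<std::optional<std::int32_t>> result;
  result.reserve(input.size());
  for (auto const& row : input) {
    if (row) {
      result.push_back(utf8_length(*row));
    } else {
      result.push_back(std::nullopt);
    }
  }
  return result;
}
```

Under these assumptions, a three-row input holding "héllo", a null, and an empty string gives 5, null, and 0.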

std::unique_ptr<column> count_bytes(strings_column_view const &input, rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Returns a column containing the byte length of each string in the given column.

The output column will have the same number of rows as the specified strings column. Each row value will be the number of bytes in the corresponding string.

Any null string will result in a null entry for that row in the output column.

Parameters:
  • input – Strings instance for this operation

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

New column with the number of bytes for each string
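Byte length and character length differ as soon as a string contains multi-byte UTF-8 characters. The host-side sketch below mirrors the byte-count semantics described above. It is an illustration only, not the library's device implementation, and `host_strings` and `count_bytes_sketch` are hypothetical names. Null rows are again modeled with `std::optional`.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a nullable strings column.
using host_strings = std::vector<std::optional<std::string>>;

// Produces one output row per input row holding the string's size in bytes.
// A null input row yields a null output row.
inline std::vector<std::optional<std::int32_t>> count_bytes_sketch(
  host_strings const& input)
{
  std::vector<std::optional<std::int32_t>> result;
  result.reserve(input.size());
  for (auto const& row : input) {
    if (row) {
      // std::string::size() reports bytes, not characters.
      result.push_back(static_cast<std::int32_t>(row->size()));
    } else {
      result.push_back(std::nullopt);
    }
  }
  return result;
}
```

For "héllo" encoded as UTF-8 this sketch yields 6, because "é" occupies two bytes, while the string has 5 characters.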

std::unique_ptr<column> code_points(strings_column_view const &input, rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Creates a numeric column with code point values (integers) for each character of each string.

A code point is the integer value representation of a character. For example, the code point value for the character ‘A’ in UTF-8 is 65.

The size of the output column will be the total number of characters in the strings column.

Any null string is ignored. No null entries will appear in the output column.

Parameters:
  • input – Strings instance for this operation

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

New INT32 column with code point integer values for each character
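The shape of the result (one flat integer per character, with null rows skipped) can be sketched on the host as follows. This is an illustration under stated assumptions, not the library's implementation. The names `host_strings` and `code_points_sketch` are hypothetical. Because the description speaks of the value "in UTF-8", the sketch assumes each integer is the character's UTF-8 bytes read as a single big-endian number. For single-byte (ASCII) characters this is the same as the Unicode code point, but for multi-byte characters the two differ, so the library's actual output should be checked before relying on those values.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a nullable strings column.
using host_strings = std::vector<std::optional<std::string>>;

// Flattens all characters of all non-null rows into one integer vector.
// ASSUMPTION: each value is the character's UTF-8 bytes packed into an
// integer (e.g. 0xC3 0xA9 becomes 0xC3A9). Assumes valid UTF-8 input.
// Note: a 4-byte character would set the top bit, so the int32 holds the
// raw bit pattern in that case.
inline std::vector<std::int32_t> code_points_sketch(host_strings const& input)
{
  std::vector<std::int32_t> result;
  for (auto const& row : input) {
    if (!row) { continue; }  // null rows contribute nothing
    std::uint32_t packed = 0;
    bool in_char = false;
    for (unsigned char byte : *row) {
      bool const is_lead = (byte & 0xC0) != 0x80;
      if (is_lead && in_char) {
        // A new character starts: emit the one accumulated so far.
        result.push_back(static_cast<std::int32_t>(packed));
        packed = 0;
      }
      packed = (packed << 8) | byte;
      in_char = true;
    }
    if (in_char) { result.push_back(static_cast<std::int32_t>(packed)); }
  }
  return result;
}
```

With rows "Aé", null, and "bc", the output has four entries, matching the total character count of the non-null rows, and the first entry is 65 for "A".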