Strings

Modules

Case
Character Types
Combining
Searching
Converting
Copying
Slicing
Finding
Modifying
Replacing
Splitting
Extracting
Regex

Files

file attributes.hpp
 Read attributes of a strings column.

Functions

std::unique_ptr<column> cudf::strings::count_characters(strings_column_view const& input, rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
 Returns a column containing the character length of each string in the given column.

std::unique_ptr<column> cudf::strings::count_bytes(strings_column_view const& input, rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
 Returns a column containing the byte length of each string in the given column.

std::unique_ptr<column> cudf::strings::code_points(strings_column_view const& input, rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
 Creates a numeric column with code point values (integers) for each character of each string.


Function Documentation

code_points()

std::unique_ptr<column> cudf::strings::code_points(strings_column_view const& input,
                                                   rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Creates a numeric column with code point values (integers) for each character of each string.

A code point is the integer value representation of a character. For example, the code point value for the character 'A' in UTF-8 is 65.

The size of the output column will be the total number of characters in the strings column.

Any null string is ignored. No null entries will appear in the output column.

Parameters
input  Strings instance for this operation
mr     Device memory resource used to allocate the returned column's device memory
Returns
New INT32 column with code point integer values for each character
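To make the flattening and null-skipping behavior concrete, here is a host-side C++ sketch that models what code_points() computes. It is not the libcudf implementation and does not call the libcudf API. It assumes that "code point" means the Unicode value decoded from well-formed UTF-8 (consistent with 'A' giving 65), and it represents a nullable strings column as a std::vector<std::optional<std::string>>. The names StringsColumn, utf8_sequence_length, decode_code_point, and model_code_points are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a nullable strings column:
// each row is a UTF-8 string, and std::nullopt marks a null row.
using StringsColumn = std::vector<std::optional<std::string>>;

// Length in bytes of the UTF-8 sequence that begins with lead byte `b`.
// Assumes well-formed UTF-8 (no stray continuation bytes).
inline std::size_t utf8_sequence_length(unsigned char b) {
  if (b < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
  return 4;                          // 11110xxx
}

// Decodes the `len`-byte UTF-8 sequence starting at s[i] into a code point.
inline int32_t decode_code_point(const std::string& s, std::size_t i, std::size_t len) {
  assert(i + len <= s.size() && "truncated UTF-8 sequence");
  auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[i + k]); };
  if (len == 1) return byte(0);
  // The lead byte carries (7 - len) payload bits; each continuation byte adds 6.
  int32_t cp = byte(0) & (0x7F >> len);
  for (std::size_t k = 1; k < len; ++k) cp = (cp << 6) | (byte(k) & 0x3F);
  return cp;
}

// Hypothetical model of code_points(): flattens the code points of every
// non-null string into one vector. Null rows contribute nothing, so the
// result never contains nulls.
std::vector<int32_t> model_code_points(const StringsColumn& input) {
  std::vector<int32_t> out;
  for (const auto& row : input) {
    if (!row) continue;  // null string: ignored
    const std::string& s = *row;
    for (std::size_t i = 0; i < s.size();) {
      std::size_t len = utf8_sequence_length(static_cast<unsigned char>(s[i]));
      out.push_back(decode_code_point(s, i, len));
      i += len;
    }
  }
  return out;
}
```

For example, the rows {"Ab", null, "é"} (with "é" encoded as the bytes C3 A9) produce the flat output {65, 98, 233}. The null row contributes nothing, and the output carries no per-row boundaries. Pairing it with the per-row counts from count_characters() is one way to recover which characters came from which row.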

count_bytes()

std::unique_ptr<column> cudf::strings::count_bytes(strings_column_view const& input,
                                                   rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Returns a column containing the byte length of each string in the given column.

The output column will have the same number of rows as the specified strings column. Each row value will be the number of bytes in the corresponding string.

Any null string will result in a null entry for that row in the output column.

Parameters
input  Strings instance for this operation
mr     Device memory resource used to allocate the returned column's device memory
Returns
New column with the number of bytes for each string
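A minimal host-side model of count_bytes() follows. It illustrates the row-for-row output shape and the null propagation described above, not libcudf's device implementation. The aliases StringsColumn and IntColumn (nullable rows modeled with std::optional) and the function name model_count_bytes are hypothetical, and the sketch assumes each row length fits in a 32-bit integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side stand-ins: std::nullopt marks a null row.
using StringsColumn = std::vector<std::optional<std::string>>;
using IntColumn = std::vector<std::optional<int32_t>>;

// Hypothetical model of count_bytes(): one output row per input row,
// holding the string's length in bytes; a null input row yields a null output row.
IntColumn model_count_bytes(const StringsColumn& input) {
  IntColumn out;
  out.reserve(input.size());
  for (const auto& row : input) {
    if (!row) {
      out.push_back(std::nullopt);  // null in, null out
      continue;
    }
    // This sketch assumes each row length fits in a 32-bit integer.
    assert(row->size() <= static_cast<std::size_t>(std::numeric_limits<int32_t>::max()));
    out.push_back(static_cast<int32_t>(row->size()));  // std::string::size() counts bytes
  }
  return out;
}
```

For the rows {"abc", null, "é", ""} this yields {3, null, 2, 0}: "é" occupies two bytes in UTF-8, and an empty string is a valid non-null row of length 0.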

count_characters()

std::unique_ptr<column> cudf::strings::count_characters(strings_column_view const& input,
                                                        rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Returns a column containing the character length of each string in the given column.

The output column will have the same number of rows as the specified strings column. Each row value will be the number of characters in the corresponding string.

Any null string will result in a null entry for that row in the output column.

Parameters
input  Strings instance for this operation
mr     Device memory resource used to allocate the returned column's device memory
Returns
New column with the number of characters in each string
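A matching host-side sketch for count_characters() is shown below, assuming valid UTF-8 input. It counts one character at each byte that is not a UTF-8 continuation byte (bit pattern 10xxxxxx). As before, StringsColumn, IntColumn, and model_count_characters are hypothetical names, and this is a model of the documented behavior rather than the libcudf implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side stand-ins: std::nullopt marks a null row.
using StringsColumn = std::vector<std::optional<std::string>>;
using IntColumn = std::vector<std::optional<int32_t>>;

// Hypothetical model of count_characters() for valid UTF-8: every character
// starts with exactly one byte that is not a continuation byte (10xxxxxx),
// so counting those bytes counts characters.
IntColumn model_count_characters(const StringsColumn& input) {
  IntColumn out;
  out.reserve(input.size());
  for (const auto& row : input) {
    if (!row) {
      out.push_back(std::nullopt);  // null in, null out
      continue;
    }
    // Character count never exceeds byte count; assume it fits in 32 bits.
    assert(row->size() <= static_cast<std::size_t>(std::numeric_limits<int32_t>::max()));
    int32_t chars = 0;
    for (unsigned char b : *row) {
      if ((b & 0xC0) != 0x80) ++chars;  // skip continuation bytes
    }
    out.push_back(chars);
  }
  return out;
}
```

For "héllo" this returns 5, whereas count_bytes() on the same row would report 6. For any non-null row, the two results are equal exactly when the string is pure ASCII.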