cudf::strings Namespace Reference

Strings column APIs. More...

Enumerations

enum  string_character_types : uint32_t {
  DECIMAL = 1 << 0, NUMERIC = 1 << 1, DIGIT = 1 << 2, ALPHA = 1 << 3,
  SPACE = 1 << 4, UPPER = 1 << 5, LOWER = 1 << 6, ALPHANUM = DECIMAL | NUMERIC | DIGIT | ALPHA,
  CASE_TYPES = UPPER | LOWER, ALL_TYPES = ALPHANUM | CASE_TYPES | SPACE
}
 Character type values. These types can be combined with bitwise OR to check for any combination of types. More...
 
enum  pad_side { pad_side::LEFT, pad_side::RIGHT, pad_side::BOTH }
 Pad types for the pad method specify where the pad character should be placed. More...
 
enum  strip_type { LEFT, RIGHT, BOTH }
 Direction identifier for strip() function.
 
enum  filter_type : bool { KEEP, REMOVE }
 Removes or keeps the specified character ranges in cudf::strings::filter_characters.
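
The string_character_types values above are bit flags, so they can be combined and tested with bitwise operators. The following is a minimal host-only sketch of that idea: it mirrors the enumerator values listed above in a hypothetical `sketch` namespace rather than including the cudf header, and the `has_any` helper is an illustrative name that is not part of the library.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Host-only mirror of the enumerator values listed above (not the cudf header).
enum string_character_types : std::uint32_t {
  DECIMAL = 1 << 0, NUMERIC = 1 << 1, DIGIT = 1 << 2, ALPHA = 1 << 3,
  SPACE = 1 << 4, UPPER = 1 << 5, LOWER = 1 << 6,
  ALPHANUM = DECIMAL | NUMERIC | DIGIT | ALPHA,
  CASE_TYPES = UPPER | LOWER,
  ALL_TYPES = ALPHANUM | CASE_TYPES | SPACE
};

// Combine two flag sets while keeping the enum type.
constexpr string_character_types operator|(string_character_types lhs,
                                           string_character_types rhs) {
  return static_cast<string_character_types>(static_cast<std::uint32_t>(lhs) |
                                             static_cast<std::uint32_t>(rhs));
}

// Hypothetical helper: true if any bit of mask is set in flags.
constexpr bool has_any(string_character_types flags, string_character_types mask) {
  return (static_cast<std::uint32_t>(flags) & static_cast<std::uint32_t>(mask)) != 0;
}

// Small usage example: a combined flag set matches either of its parts.
inline void example_usage() {
  string_character_types const letters_or_space = ALPHA | SPACE;
  assert(has_any(letters_or_space, SPACE));
  assert(!has_any(letters_or_space, DIGIT));
}

}  // namespace sketch
```

Because each basic type occupies its own bit, the composite values (ALPHANUM, CASE_TYPES, ALL_TYPES) are simply the union of their parts.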
 

Functions

std::unique_ptr< column > count_characters (strings_column_view const &strings, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns an integer numeric column containing the length of each string in characters. More...
 
std::unique_ptr< column > count_bytes (strings_column_view const &strings, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a numeric column containing the length of each string in bytes. More...
 
std::unique_ptr< column > code_points (strings_column_view const &strings, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Creates a numeric column with code point values (integers) for each character of each string. More...
 
std::unique_ptr< column > capitalize (strings_column_view const &strings, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a column of capitalized strings. More...
 
std::unique_ptr< column > title (strings_column_view const &strings, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Modifies first character after spaces to uppercase and lower-cases the rest. More...
 
std::unique_ptr< column > to_lower (strings_column_view const &strings, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Converts a column of strings to lower case. More...
 
std::unique_ptr< column > to_upper (strings_column_view const &strings, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Converts a column of strings to upper case. More...
 
std::unique_ptr< column > swapcase (strings_column_view const &strings, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a column of strings converting lower case characters to upper case and vice versa. More...
 
string_character_types operator| (string_character_types lhs, string_character_types rhs)
 
string_character_types & operator|= (string_character_types &lhs, string_character_types rhs)
 
std::unique_ptr< column > all_characters_of_type (strings_column_view const &strings, string_character_types types, string_character_types verify_types=string_character_types::ALL_TYPES, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a boolean column identifying string entries in which all characters are of the type specified. More...
 
std::unique_ptr< column > filter_characters_of_type (strings_column_view const &strings, string_character_types types_to_remove, string_scalar const &replacement=string_scalar(""), string_character_types types_to_keep=string_character_types::ALL_TYPES, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Filter specific character types from a column of strings. More...
 
std::unique_ptr< column > concatenate (table_view const &strings_columns, string_scalar const &separator=string_scalar(""), string_scalar const &narep=string_scalar("", false), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Row-wise concatenates the given list of strings columns and returns a single strings column result. More...
 
std::unique_ptr< column > join_strings (strings_column_view const &strings, string_scalar const &separator=string_scalar(""), string_scalar const &narep=string_scalar("", false), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Concatenates all strings in the column into one new string delimited by an optional separator string. More...
 
std::unique_ptr< column > concatenate (table_view const &strings_columns, strings_column_view const &separators, string_scalar const &separator_narep=string_scalar("", false), string_scalar const &col_narep=string_scalar("", false), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Concatenates a list of strings columns using separators for each row and returns the result as a strings column. More...
 
std::unique_ptr< column > contains_re (strings_column_view const &strings, std::string const &pattern, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a boolean column identifying rows which match the given regex pattern. More...
 
std::unique_ptr< column > matches_re (strings_column_view const &strings, std::string const &pattern, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a boolean column identifying rows which match the given regex pattern, but only at the beginning of the string. More...
 
std::unique_ptr< column > count_re (strings_column_view const &strings, std::string const &pattern, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns the number of times the given regex pattern matches in each string. More...
 
std::unique_ptr< column > to_booleans (strings_column_view const &strings, string_scalar const &true_string=string_scalar("true"), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new BOOL8 column by parsing boolean values from the strings in the provided strings column. More...
 
std::unique_ptr< column > from_booleans (column_view const &booleans, string_scalar const &true_string=string_scalar("true"), string_scalar const &false_string=string_scalar("false"), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new strings column converting the boolean values from the provided column into strings. More...
 
std::unique_ptr< column > to_timestamps (strings_column_view const &strings, data_type timestamp_type, std::string const &format, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new timestamp column converting a strings column into timestamps using the provided format pattern. More...
 
std::unique_ptr< column > is_timestamp (strings_column_view const &strings, std::string const &format, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Verifies the given strings column can be parsed to timestamps using the provided format pattern. More...
 
std::unique_ptr< column > from_timestamps (column_view const &timestamps, std::string const &format="%Y-%m-%dT%H:%M:%SZ", rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new strings column converting a timestamp column into strings using the provided format pattern. More...
 
std::unique_ptr< column > to_durations (strings_column_view const &strings, data_type duration_type, std::string const &format, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new duration column converting a strings column into durations using the provided format pattern. More...
 
std::unique_ptr< column > from_durations (column_view const &durations, std::string const &format="%D days %H:%M:%S", rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new strings column converting a duration column into strings using the provided format pattern. More...
 
std::unique_ptr< column > to_fixed_point (strings_column_view const &input, data_type output_type, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new fixed-point column parsing decimal values from the provided strings column. More...
 
std::unique_ptr< column > from_fixed_point (column_view const &input, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new strings column converting the fixed-point values into a strings column. More...
 
std::unique_ptr< column > is_fixed_point (strings_column_view const &input, data_type decimal_type=data_type{type_id::DECIMAL64}, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a boolean column identifying strings in which all characters are valid for conversion to fixed-point. More...
 
std::unique_ptr< column > to_floats (strings_column_view const &strings, data_type output_type, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new numeric column by parsing float values from each string in the provided strings column. More...
 
std::unique_ptr< column > from_floats (column_view const &floats, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new strings column converting the float values from the provided column into strings. More...
 
std::unique_ptr< column > is_float (strings_column_view const &strings, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a boolean column identifying strings in which all characters are valid for conversion to floats. More...
 
std::unique_ptr< column > to_integers (strings_column_view const &strings, data_type output_type, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new integer numeric column parsing integer values from the provided strings column. More...
 
std::unique_ptr< column > from_integers (column_view const &integers, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new strings column converting the integer values from the provided column into strings. More...
 
std::unique_ptr< column > is_integer (strings_column_view const &strings, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a boolean column identifying strings in which all characters are valid for conversion to integers. More...
 
std::unique_ptr< column > is_integer (strings_column_view const &strings, data_type int_type, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a boolean column identifying strings in which all characters are valid for conversion to integers. More...
 
std::unique_ptr< column > hex_to_integers (strings_column_view const &strings, data_type output_type, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new integer numeric column parsing hexadecimal values from the provided strings column. More...
 
std::unique_ptr< column > is_hex (strings_column_view const &strings, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a boolean column identifying strings in which all characters are valid for conversion to integers from hex. More...
 
std::unique_ptr< column > ipv4_to_integers (strings_column_view const &strings, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Converts IPv4 addresses into integers. More...
 
std::unique_ptr< column > integers_to_ipv4 (column_view const &integers, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Converts integers into IPv4 addresses as strings. More...
 
std::unique_ptr< column > is_ipv4 (strings_column_view const &strings, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a boolean column identifying strings in which all characters are valid for conversion to integers from IPv4 format. More...
 
std::unique_ptr< column > url_encode (strings_column_view const &strings, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Encodes each string using URL encoding. More...
 
std::unique_ptr< column > url_decode (strings_column_view const &strings, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Decodes each string using URL encoding. More...
 
std::unique_ptr< table > extract (strings_column_view const &strings, std::string const &pattern, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a table of strings columns, one for each matching group specified in the given regular expression pattern. More...
 
std::unique_ptr< column > find (strings_column_view const &strings, string_scalar const &target, size_type start=0, size_type stop=-1, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a column of character position values where the target string is first found in each string of the provided column. More...
 
std::unique_ptr< column > rfind (strings_column_view const &strings, string_scalar const &target, size_type start=0, size_type stop=-1, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a column of character position values where the target string is first found searching from the end of each string. More...
 
std::unique_ptr< column > contains (strings_column_view const &strings, string_scalar const &target, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a column of boolean values for each string where true indicates the target string was found within that string in the provided column. More...
 
std::unique_ptr< column > contains (strings_column_view const &strings, strings_column_view const &targets, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a column of boolean values for each string where true indicates the corresponding target string was found within that string in the provided column. More...
 
std::unique_ptr< column > starts_with (strings_column_view const &strings, string_scalar const &target, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a column of boolean values for each string where true indicates the target string was found at the beginning of that string in the provided column. More...
 
std::unique_ptr< column > starts_with (strings_column_view const &strings, strings_column_view const &targets, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a column of boolean values for each string where true indicates corresponding string in target column was found at the beginning of that string in the provided column. More...
 
std::unique_ptr< column > ends_with (strings_column_view const &strings, string_scalar const &target, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a column of boolean values for each string where true indicates the target string was found at the end of that string in the provided column. More...
 
std::unique_ptr< column > ends_with (strings_column_view const &strings, strings_column_view const &targets, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a column of boolean values for each string where true indicates corresponding string in target column was found at the end of that string in the provided column. More...
 
std::unique_ptr< column > find_multiple (strings_column_view const &strings, strings_column_view const &targets, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a column with character position values where each of the target strings is found in each string. More...
 
std::unique_ptr< table > findall_re (strings_column_view const &strings, std::string const &pattern, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a table of strings columns for each matching occurrence of the regex pattern within each string. More...
 
std::unique_ptr< cudf::column > get_json_object (cudf::strings_column_view const &col, cudf::string_scalar const &json_path, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Apply a JSONPath string to all rows in an input strings column. More...
 
std::unique_ptr< column > pad (strings_column_view const &strings, size_type width, pad_side side=cudf::strings::pad_side::RIGHT, std::string const &fill_char=" ", rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Add padding to each string using a provided character. More...
 
std::unique_ptr< column > zfill (strings_column_view const &strings, size_type width, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Add '0' as padding to the left of each string. More...
 
std::unique_ptr< column > replace (strings_column_view const &strings, string_scalar const &target, string_scalar const &repl, int32_t maxrepl=-1, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Replaces target string within each string with the specified replacement string. More...
 
std::unique_ptr< column > replace_slice (strings_column_view const &strings, string_scalar const &repl=string_scalar(""), size_type start=0, size_type stop=-1, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Replaces the characters within the [start,stop) character position range of each string in the column with the provided repl string. More...
 
std::unique_ptr< column > replace (strings_column_view const &strings, strings_column_view const &targets, strings_column_view const &repls, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Replaces substrings matching a list of targets with the corresponding replacement strings. More...
 
std::unique_ptr< column > replace_nulls (strings_column_view const &strings, string_scalar const &repl=string_scalar(""), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Replaces any null string entries with the given string. More...
 
std::unique_ptr< column > replace_re (strings_column_view const &strings, std::string const &pattern, string_scalar const &repl=string_scalar(""), size_type maxrepl=-1, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 For each string, replaces any character sequence matching the given pattern with the provided replacement string. More...
 
std::unique_ptr< column > replace_re (strings_column_view const &strings, std::vector< std::string > const &patterns, strings_column_view const &repls, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 For each string, replaces any character sequence matching the given patterns with the corresponding string in the repls column. More...
 
std::unique_ptr< column > replace_with_backrefs (strings_column_view const &strings, std::string const &pattern, std::string const &repl, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 For each string, replaces any character sequence matching the given pattern using the repl template for back-references. More...
 
std::unique_ptr< table > partition (strings_column_view const &strings, string_scalar const &delimiter=string_scalar(""), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a set of 3 columns by splitting each string using the specified delimiter. More...
 
std::unique_ptr< table > rpartition (strings_column_view const &strings, string_scalar const &delimiter=string_scalar(""), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a set of 3 columns by splitting each string using the specified delimiter starting from the end of each string. More...
 
std::unique_ptr< table > split (strings_column_view const &strings_column, string_scalar const &delimiter=string_scalar(""), size_type maxsplit=-1, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a list of columns by splitting each string using the specified delimiter. More...
 
std::unique_ptr< table > rsplit (strings_column_view const &strings_column, string_scalar const &delimiter=string_scalar(""), size_type maxsplit=-1, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a list of columns by splitting each string using the specified delimiter starting from the end of each string. More...
 
std::unique_ptr< column > split_record (strings_column_view const &strings, string_scalar const &delimiter=string_scalar(""), size_type maxsplit=-1, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Splits individual strings elements into a list of strings. More...
 
std::unique_ptr< column > rsplit_record (strings_column_view const &strings, string_scalar const &delimiter=string_scalar(""), size_type maxsplit=-1, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Splits individual strings elements into a list of strings starting from the end of each string. More...
 
void print (strings_column_view const &strings, size_type start=0, size_type end=-1, size_type max_width=-1, const char *delimiter="\n")
 Prints the strings to stdout. More...
 
std::pair< rmm::device_vector< char >, rmm::device_vector< size_type > > create_offsets (strings_column_view const &strings, rmm::cuda_stream_view stream=rmm::cuda_stream_default, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Create output per Arrow strings format. More...
 
std::unique_ptr< column > strip (strings_column_view const &strings, strip_type stype=strip_type::BOTH, string_scalar const &to_strip=string_scalar(""), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Removes the specified characters from the beginning or end (or both) of each string. More...
 
std::unique_ptr< column > slice_strings (strings_column_view const &strings, numeric_scalar< size_type > const &start=numeric_scalar< size_type >(0, false), numeric_scalar< size_type > const &stop=numeric_scalar< size_type >(0, false), numeric_scalar< size_type > const &step=numeric_scalar< size_type >(1), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new strings column that contains substrings of the strings in the provided column. More...
 
std::unique_ptr< column > slice_strings (strings_column_view const &strings, column_view const &starts, column_view const &stops, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new strings column that contains substrings of the strings in the provided column using unique ranges for each string. More...
 
std::unique_ptr< column > slice_strings (strings_column_view const &strings, string_scalar const &delimiter, size_type count, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Slices a column of strings by using a delimiter as a slice point. More...
 
std::unique_ptr< column > slice_strings (strings_column_view const &strings, strings_column_view const &delimiter_strings, size_type count, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Slices a column of strings by using a delimiter column as slice points. More...
 
std::unique_ptr< column > translate (strings_column_view const &strings, std::vector< std::pair< char_utf8, char_utf8 >> const &chars_table, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Translates individual characters within each string. More...
 
std::unique_ptr< column > filter_characters (strings_column_view const &strings, std::vector< std::pair< cudf::char_utf8, cudf::char_utf8 >> characters_to_filter, filter_type keep_characters=filter_type::KEEP, string_scalar const &replacement=string_scalar(""), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Removes ranges of characters from each string in a strings column. More...
 
std::unique_ptr< column > wrap (strings_column_view const &strings, size_type width, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Wraps strings onto multiple lines shorter than width by replacing appropriate white space with new-line characters (ASCII 0x0A). More...
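
Several entries in the list above distinguish lengths in characters from lengths in bytes (count_characters versus count_bytes). The two differ whenever a string holds multi-byte characters. The sketch below illustrates the distinction on the host with std::string, assuming UTF-8 encoded data (the char_utf8 type in the signatures suggests this). The functions in the hypothetical `sketch` namespace only borrow the names for illustration and are not the cudf implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {

// Length in bytes is just the storage size of the encoded string.
inline std::size_t count_bytes(std::string const& s) { return s.size(); }

// Length in characters: every byte that is not a UTF-8 continuation byte
// (bit pattern 10xxxxxx) starts a new character.
inline std::size_t count_characters(std::string const& s) {
  std::size_t n = 0;
  for (unsigned char c : s) {
    if ((c & 0xC0) != 0x80) ++n;
  }
  return n;
}

// Small usage example: "héllo" with é encoded as the two bytes 0xC3 0xA9.
inline void example_usage() {
  std::string const s = "h\xC3\xA9llo";
  assert(count_bytes(s) == 6);
  assert(count_characters(s) == 5);
}

}  // namespace sketch
```

For pure ASCII input the two counts are equal.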
 

Detailed Description

Strings column APIs.

Function Documentation

◆ create_offsets()

std::pair<rmm::device_vector<char>, rmm::device_vector<size_type> > cudf::strings::create_offsets ( strings_column_view const &  strings,
rmm::cuda_stream_view  stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Create output per Arrow strings format.

The return pair is the vector of chars and the vector of offsets.

Parameters
strings  Strings instance for this operation.
stream   CUDA stream used for device memory operations and kernel launches.
mr       Device memory resource used to allocate the returned device_vectors.
Returns
Pair containing a vector of chars and a vector of offsets.
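
To make the returned pair concrete, here is a minimal host-side sketch of the Arrow variable-length string layout: all characters are stored back to back in one buffer, and an offsets buffer with one more entry than there are strings marks where each string begins and ends. It uses std::vector instead of rmm::device_vector, and `create_offsets_host` and the `sketch` namespace are hypothetical names for illustration, not part of cudf.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

using size_type = int;  // assumption: a 32-bit signed index type

// Build the (chars, offsets) pair for a list of host strings.
// String i occupies chars[offsets[i]] up to, not including, chars[offsets[i + 1]].
inline std::pair<std::vector<char>, std::vector<size_type>> create_offsets_host(
  std::vector<std::string> const& strings) {
  std::vector<char> chars;
  std::vector<size_type> offsets;
  offsets.reserve(strings.size() + 1);
  offsets.push_back(0);  // the first string always starts at byte 0
  for (auto const& s : strings) {
    chars.insert(chars.end(), s.begin(), s.end());
    offsets.push_back(static_cast<size_type>(chars.size()));
  }
  return std::make_pair(std::move(chars), std::move(offsets));
}

// Small usage example: an empty string repeats the previous offset.
inline void example_usage() {
  auto const result = create_offsets_host({"ab", "", "cde"});
  assert(result.first.size() == 5);
  assert(result.second.size() == 4);
}

}  // namespace sketch
```

With this layout, the byte length of string i is offsets[i + 1] - offsets[i], so no per-string terminator is needed.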

◆ print()

void cudf::strings::print ( strings_column_view const &  strings,
size_type  start = 0,
size_type  end = -1,
size_type  max_width = -1,
const char *  delimiter = "\n" 
)

Prints the strings to stdout.

Parameters
strings    Strings instance for this operation.
start      Index of first string to print.
end        Index of last string to print. Specify -1 for all strings.
max_width  Maximum number of characters to print per string. Specify -1 to print all characters.
delimiter  The chars to print between each string. Default is new-line character.
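
The way these parameters interact can be sketched on the host. The code below is not the cudf implementation: `format_strings` and `truncate_utf8` are hypothetical helpers that build the text into a std::string instead of writing to stdout. It also makes two assumptions the description above does not settle: `end` is treated as one past the last index printed, and the delimiter is placed only between strings, not after the final one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Keep at most max_chars UTF-8 characters of s; a negative value keeps everything.
inline std::string truncate_utf8(std::string const& s, int max_chars) {
  if (max_chars < 0) return s;
  std::size_t pos = 0;
  int seen = 0;
  while (pos < s.size()) {
    // A byte that is not a continuation byte (10xxxxxx) starts a character.
    bool const starts_char = (static_cast<unsigned char>(s[pos]) & 0xC0) != 0x80;
    if (starts_char) {
      if (seen == max_chars) break;
      ++seen;
    }
    ++pos;
  }
  return s.substr(0, pos);
}

// Assemble the text that would be printed for the range [start, end).
inline std::string format_strings(std::vector<std::string> const& strings,
                                  int start = 0, int end = -1, int max_width = -1,
                                  const char* delimiter = "\n") {
  int const count = static_cast<int>(strings.size());
  if (end < 0 || end > count) end = count;  // -1 selects all strings
  if (start < 0) start = 0;
  std::string out;
  for (int i = start; i < end; ++i) {
    if (i > start) out += delimiter;  // assumption: delimiter only between strings
    out += truncate_utf8(strings[i], max_width);
  }
  return out;
}

// Small usage example: default arguments print everything, one string per line.
inline void example_usage() {
  assert(format_strings({"alpha", "beta"}) == "alpha\nbeta");
}

}  // namespace sketch
```

Counting characters rather than bytes in `truncate_utf8` avoids cutting a multi-byte character in half when `max_width` is applied.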