Character Types

Files

file  char_types.hpp
 

Enumerations

enum  cudf::strings::string_character_types : uint32_t {
  cudf::strings::DECIMAL = 1 << 0, cudf::strings::NUMERIC = 1 << 1, cudf::strings::DIGIT = 1 << 2, cudf::strings::ALPHA = 1 << 3,
  cudf::strings::SPACE = 1 << 4, cudf::strings::UPPER = 1 << 5, cudf::strings::LOWER = 1 << 6, cudf::strings::ALPHANUM = DECIMAL | NUMERIC | DIGIT | ALPHA,
  cudf::strings::CASE_TYPES = UPPER | LOWER, cudf::strings::ALL_TYPES = ALPHANUM | CASE_TYPES | SPACE
}
 Character type values. These types can be or'd to check for any combination of types.
 

Functions

string_character_types cudf::strings::operator| (string_character_types lhs, string_character_types rhs)
 OR operator for combining string_character_types.
 
string_character_types & cudf::strings::operator|= (string_character_types &lhs, string_character_types rhs)
 Compound assignment OR operator for combining string_character_types.
 
std::unique_ptr< column > cudf::strings::all_characters_of_type (strings_column_view const &strings, string_character_types types, string_character_types verify_types=string_character_types::ALL_TYPES, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a boolean column identifying strings entries in which all characters are of the type specified.
 
std::unique_ptr< column > cudf::strings::filter_characters_of_type (strings_column_view const &strings, string_character_types types_to_remove, string_scalar const &replacement=string_scalar(""), string_character_types types_to_keep=string_character_types::ALL_TYPES, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Filter specific character types from a column of strings.
 

Detailed Description

Enumeration Type Documentation

◆ string_character_types

Character type values. These types can be or'd to check for any combination of types.

This cannot be turned into an enum class because or'd entries can result in values that are not in the class. For example, NUMERIC|SPACE is a valid, reasonable combination but does not match any explicitly named enumerator.
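The point about combined values can be illustrated with a small standalone sketch. It re-declares the enumerator values listed on this page locally rather than including char_types.hpp, and the two operator bodies are an assumption about how such operators are typically written, not a copy of the library source.

```cpp
#include <cstdint>

// Local copy of the documented enumerator values (not the cudf header).
enum string_character_types : std::uint32_t {
  DECIMAL    = 1 << 0,
  NUMERIC    = 1 << 1,
  DIGIT      = 1 << 2,
  ALPHA      = 1 << 3,
  SPACE      = 1 << 4,
  UPPER      = 1 << 5,
  LOWER      = 1 << 6,
  ALPHANUM   = DECIMAL | NUMERIC | DIGIT | ALPHA,
  CASE_TYPES = UPPER | LOWER,
  ALL_TYPES  = ALPHANUM | CASE_TYPES | SPACE
};

// Assumed implementation: OR the underlying integers and cast back.
constexpr string_character_types operator|(string_character_types lhs,
                                           string_character_types rhs)
{
  return static_cast<string_character_types>(static_cast<std::uint32_t>(lhs) |
                                             static_cast<std::uint32_t>(rhs));
}

// Assumed implementation: compound form built on operator| above.
inline string_character_types& operator|=(string_character_types& lhs,
                                          string_character_types rhs)
{
  lhs = lhs | rhs;
  return lhs;
}

// NUMERIC (2) | SPACE (16) is 18, which no named enumerator has.
static_assert(static_cast<std::uint32_t>(NUMERIC | SPACE) == 18u,
              "combined value is not a named enumerator");
```

Because the enumeration has a fixed underlying type, a value such as 18 is still a valid string_character_types value even though no enumerator names it.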

Enumerator
DECIMAL 

all decimal characters

NUMERIC 

all numeric characters

DIGIT 

all digit characters

ALPHA 

all alphabetic characters

SPACE 

all space characters

UPPER 

all upper case characters

LOWER 

all lower case characters

ALPHANUM 

all alphanumeric characters

CASE_TYPES 

all case-able characters

ALL_TYPES 

all character types

Definition at line 39 of file char_types.hpp.

Function Documentation

◆ all_characters_of_type()

std::unique_ptr<column> cudf::strings::all_characters_of_type ( strings_column_view const &  strings,
string_character_types  types,
string_character_types  verify_types = string_character_types::ALL_TYPES,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Returns a boolean column identifying strings entries in which all characters are of the type specified.

The output row entry will be set to false if the corresponding string element is empty or has at least one character not of the specified type. If all characters fit the type then true is set in that output row entry.

To restrict the check to specific character types, set verify_types to those types; characters of any other type are then ignored. Otherwise, the default ALL_TYPES verifies that every character matches types.

Example:
s = ['ab', 'a b', 'a7', 'a B']
b1 = all_characters_of_type(s, LOWER)
b1 is [true, false, false, false]
b2 = all_characters_of_type(s, LOWER, LOWER|UPPER)
b2 is [true, true, true, false]

Any null row results in a null entry for that row in the output column.
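The rule described above can be modeled on the host for a single non-null string. The following is a minimal sketch, not the libcudf implementation: it re-declares the enumerator values locally, uses plain integer masks, and relies on a hypothetical ASCII-only classifier (classify_ascii), whereas the library works on device memory and classifies Unicode characters. The helper names all_characters_of_type_model and check_column are also hypothetical. The result for a string in which no character falls under verify_types is not specified by the text above; the sketch returns true in that case as an arbitrary choice.

```cpp
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Local copy of the documented enumerator values (not the cudf header).
enum string_character_types : std::uint32_t {
  DECIMAL = 1 << 0, NUMERIC = 1 << 1, DIGIT = 1 << 2, ALPHA = 1 << 3,
  SPACE = 1 << 4, UPPER = 1 << 5, LOWER = 1 << 6,
  ALPHANUM = DECIMAL | NUMERIC | DIGIT | ALPHA,
  CASE_TYPES = UPPER | LOWER,
  ALL_TYPES = ALPHANUM | CASE_TYPES | SPACE
};

// Hypothetical ASCII-only classifier returning a mask of type flags.
inline std::uint32_t classify_ascii(char ch)
{
  auto const c = static_cast<unsigned char>(ch);
  std::uint32_t flags = 0;
  if (std::isdigit(c)) flags |= DECIMAL | NUMERIC | DIGIT;
  if (std::isupper(c)) flags |= ALPHA | UPPER;
  if (std::islower(c)) flags |= ALPHA | LOWER;
  if (std::isspace(c)) flags |= SPACE;
  return flags;
}

// Model of the documented rule for one non-null string.
inline bool all_characters_of_type_model(std::string const& s,
                                         std::uint32_t types,
                                         std::uint32_t verify_types = ALL_TYPES)
{
  if (s.empty()) return false;  // empty strings yield false
  for (char ch : s) {
    std::uint32_t const flags = classify_ascii(ch);
    // Ignore characters whose types fall outside verify_types.
    if (verify_types != ALL_TYPES && (flags & verify_types) == 0) continue;
    // A verified character that is not of the requested type fails the row.
    if ((flags & types) == 0) return false;
  }
  return true;
}

// Apply the model to every row of a host-side "column" of strings.
inline std::vector<bool> check_column(std::vector<std::string> const& rows,
                                      std::uint32_t types,
                                      std::uint32_t verify_types = ALL_TYPES)
{
  std::vector<bool> out;
  for (auto const& s : rows) {
    out.push_back(all_characters_of_type_model(s, types, verify_types));
  }
  return out;
}
```

Under these assumptions, check_column applied to {"ab", "a b", "a7", "a B"} reproduces b1 with types LOWER and b2 with types LOWER and verify_types LOWER|UPPER.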

Parameters
strings  Strings instance for this operation.
types  The character types to check in each string.
verify_types  Only verify against these character types. Default ALL_TYPES means return true iff all characters match types.
mr  Device memory resource used to allocate the returned column's device memory.
Returns
New column of boolean results for each string.

◆ filter_characters_of_type()

std::unique_ptr<column> cudf::strings::filter_characters_of_type ( strings_column_view const &  strings,
string_character_types  types_to_remove,
string_scalar const &  replacement = string_scalar(""),
string_character_types  types_to_keep = string_character_types::ALL_TYPES,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Filter specific character types from a column of strings.

To remove all characters of a specific type, set that type in types_to_remove and set types_to_keep to ALL_TYPES.

To filter out characters that are NOT of a selected type, specify ALL_TYPES for types_to_remove and list the types to retain in types_to_keep.

Example:
s = ['ab', 'a b', 'a7bb', 'A7B234']
s1 = filter_characters_of_type(s, NUMERIC, "", ALL_TYPES)
s1 is ['ab', 'a b', 'abb', 'AB']
s2 = filter_characters_of_type(s, ALL_TYPES, "-", LOWER)
s2 is ['ab', 'a-b', 'a-bb', '------']

In s1 all NUMERIC types have been removed. In s2 all non-LOWER types have been replaced.

Exactly one of the parameters types_to_remove and types_to_keep must be set to ALL_TYPES.

Any null row results in a null entry for that row in the output column.
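The two filtering modes and the ALL_TYPES constraint can be modeled on the host for a single non-null string. This is a sketch under stated assumptions rather than the library's implementation: the enumerator values are re-declared locally, classify_ascii is a hypothetical ASCII-only classifier, filter_characters_of_type_model is a hypothetical name, and std::logic_error stands in for cudf::logic_error.

```cpp
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>

// Local copy of the documented enumerator values (not the cudf header).
enum string_character_types : std::uint32_t {
  DECIMAL = 1 << 0, NUMERIC = 1 << 1, DIGIT = 1 << 2, ALPHA = 1 << 3,
  SPACE = 1 << 4, UPPER = 1 << 5, LOWER = 1 << 6,
  ALPHANUM = DECIMAL | NUMERIC | DIGIT | ALPHA,
  CASE_TYPES = UPPER | LOWER,
  ALL_TYPES = ALPHANUM | CASE_TYPES | SPACE
};

// Hypothetical ASCII-only classifier returning a mask of type flags.
inline std::uint32_t classify_ascii(char ch)
{
  auto const c = static_cast<unsigned char>(ch);
  std::uint32_t flags = 0;
  if (std::isdigit(c)) flags |= DECIMAL | NUMERIC | DIGIT;
  if (std::isupper(c)) flags |= ALPHA | UPPER;
  if (std::islower(c)) flags |= ALPHA | LOWER;
  if (std::isspace(c)) flags |= SPACE;
  return flags;
}

// Model of the documented filtering rule for one non-null string.
inline std::string filter_characters_of_type_model(
  std::string const& s,
  std::uint32_t types_to_remove,
  std::string const& replacement = "",
  std::uint32_t types_to_keep    = ALL_TYPES)
{
  bool const remove_is_all = (types_to_remove == ALL_TYPES);
  bool const keep_is_all   = (types_to_keep == ALL_TYPES);
  // Exactly one of the two parameters must be ALL_TYPES.
  if (remove_is_all == keep_is_all) {
    throw std::logic_error(
      "exactly one of types_to_remove and types_to_keep must be ALL_TYPES");
  }
  std::string out;
  for (char ch : s) {
    std::uint32_t const flags = classify_ascii(ch);
    // Keep mode drops everything outside types_to_keep;
    // remove mode drops everything inside types_to_remove.
    bool const drop = remove_is_all ? (flags & types_to_keep) == 0
                                    : (flags & types_to_remove) != 0;
    if (drop) {
      out += replacement;
    } else {
      out += ch;
    }
  }
  return out;
}
```

With this model, filtering "A7B234" with types_to_remove set to NUMERIC gives "AB", and filtering it with ALL_TYPES, "-", LOWER gives "------", matching s1 and s2 above.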

Exceptions
cudf::logic_error  if neither or both of types_to_remove and types_to_keep are set to ALL_TYPES.
Parameters
strings  Strings instance for this operation.
types_to_remove  The character types to check in each string. Use ALL_TYPES here to specify types_to_keep instead.
replacement  The replacement character to use when removing characters.
types_to_keep  Default ALL_TYPES means all characters of types_to_remove will be filtered.
mr  Device memory resource used to allocate the returned column's device memory.
Returns
New strings column with the specified character types filtered out or replaced.