Strings Types
- group strings_types
Enums
-
enum string_character_types
Character type values. These types can be or’d to check for any combination of types.
This cannot be turned into an enum class because or’d entries can result in values that are not in the class. For example, NUMERIC|SPACE is a valid, reasonable combination but does not match any explicitly named enumerator.
Values:
-
enumerator DECIMAL
all decimal characters
-
enumerator NUMERIC
all numeric characters
-
enumerator DIGIT
all digit characters
-
enumerator ALPHA
all alphabetic characters
-
enumerator SPACE
all space characters
-
enumerator UPPER
all upper case characters
-
enumerator LOWER
all lower case characters
-
enumerator ALPHANUM
all alphanumeric characters
-
enumerator CASE_TYPES
all case-able characters
-
enumerator ALL_TYPES
all character types
Functions
-
std::unique_ptr<column> all_characters_of_type(strings_column_view const &input, string_character_types types, string_character_types verify_types = string_character_types::ALL_TYPES, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Returns a boolean column identifying string entries in which all characters are of the specified type.
The output row entry will be set to false if the corresponding string element is empty or has at least one character not of the specified type. If all characters fit the type then true is set in that output row entry.
To ignore all but specific types, set verify_types to those types which should be checked. Otherwise, the default ALL_TYPES will verify that all characters match types.
Example:
s = ['ab', 'a b', 'a7', 'a B']
b1 = s.all_characters_of_type(s,LOWER)
b1 is [true, false, false, false]
b2 = s.all_characters_of_type(s,LOWER,LOWER|UPPER)
b2 is [true, true, true, false]
Any null row results in a null entry for that row in the output column.
- Parameters:
input – Strings instance for this operation
types – The character types to check in each string
verify_types – Only verify against these character types. Default ALL_TYPES means return true iff all characters match types.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New column of boolean results for each string
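The interaction between types and verify_types can be modeled on the host for a single ASCII string. This is a rough sketch and not the library's implementation: `sketch_types`, `flags_of`, and `all_of_type_sketch` are hypothetical names, the flag values are assumed, the classifier handles ASCII only, and returning false when no character is verified is an assumption. Null rows are not modeled.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Assumed flag layout for illustration; this page does not list the real values.
enum sketch_types : std::uint32_t {
  DECIMAL = 1u << 0, NUMERIC = 1u << 1, DIGIT = 1u << 2, ALPHA = 1u << 3,
  SPACE = 1u << 4, UPPER = 1u << 5, LOWER = 1u << 6,
  ALL_TYPES = (1u << 7) - 1
};

// Hypothetical ASCII-only classifier returning the flags that apply to `c`.
std::uint32_t flags_of(unsigned char c)
{
  std::uint32_t f = 0;
  if (std::isdigit(c)) f |= DECIMAL | NUMERIC | DIGIT;
  if (std::isalpha(c)) f |= ALPHA;
  if (std::isspace(c)) f |= SPACE;
  if (std::isupper(c)) f |= UPPER;
  if (std::islower(c)) f |= LOWER;
  return f;
}

// Host-side model of the per-row check described above.
bool all_of_type_sketch(std::string const& s,
                        std::uint32_t types,
                        std::uint32_t verify_types = ALL_TYPES)
{
  bool checked_any = false;  // stays false for an empty string
  for (unsigned char c : s) {
    std::uint32_t const f = flags_of(c);
    // Characters outside verify_types are ignored; unclassified characters
    // are only verified when every type is being verified.
    bool const verify = (verify_types & f) != 0 || (f == 0 && verify_types == ALL_TYPES);
    if (!verify) continue;
    if ((types & f) == 0) return false;  // a verified character is not of `types`
    checked_any = true;
  }
  return checked_any;
}

// Reproduces the documented example rows.
void check_documented_example()
{
  assert(all_of_type_sketch("ab", LOWER));
  assert(!all_of_type_sketch("a b", LOWER));
  assert(all_of_type_sketch("a b", LOWER, LOWER | UPPER));
  assert(!all_of_type_sketch("a B", LOWER, LOWER | UPPER));
}
```

With verify_types set to LOWER|UPPER, the space in 'a b' and the digit in 'a7' are skipped, so only the cased letters decide the result.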
-
std::unique_ptr<column> filter_characters_of_type(strings_column_view const &input, string_character_types types_to_remove, string_scalar const &replacement = string_scalar(""), string_character_types types_to_keep = string_character_types::ALL_TYPES, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Filter specific character types from a column of strings.
To remove all characters of a specific type, set that type in types_to_remove and set types_to_keep to ALL_TYPES.
To filter out characters NOT of a select type, specify ALL_TYPES for types_to_remove and the types to keep in types_to_keep.
Example:
s = ['ab', 'a b', 'a7bb', 'A7B234']
s1 = s.filter_characters_of_type(s,NUMERIC,"",ALL_TYPES)
s1 is ['ab', 'a b', 'abb', 'AB']
s2 = s.filter_characters_of_type(s,ALL_TYPES,"-",LOWER)
s2 is ['ab', 'a-b', 'a-bb', '------']
In s1 all NUMERIC types have been removed. In s2 all non-LOWER types have been replaced.
Exactly one of types_to_remove and types_to_keep must be set to ALL_TYPES.
Any null row results in a null entry for that row in the output column.
- Throws:
cudf::logic_error – if neither or both types_to_remove and types_to_keep are set to ALL_TYPES.
- Parameters:
input – Strings instance for this operation
types_to_remove – The character types to check in each string. Use ALL_TYPES here to specify types_to_keep instead.
replacement – The replacement character to use when removing characters
types_to_keep – Default ALL_TYPES means all characters of types_to_remove will be filtered.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New strings column with the specified character types removed or replaced
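The remove/keep rule can be modeled on the host for a single ASCII string. This is a sketch under stated assumptions and not the library's implementation: `sketch_types`, `flags_of`, and `filter_sketch` are hypothetical names, the flag values are assumed, the classifier handles ASCII only, and `std::invalid_argument` stands in for `cudf::logic_error`. Null rows are not modeled.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>

// Assumed flag layout for illustration; this page does not list the real values.
enum sketch_types : std::uint32_t {
  DECIMAL = 1u << 0, NUMERIC = 1u << 1, DIGIT = 1u << 2, ALPHA = 1u << 3,
  SPACE = 1u << 4, UPPER = 1u << 5, LOWER = 1u << 6,
  ALL_TYPES = (1u << 7) - 1
};

// Hypothetical ASCII-only classifier returning the flags that apply to `c`.
std::uint32_t flags_of(unsigned char c)
{
  std::uint32_t f = 0;
  if (std::isdigit(c)) f |= DECIMAL | NUMERIC | DIGIT;
  if (std::isalpha(c)) f |= ALPHA;
  if (std::isspace(c)) f |= SPACE;
  if (std::isupper(c)) f |= UPPER;
  if (std::islower(c)) f |= LOWER;
  return f;
}

// Host-side model of the per-row filter described above.
std::string filter_sketch(std::string const& s,
                          std::uint32_t types_to_remove,
                          std::string const& replacement = "",
                          std::uint32_t types_to_keep    = ALL_TYPES)
{
  bool const remove_all = (types_to_remove == ALL_TYPES);
  bool const keep_all   = (types_to_keep == ALL_TYPES);
  // Exactly one of the two masks must be ALL_TYPES.
  if (remove_all == keep_all) {
    throw std::invalid_argument("exactly one of types_to_remove/types_to_keep must be ALL_TYPES");
  }
  std::string out;
  for (unsigned char c : s) {
    std::uint32_t const f = flags_of(c);
    // Keep mode drops everything outside types_to_keep;
    // remove mode drops only characters inside types_to_remove.
    bool const drop = remove_all ? (types_to_keep & f) == 0 : (types_to_remove & f) != 0;
    if (drop) {
      out += replacement;
    } else {
      out += static_cast<char>(c);
    }
  }
  return out;
}

// Reproduces two rows of the documented example.
void check_documented_example()
{
  assert(filter_sketch("A7B234", NUMERIC, "", ALL_TYPES) == "AB");
  assert(filter_sketch("A7B234", ALL_TYPES, "-", LOWER) == "------");
}
```

In this model each dropped character is replaced individually, which is why 'A7B234' becomes six dashes rather than one in the keep-LOWER case.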
-
constexpr string_character_types operator|(string_character_types lhs, string_character_types rhs)
OR operator for combining string_character_types.
- Parameters:
lhs – left-hand side of OR operation
rhs – right-hand side of OR operation
- Returns:
combined string_character_types
-
constexpr string_character_types &operator|=(string_character_types &lhs, string_character_types rhs)
Compound assignment OR operator for combining string_character_types.
- Parameters:
lhs – left-hand side of OR operation
rhs – right-hand side of OR operation
- Returns:
Reference to lhs after combining lhs and rhs
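One common way to write such operators for a flag enum is to combine the underlying integer values and cast the result back. The sketch below uses a hypothetical `flags_sketch` enum with assumed values; it shows the general technique and is not necessarily how the library defines its operators.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical flag enum with assumed values, used only for illustration.
enum flags_sketch : std::uint32_t { SPACE = 1u << 4, UPPER = 1u << 5, LOWER = 1u << 6 };

// OR the underlying values, then cast back so the result keeps the enum type.
constexpr flags_sketch operator|(flags_sketch lhs, flags_sketch rhs)
{
  using U = std::underlying_type_t<flags_sketch>;
  return static_cast<flags_sketch>(static_cast<U>(lhs) | static_cast<U>(rhs));
}

// Compound form: update lhs in place and return a reference to it.
constexpr flags_sketch& operator|=(flags_sketch& lhs, flags_sketch rhs)
{
  lhs = lhs | rhs;
  return lhs;
}

// Builds a combined mask step by step using the compound operator.
constexpr flags_sketch case_types_sketch()
{
  flags_sketch t = LOWER;
  t |= UPPER;
  return t;
}

static_assert(std::is_same<decltype(LOWER | UPPER), flags_sketch>::value,
              "operator| yields the enum type rather than an integer");
static_assert(case_types_sketch() == (LOWER | UPPER), "|= matches |");
```

Without the overload, `LOWER | UPPER` would produce a plain integer, and passing it to a parameter of the enum type would need an explicit cast at every call site.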
-