Files | |
file | padding.hpp |
file | strings/reverse.hpp |
file | side_type.hpp |
file | strip.hpp |
file | translate.hpp |
file | wrap.hpp |
Enumerations | |
enum class | cudf::strings::side_type { cudf::strings::LEFT , cudf::strings::RIGHT , cudf::strings::BOTH } |
Direction identifier for cudf::strings::strip and cudf::strings::pad functions. | |
enum class | cudf::strings::filter_type : bool { cudf::strings::KEEP , cudf::strings::REMOVE } |
Removes or keeps the specified character ranges in cudf::strings::filter_characters. | |
enum class cudf::strings::filter_type : bool (strongly typed)
Removes or keeps the specified character ranges in cudf::strings::filter_characters.
Enumerator | Description |
---|---|
KEEP | All characters but those specified are removed. |
REMOVE | Only the specified characters are removed. |
Definition at line 64 of file translate.hpp.
enum class cudf::strings::side_type (strongly typed)
Direction identifier for cudf::strings::strip and cudf::strings::pad functions.
Enumerator | Description |
---|---|
LEFT | strip/pad characters from the beginning of the string |
RIGHT | strip/pad characters from the end of the string |
BOTH | strip/pad characters from the beginning and end of the string |
Definition at line 31 of file side_type.hpp.
std::unique_ptr<column> cudf::strings::filter_characters(
  strings_column_view const& input,
  std::vector<std::pair<cudf::char_utf8, cudf::char_utf8>> characters_to_filter,
  filter_type keep_characters = filter_type::KEEP,
  string_scalar const& replacement = string_scalar(""),
  rmm::cuda_stream_view stream = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Removes ranges of characters from each string in a strings column.
This can also be used to keep only the specified character ranges and remove all others from each string.
Null string entries result in null entries in the output column.
Throws cudf::logic_error if replacement is invalid.
input | Strings instance for this operation |
characters_to_filter | Table of character ranges to filter on |
keep_characters | If filter_type::KEEP, the characters_to_filter are retained and all other characters are removed; if filter_type::REMOVE, only the characters_to_filter are removed |
replacement | Optional replacement string for each character removed |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
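To make the KEEP/REMOVE semantics concrete, here is a minimal host-side sketch in plain C++ that applies the same rule to a single std::string. It is not the cudf implementation: `filter_characters_sketch` and `filter_mode` are hypothetical names, characters are assumed to be single-byte ASCII rather than UTF-8 code points, and each range is assumed to be inclusive on both ends.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for cudf::strings::filter_type (illustration only).
enum class filter_mode : bool { KEEP, REMOVE };

// Host-side sketch of the per-string rule. Assumes single-byte characters and
// inclusive [first, second] ranges; cudf itself works on UTF-8 characters.
std::string filter_characters_sketch(std::string const& s,
                                     std::vector<std::pair<char, char>> const& ranges,
                                     filter_mode mode,
                                     std::string const& replacement = "")
{
  auto in_ranges = [&](char c) {
    for (auto const& r : ranges) {
      if (c >= r.first && c <= r.second) { return true; }
    }
    return false;
  };
  std::string out;
  for (char c : s) {
    // KEEP drops characters outside the ranges; REMOVE drops characters inside them.
    bool const drop = (mode == filter_mode::KEEP) ? !in_ranges(c) : in_ranges(c);
    if (drop) {
      out += replacement;  // each removed character is replaced (empty by default)
    } else {
      out += c;
    }
  }
  return out;
}
```

With the single range {'0', '9'}, "a1b2c3" becomes "123" under filter_mode::KEEP and "abc" under filter_mode::REMOVE.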
std::unique_ptr<column> cudf::strings::pad(
  strings_column_view const& input,
  size_type width,
  side_type side = side_type::RIGHT,
  std::string_view fill_char = " ",
  rmm::cuda_stream_view stream = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Adds padding to each string using the provided character.
If a string is already width or more characters long, no padding is performed. Strings are never truncated.
Null string entries result in corresponding null entries in the output column.
input | Strings instance for this operation |
width | The minimum number of characters for each string |
side | Where to place the padding characters; Default is pad right (left justify) |
fill_char | Single UTF-8 character to use for padding; Default is the space character |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
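The padding rule can be illustrated with a small host-side sketch on a single std::string. The names `pad_sketch` and `pad_side` are hypothetical, characters are assumed to be single-byte so that length equals character count, and how BOTH splits an odd number of fill characters is an assumption of the sketch, not something this page specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for cudf::strings::side_type (illustration only).
enum class pad_side { LEFT, RIGHT, BOTH };

// Host-side sketch of the per-string padding rule (single-byte characters assumed).
std::string pad_sketch(std::string const& s, std::size_t width,
                       pad_side side = pad_side::RIGHT, char fill = ' ')
{
  if (s.size() >= width) { return s; }  // already wide enough: unchanged, never truncated
  std::size_t const total = width - s.size();
  switch (side) {
    case pad_side::LEFT: return std::string(total, fill) + s;   // right-justify
    case pad_side::RIGHT: return s + std::string(total, fill);  // left-justify
    default: {
      // BOTH: center the string; the extra fill character for an odd total
      // goes to the right here (an assumption of this sketch).
      std::size_t const left = total / 2;
      return std::string(left, fill) + s + std::string(total - left, fill);
    }
  }
}
```

For example, padding "ab" to width 5 on the right gives "ab   ", while "abcdef" is returned unchanged at width 3.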
std::unique_ptr<column> cudf::strings::reverse(
  strings_column_view const& input,
  rmm::cuda_stream_view stream = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Reverses the characters within each string.
Any null string entries return corresponding null output column entries.
input | Strings column for this operation |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
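The null-propagation rule is easiest to see on a whole column. The sketch below models a nullable strings column as a `std::vector<std::optional<std::string>>`, which is an illustrative assumption rather than a cudf type, and `reverse_sketch` is a hypothetical name. It reverses bytes, so it only matches character reversal for single-byte (ASCII) data, whereas the documented operation reverses characters.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Host-side sketch: a nullable "column" modeled as optional strings (assumption).
// Byte-wise reversal, so it is only character-correct for ASCII data.
std::vector<std::optional<std::string>> reverse_sketch(
  std::vector<std::optional<std::string>> const& input)
{
  std::vector<std::optional<std::string>> output;
  output.reserve(input.size());
  for (auto const& row : input) {
    if (!row) {
      output.push_back(std::nullopt);  // null in, null out
    } else {
      output.emplace_back(std::string(row->rbegin(), row->rend()));
    }
  }
  return output;
}
```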
std::unique_ptr<column> cudf::strings::strip(
  strings_column_view const& input,
  side_type side = side_type::BOTH,
  string_scalar const& to_strip = string_scalar(""),
  rmm::cuda_stream_view stream = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Removes the specified characters from the beginning or end (or both) of each string.
The to_strip parameter can contain one or more characters. Any character in to_strip is removed from the selected end(s) of each string, up to the first character that is not in to_strip; characters in the interior of a string are not removed.
If to_strip is the empty string, whitespace characters are removed. Whitespace is considered to be the space character plus control characters such as tab and line feed.
Any null string entries return corresponding null output column entries.
Throws cudf::logic_error if to_strip is invalid.
input | Strings column for this operation |
side | Indicates characters are to be stripped from the beginning, end, or both of each string; Default is both |
to_strip | UTF-8 encoded characters to strip from each string; Default is empty string which indicates strip whitespace characters |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory. |
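A small host-side sketch of the stripping rule on one std::string follows. `strip_sketch` and `strip_side` are hypothetical names, characters are assumed to be single-byte, and "whitespace" for an empty to_strip is approximated as the space character plus ASCII control characters (0x00 to 0x1F), which is an assumption based on the description above rather than cudf's exact character table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for cudf::strings::side_type (illustration only).
enum class strip_side { LEFT, RIGHT, BOTH };

// Host-side sketch: strips only from the selected end(s), never the interior.
std::string strip_sketch(std::string const& s, strip_side side = strip_side::BOTH,
                         std::string const& to_strip = "")
{
  auto should_strip = [&](char c) {
    if (to_strip.empty()) {
      // Approximate whitespace: space plus ASCII control characters (assumption).
      return c == ' ' || static_cast<unsigned char>(c) <= 0x1F;
    }
    return to_strip.find(c) != std::string::npos;
  };
  std::size_t begin = 0;
  std::size_t end   = s.size();  // one past the last kept character
  if (side != strip_side::RIGHT) {
    while (begin < end && should_strip(s[begin])) { ++begin; }
  }
  if (side != strip_side::LEFT) {
    while (end > begin && should_strip(s[end - 1])) { --end; }
  }
  return s.substr(begin, end - begin);
}
```

For example, stripping "_" from both ends of "__a_b__" yields "a_b"; the interior underscore survives.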
std::unique_ptr<column> cudf::strings::translate(
  strings_column_view const& input,
  std::vector<std::pair<char_utf8, char_utf8>> const& chars_table,
  rmm::cuda_stream_view stream = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Translates individual characters within each string.
This can also be used to remove a character by mapping it to 0 in the corresponding chars_table entry.
Null string entries result in null entries in the output column.
input | Strings instance for this operation |
chars_table | Table of UTF-8 character mappings |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
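The mapping table works like a per-character lookup. The sketch below shows that idea on one std::string; `translate_sketch` is a hypothetical name, characters are assumed to be single-byte rather than UTF-8 code points, and if a source character appears twice in the table the first entry wins here, which is a property of the sketch only.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Host-side sketch: each pair maps a source character to its replacement;
// a replacement of 0 removes the character. Single-byte characters assumed.
std::string translate_sketch(std::string const& s,
                             std::vector<std::pair<char, char>> const& table)
{
  std::unordered_map<char, char> const map(table.begin(), table.end());
  std::string out;
  for (char c : s) {
    auto const it = map.find(c);
    if (it == map.end()) {
      out += c;  // no mapping: keep the character as-is
    } else if (it->second != 0) {
      out += it->second;  // mapped to another character
    }
    // mapped to 0: the character is dropped
  }
  return out;
}
```

With the table {'l' -> 'L', 'o' -> 0}, "hello" becomes "heLL".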
std::unique_ptr<column> cudf::strings::wrap(
  strings_column_view const& input,
  size_type width,
  rmm::cuda_stream_view stream = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Wraps strings onto multiple lines no longer than width by replacing appropriate white space with new-line characters (ASCII 0x0A).
For each string row in the input column longer than width, the corresponding output string row has newline characters inserted so that each line is no more than width characters. The function attempts to use existing white space locations to split the strings, but may split non-white-space sequences if necessary.
Any null string entries return corresponding null output column entries.
Example 1:
Example 2:
input | Strings column |
width | Maximum character width of a line within each string |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
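The core idea of breaking at existing white space can be sketched as a greedy pass over one std::string: once the current line exceeds width, the most recent space on that line becomes a newline. This is a simplified illustration, not the cudf algorithm. `wrap_sketch` is a hypothetical name, characters are assumed to be single-byte, and the sketch never splits inside a word, so a word longer than width stays on a line longer than width; it does not model the splitting of non-white-space sequences described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Greedy host-side sketch: replace the last space on an over-long line with '\n'.
// Never splits inside a word (simplification; single-byte characters assumed).
std::string wrap_sketch(std::string s, std::size_t width)
{
  assert(width > 0);
  std::size_t line_start = 0;                     // index where the current line begins
  std::size_t last_space = std::string::npos;     // last space seen on the current line
  for (std::size_t i = 0; i < s.size(); ++i) {
    if (s[i] == ' ') { last_space = i; }
    // The current line spans [line_start, i]; break once it exceeds width.
    if (i - line_start + 1 > width && last_space != std::string::npos) {
      s[last_space] = '\n';
      line_start    = last_space + 1;
      last_space    = std::string::npos;
    }
  }
  return s;
}
```

With width 12, "the quick brown fox jumped over the lazy brown dog" becomes "the quick\nbrown fox\njumped over\nthe lazy\nbrown dog".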
std::unique_ptr<column> cudf::strings::zfill(
  strings_column_view const& input,
  size_type width,
  rmm::cuda_stream_view stream = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Adds '0' as padding to the left of each string.
This is equivalent to `pad(width,left,'0')` but preserves the sign character if it appears in the first position.
If the string is already width or more characters, no padding is performed. No strings are truncated.
Null rows in the input result in corresponding null rows in the output column.
input | Strings instance for this operation |
width | The minimum number of characters for each string |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
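The difference from plain left padding is that a leading sign stays in front of the zeros. The host-side sketch below shows this on one std::string; `zfill_sketch` is a hypothetical name, characters are assumed to be single-byte, and treating both '+' and '-' as sign characters is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Host-side sketch of zero-filling that keeps a leading sign in front.
std::string zfill_sketch(std::string const& s, std::size_t width)
{
  if (s.size() >= width) { return s; }  // no padding and no truncation
  std::string const zeros(width - s.size(), '0');
  // Treating '+' and '-' as sign characters is an assumption of this sketch.
  if (!s.empty() && (s[0] == '-' || s[0] == '+')) {
    return s.substr(0, 1) + zeros + s.substr(1);
  }
  return zeros + s;
}
```

For width 5, "42" becomes "00042" and "-42" becomes "-0042", whereas a plain left pad with '0' would give "00-42".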