Replacing

Files

file  cudf/strings/replace.hpp
 
file  replace_re.hpp
 

Functions

std::unique_ptr< column > cudf::strings::replace (strings_column_view const &input, string_scalar const &target, string_scalar const &repl, cudf::size_type maxrepl=-1, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Replaces target string within each string with the specified replacement string.
 
std::unique_ptr< column > cudf::strings::replace_slice (strings_column_view const &input, string_scalar const &repl=string_scalar(""), size_type start=0, size_type stop=-1, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Replaces the characters of each string in the [start,stop) character position range with the provided repl string.
 
std::unique_ptr< column > cudf::strings::replace_multiple (strings_column_view const &input, strings_column_view const &targets, strings_column_view const &repls, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Replaces substrings matching a list of targets with the corresponding replacement strings.
 
std::unique_ptr< column > cudf::strings::replace_re (strings_column_view const &input, regex_program const &prog, string_scalar const &replacement=string_scalar(""), std::optional< size_type > max_replace_count=std::nullopt, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 For each string, replaces any character sequence matching the given regex with the provided replacement string.
 
std::unique_ptr< column > cudf::strings::replace_re (strings_column_view const &input, std::vector< std::string > const &patterns, strings_column_view const &replacements, regex_flags const flags=regex_flags::DEFAULT, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 For each string, replaces any character sequence matching the given patterns with the corresponding string in the replacements column.
 
std::unique_ptr< column > cudf::strings::replace_with_backrefs (strings_column_view const &input, regex_program const &prog, std::string_view replacement, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 For each string, replaces any character sequence matching the given regex using the replacement template for back-references.
 

Function Documentation

◆ replace()

std::unique_ptr<column> cudf::strings::replace ( strings_column_view const &  input,
string_scalar const &  target,
string_scalar const &  repl,
cudf::size_type  maxrepl = -1,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Replaces target string within each string with the specified replacement string.

This function searches each string in the column for the target string. If found, the target string is replaced by the repl string within the input string. If not found, the output entry is just a copy of the corresponding input string.

Specifying an empty string for repl will essentially remove the target string if found in each string.

Null string entries will return null output string entries.

Example:
s = ["hello", "goodbye"]
r1 = replace(s,"o","OOO")
r1 is now ["hellOOO","gOOOOOOdbye"]
r2 = replace(s,"oo","")
r2 is now ["hello","gdbye"]
Exceptions
cudf::logic_error  if target is an empty string.
Parameters
input  Strings column for this operation
target  String to search for within each string
repl  Replacement string if target is found
maxrepl  Maximum number of replacements to make when target appears multiple times in the input string. The default of -1 replaces all occurrences of target in each string.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New strings column
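
The search-and-replace rules above, including maxrepl and null handling, can be sketched on the host with plain std::string. This is only an illustrative model, not a libcudf call: replace_model and host_column are hypothetical names, std::nullopt stands in for a null row, and treating every negative maxrepl as "replace all" (and 0 as "copy the input") is an assumption, since the text only documents -1.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Host-side stand-in for a strings column; std::nullopt models a null row.
using host_column = std::vector<std::optional<std::string>>;

// Hypothetical helper that mimics the documented behaviour of replace().
host_column replace_model(host_column const& input,
                          std::string const& target,
                          std::string const& repl,
                          int maxrepl = -1)
{
  // Mirrors the documented cudf::logic_error for an empty target.
  if (target.empty()) { throw std::logic_error("target must not be empty"); }

  host_column output;
  output.reserve(input.size());
  for (auto const& row : input) {
    if (!row.has_value()) {  // null in, null out
      output.push_back(std::nullopt);
      continue;
    }
    std::string result;
    std::size_t pos = 0;
    int remaining   = maxrepl;  // assumption: any negative value means "replace all"
    while (remaining != 0) {
      auto const found = row->find(target, pos);
      if (found == std::string::npos) { break; }
      result.append(*row, pos, found - pos);  // text before the match
      result += repl;
      pos = found + target.size();
      if (remaining > 0) { --remaining; }
    }
    assert(pos <= row->size());
    result.append(*row, pos, std::string::npos);  // unmatched tail
    output.push_back(std::move(result));
  }
  return output;
}
```

With s = ["hello", "goodbye"], calling replace_model(s, "o", "0", 1) replaces only the first "o" in each row.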

◆ replace_multiple()

std::unique_ptr<column> cudf::strings::replace_multiple ( strings_column_view const &  input,
strings_column_view const &  targets,
strings_column_view const &  repls,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Replaces substrings matching a list of targets with the corresponding replacement strings.

For each string in input, the list of targets is searched within that string. If a target string is found, it is replaced by the corresponding entry in the repls column. All occurrences found in each string are replaced.

This does not use regex to match targets in the string. Empty string targets are ignored.

Null string entries will return null output string entries.

The repls argument can optionally contain a single string. In this case, all matching target substrings will be replaced by that single string.

Example:
s = ["hello", "goodbye"]
tgts = ["e","o"]
repls = ["EE","OO"]
r1 = replace_multiple(s,tgts,repls)
r1 is now ["hEEllOO", "gOOOOdbyEE"]
tgts = ["e","oo"]
repls = ["33",""]
r2 = replace_multiple(s,tgts,repls)
r2 is now ["h33llo", "gdby33"]
Exceptions
cudf::logic_error  if targets and repls are different sizes, unless repls contains a single string.
cudf::logic_error  if targets or repls contain null entries.
Parameters
input  Strings column for this operation
targets  Strings to search for in each string
repls  Corresponding replacement strings for target strings
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New strings column
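
A host-side sketch of the multi-target rules for a single string may help when reading the example above. replace_multiple_model is a hypothetical helper, not part of libcudf. It assumes that, at each position, targets are tried in list order and the first one that matches wins; the text above does not state how overlapping targets are resolved.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper that mimics replace_multiple() for one input string.
std::string replace_multiple_model(std::string const& input,
                                   std::vector<std::string> const& targets,
                                   std::vector<std::string> const& repls)
{
  // Mirrors the documented size rule: one repl per target, or a single repl for all.
  if (repls.size() != 1 && repls.size() != targets.size()) {
    throw std::logic_error("repls must hold one string or one string per target");
  }

  std::string result;
  std::size_t pos = 0;
  while (pos < input.size()) {
    bool matched = false;
    for (std::size_t i = 0; i < targets.size(); ++i) {
      auto const& tgt = targets[i];
      if (tgt.empty()) { continue; }  // empty targets are ignored
      // Plain substring comparison, no regex involved.
      if (input.compare(pos, tgt.size(), tgt) == 0) {
        result += (repls.size() == 1) ? repls[0] : repls[i];
        pos += tgt.size();
        matched = true;
        break;  // assumption: first matching target in list order wins
      }
    }
    if (!matched) { result += input[pos++]; }  // copy the character unchanged
  }
  assert(pos == input.size());
  return result;
}
```

Passing a single-element repls, for example {"_"}, replaces every matched target with that one string.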

◆ replace_re() [1/2]

std::unique_ptr<column> cudf::strings::replace_re ( strings_column_view const &  input,
regex_program const &  prog,
string_scalar const &  replacement = string_scalar(""),
std::optional< size_type >  max_replace_count = std::nullopt,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

For each string, replaces any character sequence matching the given regex with the provided replacement string.

Any null string entries return corresponding null output column entries.

See the Regex Features page for details on patterns supported by this API.

Parameters
input  Strings instance for this operation
prog  Regex program instance
replacement  The string used to replace the matched sequence in each string. Default is an empty string.
max_replace_count  The maximum number of times to replace the matched pattern within each string. Default replaces every substring that is matched.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New strings column
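
The effect of max_replace_count can be illustrated on the host with std::regex. This is a rough model only: replace_re_model is a hypothetical helper, std::regex (ECMAScript grammar) stands in for regex_program and does not necessarily accept the same patterns as libcudf, and the replacement is inserted as literal text.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper that mimics replace_re() [1/2] for one input string.
std::string replace_re_model(std::string const& input,
                             std::regex const& pattern,
                             std::string const& replacement       = "",
                             std::optional<int> max_replace_count = std::nullopt)
{
  std::string result;
  std::size_t last = 0;  // end of the previous match
  int count        = 0;  // replacements made so far
  auto const end   = std::sregex_iterator();
  for (auto it = std::sregex_iterator(input.begin(), input.end(), pattern); it != end; ++it) {
    // Stop once the optional limit is reached; no limit means replace every match.
    if (max_replace_count.has_value() && count >= *max_replace_count) { break; }
    auto const& match = *it;
    auto const start  = static_cast<std::size_t>(match.position(0));
    result.append(input, last, start - last);  // text between matches
    result += replacement;                     // inserted literally
    last = start + static_cast<std::size_t>(match.length(0));
    ++count;
  }
  assert(last <= input.size());
  result.append(input, last, std::string::npos);  // remainder after the last replacement
  return result;
}
```

For the input "a1b22c333" and the pattern [0-9]+, a limit of 2 leaves the third run of digits untouched, while the default replaces all three.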

◆ replace_re() [2/2]

std::unique_ptr<column> cudf::strings::replace_re ( strings_column_view const &  input,
std::vector< std::string > const &  patterns,
strings_column_view const &  replacements,
regex_flags const  flags = regex_flags::DEFAULT,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

For each string, replaces any character sequence matching the given patterns with the corresponding string in the replacements column.

Any null string entries return corresponding null output column entries.

See the Regex Features page for details on patterns supported by this API.

Parameters
input  Strings instance for this operation
patterns  The regular expression patterns to search within each string
replacements  The strings used for replacement
flags  Regex flags for interpreting special characters in the patterns
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New strings column

◆ replace_slice()

std::unique_ptr<column> cudf::strings::replace_slice ( strings_column_view const &  input,
string_scalar const &  repl = string_scalar(""),
size_type  start = 0,
size_type  stop = -1,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Replaces the characters of each string in the [start,stop) character position range with the provided repl string.

Null string entries will return null output string entries.

Position values are 0-based meaning position 0 is the first character of each string.

This function can be used to insert a string at a specific position by specifying the same position value for start and stop. The repl string can be appended to each string by specifying -1 for both start and stop.

Example:
s = ["abcdefghij","0123456789"]
r = replace_slice(s,"z",2,5)
r is now ["abzfghij", "01z56789"]
Exceptions
cudf::logic_error  if start is greater than stop.
Parameters
input  Strings column for this operation.
repl  Replacement string for specified positions found. Default is empty string.
start  Start position where repl will be added. Default is 0, first character position.
stop  End position (exclusive) to use for replacement. Default of -1 specifies the end of each string.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New strings column
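
The replace, insert, and append cases described above can be sketched on the host as follows. replace_slice_model is a hypothetical helper, not a libcudf call. It indexes bytes, so it matches character positions only for ASCII input. It also assumes that any negative or past-the-end position is clamped to the string length, which is consistent with the documented meaning of -1 but is not spelled out in the text.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper that mimics replace_slice() for one ASCII string.
std::string replace_slice_model(std::string const& input,
                                std::string const& repl = "",
                                int start               = 0,
                                int stop                = -1)
{
  // Mirrors the documented cudf::logic_error when start is greater than stop.
  if (stop >= 0 && start > stop) { throw std::logic_error("start must not exceed stop"); }

  auto const length = input.size();
  // Assumption: negative or out-of-range positions mean "end of string".
  auto const clamp = [length](int pos) {
    return (pos < 0 || static_cast<std::size_t>(pos) > length) ? length
                                                               : static_cast<std::size_t>(pos);
  };
  std::size_t const begin = clamp(start);
  std::size_t const end   = clamp(stop);
  // Model-only guard for combinations the text does not describe (e.g. start=-1, stop=2).
  if (begin > end) { throw std::logic_error("unsupported start/stop combination"); }
  assert(begin <= end && end <= length);

  // Keep [0,begin), insert repl, then keep [end,length).
  return input.substr(0, begin) + repl + input.substr(end);
}
```

Equal start and stop values insert repl without removing anything, and -1 for both appends it.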

◆ replace_with_backrefs()

std::unique_ptr<column> cudf::strings::replace_with_backrefs ( strings_column_view const &  input,
regex_program const &  prog,
std::string_view  replacement,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

For each string, replaces any character sequence matching the given regex using the replacement template for back-references.

Any null string entries return corresponding null output column entries.

See the Regex Features page for details on patterns supported by this API.

Exceptions
cudf::logic_error  if capture index values in replacement are not in range 0-99, and also if the index exceeds the group count specified in the pattern
Parameters
input  Strings instance for this operation
prog  Regex program instance
replacement  The replacement template for creating the output string
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New strings column
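
As a rough host-side illustration of template expansion and the index check described under Exceptions, the sketch below uses std::regex in place of regex_program. Everything here is hypothetical and not taken from libcudf: the names template_piece, parse_template, and replace_with_backrefs_model, the assumption that a back-reference is written as a backslash followed by one or two digits, and the choice to validate the template before any matching. See the Regex Features page for the syntax libcudf actually accepts.

```cpp
#include <cassert>
#include <cctype>
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

// Literal text followed by an optional capture-group index (-1 means no group).
struct template_piece {
  std::string literal;
  int group = -1;
};

// Splits the template into pieces and rejects indexes above the pattern's group count.
std::vector<template_piece> parse_template(std::string const& tmpl, unsigned group_count)
{
  std::vector<template_piece> pieces(1);
  for (std::size_t i = 0; i < tmpl.size(); ++i) {
    auto const next_is_digit = [&] {
      return i + 1 < tmpl.size() && std::isdigit(static_cast<unsigned char>(tmpl[i + 1])) != 0;
    };
    if (tmpl[i] != '\\' || !next_is_digit()) {
      pieces.back().literal += tmpl[i];
      continue;
    }
    int index = 0;  // assumed syntax: at most two digits, so indexes stay within 0-99
    for (int digits = 0; digits < 2 && next_is_digit(); ++digits) {
      ++i;
      index = index * 10 + (tmpl[i] - '0');
    }
    if (index > static_cast<int>(group_count)) {
      throw std::logic_error("back-reference index exceeds the pattern's group count");
    }
    pieces.back().group = index;
    pieces.emplace_back();  // start collecting the next literal run
  }
  return pieces;
}

// Hypothetical helper that mimics replace_with_backrefs() for one input string.
std::string replace_with_backrefs_model(std::string const& input,
                                        std::regex const& pattern,
                                        std::string const& tmpl)
{
  auto const pieces = parse_template(tmpl, pattern.mark_count());
  std::string result;
  std::size_t last = 0;
  auto const end   = std::sregex_iterator();
  for (auto it = std::sregex_iterator(input.begin(), input.end(), pattern); it != end; ++it) {
    auto const& match = *it;
    auto const start  = static_cast<std::size_t>(match.position(0));
    result.append(input, last, start - last);  // text between matches
    for (auto const& piece : pieces) {
      result += piece.literal;
      if (piece.group >= 0) { result += match.str(piece.group); }  // group 0 is the whole match
    }
    last = start + static_cast<std::size_t>(match.length(0));
  }
  assert(last <= input.size());
  result.append(input, last, std::string::npos);
  return result;
}
```

In this model, the pattern (\w+) (\w+) with the template "\2, \1" swaps two words, and a template that refers to group 3 is rejected because the pattern defines only two groups.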