Replacing

Files

file  cudf/strings/replace.hpp
 
file  replace_re.hpp
 

Functions

std::unique_ptr< column > cudf::strings::replace (strings_column_view const &strings, string_scalar const &target, string_scalar const &repl, int32_t maxrepl=-1, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Replaces target string within each string with the specified replacement string.
 
std::unique_ptr< column > cudf::strings::replace_slice (strings_column_view const &strings, string_scalar const &repl=string_scalar(""), size_type start=0, size_type stop=-1, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Replaces the characters in the [start,stop) position range of each string in the column with the provided repl string.
 
std::unique_ptr< column > cudf::strings::replace (strings_column_view const &strings, strings_column_view const &targets, strings_column_view const &repls, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Replaces substrings matching a list of targets with the corresponding replacement strings.
 
std::unique_ptr< column > cudf::strings::replace_re (strings_column_view const &strings, regex_program const &prog, string_scalar const &replacement=string_scalar(""), std::optional< size_type > max_replace_count=std::nullopt, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 For each string, replaces any character sequence matching the given regex with the provided replacement string.
 
std::unique_ptr< column > cudf::strings::replace_re (strings_column_view const &strings, std::vector< std::string > const &patterns, strings_column_view const &replacements, regex_flags const flags=regex_flags::DEFAULT, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 For each string, replaces any character sequence matching the given patterns with the corresponding string in the replacements column.
 
std::unique_ptr< column > cudf::strings::replace_with_backrefs (strings_column_view const &strings, regex_program const &prog, std::string_view replacement, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 For each string, replaces any character sequence matching the given regex using the replacement template for back-references.
 

Function Documentation

◆ replace() [1/2]

std::unique_ptr<column> cudf::strings::replace ( strings_column_view const &  strings,
string_scalar const &  target,
string_scalar const &  repl,
int32_t  maxrepl = -1,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Replaces target string within each string with the specified replacement string.

This function searches each string in the column for the target string. If found, the target string is replaced by the repl string within the input string. If not found, the output entry is just a copy of the corresponding input string.

Specifying an empty string for repl removes the target string wherever it is found in each string.

Null string entries will return null output string entries.

Example:
s = ["hello", "goodbye"]
r1 = replace(s,"o","OOO")
r1 is now ["hellOOO","gOOOOOOdbye"]
r2 = replace(s,"oo","")
r2 is now ["hello","gdbye"]
Exceptions
cudf::logic_error  if target is an empty string.
Parameters
strings  Strings column for this operation.
target  String to search for within each string.
repl  Replacement string if target is found.
maxrepl  Maximum number of replacements to make when target appears multiple times in an input string. The default of -1 replaces all occurrences of target in each string.
mr  Device memory resource used to allocate the returned column's device memory.
Returns
New strings column.
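
The behavior described above (null propagation, the empty-target error, and the maxrepl limit) can be modelled on the host with std::string. This is only an illustrative CPU sketch, not the cudf implementation: `replace_model` and `Row` are hypothetical names, std::optional stands in for null entries, std::invalid_argument stands in for cudf::logic_error, and treating every negative maxrepl as "no limit" is an assumption (the text only documents -1).

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for one nullable row of a strings column.
using Row = std::optional<std::string>;

// CPU model of the behavior described above; not the cudf implementation.
std::vector<Row> replace_model(std::vector<Row> const& strings,
                               std::string const& target,
                               std::string const& repl,
                               std::int32_t maxrepl = -1)
{
  // Stand-in for the documented cudf::logic_error on an empty target.
  if (target.empty()) { throw std::invalid_argument("target must not be empty"); }

  std::vector<Row> out;
  out.reserve(strings.size());
  for (auto const& row : strings) {
    if (!row) {  // null in, null out
      out.push_back(std::nullopt);
      continue;
    }
    std::string result;
    std::size_t pos        = 0;
    std::int32_t remaining = maxrepl;  // assumption: any negative value means "no limit"
    while (remaining != 0) {
      auto const found = row->find(target, pos);
      if (found == std::string::npos) { break; }
      result.append(*row, pos, found - pos);  // copy the text before the match
      result += repl;
      pos = found + target.size();
      if (remaining > 0) { --remaining; }
    }
    result.append(*row, pos, std::string::npos);  // copy the unmatched tail
    out.push_back(std::move(result));
  }
  return out;
}
```

Under this model, `replace_model(s, "o", "0", 1)` on `["hello", "goodbye"]` changes only the first "o" in each row.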

◆ replace() [2/2]

std::unique_ptr<column> cudf::strings::replace ( strings_column_view const &  strings,
strings_column_view const &  targets,
strings_column_view const &  repls,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Replaces substrings matching a list of targets with the corresponding replacement strings.

For each string in strings, the list of targets is searched within that string. If a target string is found, it is replaced by the corresponding entry in the repls column. All occurrences found in each string are replaced.

This does not use regex to match targets in the string.

Null string entries will return null output string entries.

The repls argument can optionally contain a single string. In this case, all matching target substrings will be replaced by that single string.

Example:
s = ["hello", "goodbye"]
tgts = ["e","o"]
repls = ["EE","OO"]
r1 = replace(s,tgts,repls)
r1 is now ["hEEllOO", "gOOOOdbyEE"]
tgts = ["e","oo"]
repls = ["33",""]
r2 = replace(s,tgts,repls)
r2 is now ["h33llo", "gdby33"]
Exceptions
cudf::logic_error  if targets and repls are different sizes, unless repls is a single string.
cudf::logic_error  if targets or repls contain null entries.
Parameters
strings  Strings column for this operation.
targets  Strings to search for in each string.
repls  Corresponding replacement strings for target strings.
mr  Device memory resource used to allocate the returned column's device memory.
Returns
New strings column.
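
The multi-target behavior, including the single-string repls case, can be sketched on the host as follows. This is a hypothetical CPU model, not the cudf implementation: `replace_multi_model` and `Row` are illustrative names, std::invalid_argument stands in for cudf::logic_error, and two details are assumptions the text above does not specify: targets are tried in list order at each position, and empty targets are skipped.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for one nullable row of a strings column.
using Row = std::optional<std::string>;

// CPU model of multi-target replacement; not the cudf implementation.
std::vector<Row> replace_multi_model(std::vector<Row> const& strings,
                                     std::vector<std::string> const& targets,
                                     std::vector<std::string> const& repls)
{
  // Sizes must match unless repls holds a single string used for every target.
  if (repls.size() != targets.size() && repls.size() != 1) {
    throw std::invalid_argument("targets and repls sizes do not match");
  }

  std::vector<Row> out;
  out.reserve(strings.size());
  for (auto const& row : strings) {
    if (!row) {  // null in, null out
      out.push_back(std::nullopt);
      continue;
    }
    std::string result;
    std::size_t pos = 0;
    while (pos < row->size()) {
      bool matched = false;
      // Assumption: the first target (in list order) matching at pos wins.
      for (std::size_t t = 0; t < targets.size(); ++t) {
        auto const& tgt = targets[t];
        if (!tgt.empty() && row->compare(pos, tgt.size(), tgt) == 0) {
          result += (repls.size() == 1) ? repls[0] : repls[t];
          pos += tgt.size();
          matched = true;
          break;
        }
      }
      if (!matched) { result += (*row)[pos++]; }  // plain literal comparison, no regex
    }
    out.push_back(std::move(result));
  }
  return out;
}
```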

◆ replace_re() [1/2]

std::unique_ptr<column> cudf::strings::replace_re ( strings_column_view const &  strings,
regex_program const &  prog,
string_scalar const &  replacement = string_scalar(""),
std::optional< size_type >  max_replace_count = std::nullopt,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

For each string, replaces any character sequence matching the given regex with the provided replacement string.

Any null string entries return corresponding null output column entries.

See the Regex Features page for details on patterns supported by this API.

Parameters
strings  Strings instance for this operation
prog  Regex program instance
replacement  The string used to replace the matched sequence in each string. Default is an empty string.
max_replace_count  The maximum number of times to replace the matched pattern within each string. Default replaces every substring that is matched.
mr  Device memory resource used to allocate the returned column's device memory
Returns
New strings column
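
The effect of max_replace_count can be illustrated with a host-side model built on std::regex. This is a sketch under stated assumptions, not the cudf implementation: `replace_re_model` and `Row` are hypothetical names, std::regex (ECMAScript grammar) stands in for regex_program and may accept different patterns than cudf does, and the replacement is inserted as literal text.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for one nullable row of a strings column.
using Row = std::optional<std::string>;

// CPU model of regex replacement with an optional per-string limit.
std::vector<Row> replace_re_model(std::vector<Row> const& strings,
                                  std::regex const& prog,
                                  std::string const& replacement = "",
                                  std::optional<int> max_replace_count = std::nullopt)
{
  std::vector<Row> out;
  out.reserve(strings.size());
  for (auto const& row : strings) {
    if (!row) {  // null in, null out
      out.push_back(std::nullopt);
      continue;
    }
    std::string result;
    auto tail = row->cbegin();  // first character not yet copied to result
    int done  = 0;              // replacements made in this row so far
    for (std::sregex_iterator it(row->cbegin(), row->cend(), prog), end; it != end; ++it) {
      if (max_replace_count && done >= *max_replace_count) { break; }
      result.append(tail, (*it)[0].first);  // copy the text before the match
      result += replacement;                // literal text, no back-references
      tail = (*it)[0].second;
      ++done;
    }
    result.append(tail, row->cend());  // copy the unmatched tail
    out.push_back(std::move(result));
  }
  return out;
}
```

Leaving max_replace_count empty replaces every match, which mirrors the std::nullopt default in the signature above.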

◆ replace_re() [2/2]

std::unique_ptr<column> cudf::strings::replace_re ( strings_column_view const &  strings,
std::vector< std::string > const &  patterns,
strings_column_view const &  replacements,
regex_flags const  flags = regex_flags::DEFAULT,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

For each string, replaces any character sequence matching the given patterns with the corresponding string in the replacements column.

Any null string entries return corresponding null output column entries.

See the Regex Features page for details on patterns supported by this API.

Parameters
strings  Strings instance for this operation.
patterns  The regular expression patterns to search within each string.
replacements  The strings used for replacement.
flags  Regex flags for interpreting special characters in the patterns.
mr  Device memory resource used to allocate the returned column's device memory.
Returns
New strings column.

◆ replace_slice()

std::unique_ptr<column> cudf::strings::replace_slice ( strings_column_view const &  strings,
string_scalar const &  repl = string_scalar(""),
size_type  start = 0,
size_type  stop = -1,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

This function replaces the characters in the [start,stop) position range of each string in the column with the provided repl string.

Null string entries will return null output string entries.

Position values are 0-based, meaning position 0 is the first character of each string.

This function can be used to insert a string at a specific position by specifying the same position value for start and stop. The repl string can be appended to each string by specifying -1 for both start and stop.

Example:
s = ["abcdefghij","0123456789"]
r = replace_slice(s,"z",2,5)
r is now ["abzfghij", "01z56789"]
Exceptions
cudf::logic_error  if start is greater than stop.
Parameters
strings  Strings column for this operation.
repl  Replacement string for the specified positions. Default is an empty string.
start  Start position where repl will be added. Default is 0, the first character position.
stop  End position (exclusive) to use for replacement. Default of -1 specifies the end of each string.
mr  Device memory resource used to allocate the returned column's device memory.
Returns
New strings column.
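
The replace, insert, and append cases described above can be sketched with a small host-side model. This is not the cudf implementation: `replace_slice_model` and `Row` are hypothetical names, std::invalid_argument stands in for cudf::logic_error, positions are counted in bytes (so the sketch assumes ASCII input, while the text above speaks of character positions), and clamping out-of-range positions to the end of the string is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for one nullable row of a strings column.
using Row = std::optional<std::string>;

// CPU model of slice replacement over the half-open range [start, stop).
std::vector<Row> replace_slice_model(std::vector<Row> const& strings,
                                     std::string const& repl = "",
                                     int start               = 0,
                                     int stop                = -1)
{
  // A stop of -1 means "end of string", so the check applies only to explicit stops.
  if (stop >= 0 && start > stop) { throw std::invalid_argument("start is greater than stop"); }

  std::vector<Row> out;
  out.reserve(strings.size());
  for (auto const& row : strings) {
    if (!row) {  // null in, null out
      out.push_back(std::nullopt);
      continue;
    }
    int const len = static_cast<int>(row->size());
    // Assumption: -1 or a position past the end resolves to the end of the string.
    int const end = (stop < 0 || stop > len) ? len : stop;
    int begin     = (start < 0 || start > len) ? len : start;
    begin         = std::min(begin, end);  // defensive clamp for undocumented combinations
    out.push_back(row->substr(0, static_cast<std::size_t>(begin)) + repl +
                  row->substr(static_cast<std::size_t>(end)));
  }
  return out;
}
```

With start equal to stop the model inserts repl at that position, and with both set to -1 it appends repl, matching the two special cases described above.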

◆ replace_with_backrefs()

std::unique_ptr<column> cudf::strings::replace_with_backrefs ( strings_column_view const &  strings,
regex_program const &  prog,
std::string_view  replacement,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

For each string, replaces any character sequence matching the given regex using the replacement template for back-references.

Any null string entries return corresponding null output column entries.

See the Regex Features page for details on patterns supported by this API.

Exceptions
cudf::logic_error  if a capture index in replacement is outside the range 0-99, or if it exceeds the group count specified in the pattern
Parameters
strings  Strings instance for this operation
prog  Regex program instance
replacement  The replacement template for creating the output string
mr  Device memory resource used to allocate the returned column's device memory
Returns
New strings column
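
The idea of a replacement template with back-references can be illustrated on the host with std::regex_replace. This is a sketch only, not the cudf implementation: `replace_with_backrefs_model` and `Row` are hypothetical names, and the `$1`-style syntax shown is the one std::regex uses. The template syntax accepted by replace_with_backrefs is not described in the text above and may differ.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for one nullable row of a strings column.
using Row = std::optional<std::string>;

// CPU model of back-reference replacement using std::regex's own template syntax.
std::vector<Row> replace_with_backrefs_model(std::vector<Row> const& strings,
                                             std::regex const& prog,
                                             std::string const& replacement)
{
  std::vector<Row> out;
  out.reserve(strings.size());
  for (auto const& row : strings) {
    if (!row) {  // null in, null out
      out.push_back(std::nullopt);
      continue;
    }
    // Each $N in the template is replaced by the text captured by group N.
    out.push_back(std::regex_replace(*row, prog, replacement));
  }
  return out;
}
```

For example, a pattern with three capture groups for year, month, and day combined with the template `$3/$2/$1` reorders a date, while rows that do not match are copied unchanged.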