Nvtext Edit Distance
- group Edit Distance
Functions
-
std::unique_ptr<cudf::column> edit_distance(cudf::strings_column_view const &input, cudf::strings_column_view const &targets, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Compute the edit distance between individual strings in two strings columns.
The output[i] is the edit distance between input[i] and targets[i]. This edit distance calculation uses the Levenshtein algorithm as documented here: https://www.cuelogic.com/blog/the-levenshtein-algorithm
Example:
s = ["hello", "", "world"]
t = ["hallo", "goodbye", "world"]
d = edit_distance(s, t)
d is now [1, 7, 0]
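For readers who want to check expected values by hand, the following is a minimal host-side sketch of the Levenshtein recurrence. It is not the libcudf implementation; the name levenshtein_reference is hypothetical, and the sketch operates on bytes, so it is only meaningful as a character-level distance for ASCII text.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Host-side reference for the Levenshtein distance between two byte strings.
// Illustrative only: hypothetical helper, not part of libcudf or nvtext.
inline int levenshtein_reference(std::string const& a, std::string const& b)
{
  // prev[j] starts as the distance from the empty string to the first j bytes of b.
  std::vector<int> prev(b.size() + 1);
  std::vector<int> curr(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) { prev[j] = static_cast<int>(j); }

  for (std::size_t i = 1; i <= a.size(); ++i) {
    curr[0] = static_cast<int>(i);  // deleting i bytes turns a[0..i) into ""
    for (std::size_t j = 1; j <= b.size(); ++j) {
      int const substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      int const deletion     = prev[j] + 1;
      int const insertion    = curr[j - 1] + 1;
      curr[j]                = std::min({substitution, deletion, insertion});
    }
    std::swap(prev, curr);  // the row just computed becomes the previous row
  }
  return prev[b.size()];
}
```

Applied row by row to the example strings, this sketch yields 1 for ("hello", "hallo"), 7 for ("", "goodbye"), and 0 for ("world", "world").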
Any null entry in either input or targets is ignored, and the edit distance is computed as though the null entry were an empty string.
The targets.size() must equal input.size() unless targets.size()==1. In that case, every input string is compared against the single targets[0] string.
- Throws:
std::invalid_argument – if targets.size() != input.size() and targets.size() != 1
std::invalid_argument – if targets.size() == 1 and targets[0].is_null()
- Parameters:
input – Strings column of input strings
targets – Strings to compute edit distance against input
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New column of integer edit distance values, one per row of input
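The size, broadcast, and null rules documented for this function can be modeled on the host as in the sketch below. This is an assumption-laden illustration rather than the library's code: host_strings, distance_fn, and edit_distance_model are hypothetical names, std::nullopt stands in for a null row, and the per-pair distance is supplied by the caller.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical host-side stand-ins; std::nullopt represents a null row.
using host_strings = std::vector<std::optional<std::string>>;
using distance_fn  = std::function<int(std::string const&, std::string const&)>;

// Models the documented size check, single-target broadcast, and null handling.
// Not the libcudf implementation.
inline std::vector<int> edit_distance_model(host_strings const& input,
                                            host_strings const& targets,
                                            distance_fn const& distance)
{
  bool const broadcast = targets.size() == 1;
  if (targets.size() != input.size() && !broadcast) {
    throw std::invalid_argument("targets.size() must equal input.size() or be 1");
  }
  if (broadcast && !targets[0].has_value()) {
    throw std::invalid_argument("a single target must not be null");
  }

  std::vector<int> output;
  output.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    // With one target, every input row is compared against targets[0].
    auto const& target = targets[broadcast ? std::size_t{0} : i];
    // A null row on either side is treated as the empty string.
    output.push_back(distance(input[i].value_or(""), target.value_or("")));
  }
  return output;
}
```

Any pairwise distance can be passed as distance; supplying a Levenshtein routine gives the row-wise behavior described for edit_distance.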
-
std::unique_ptr<cudf::column> edit_distance(cudf::strings_column_view const &input, cudf::strings_column_view const &targets, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())