Edit Distance

Files

file edit_distance.hpp

Functions

std::unique_ptr<cudf::column> nvtext::edit_distance(cudf::strings_column_view const &input, cudf::strings_column_view const &targets, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
 Compute the edit distance between individual strings in two strings columns.

Detailed Description

Function Documentation

edit_distance()

Compute the edit distance between individual strings in two strings columns.

Each output[i] is the edit distance between input[i] and targets[i], computed with the Levenshtein algorithm as described here: https://www.cuelogic.com/blog/the-levenshtein-algorithm
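For reference, the Levenshtein distance between a string a of length m and a string b of length n is usually defined by the dynamic-programming recurrence below. Here D(i, j) is the distance between the first i characters of a and the first j characters of b, each insertion, deletion, or substitution costs 1, and [a_i ≠ b_j] is 1 when the two characters differ and 0 otherwise. This is the textbook definition, not a description of how the GPU implementation schedules its work; under it, output[i] corresponds to D(m, n) for the pair input[i], targets[i].

```latex
\begin{aligned}
D(i,0) &= i, \qquad D(0,j) = j,\\
D(i,j) &= \min\bigl\{\, D(i-1,j) + 1,\; D(i,j-1) + 1,\; D(i-1,j-1) + [a_i \neq b_j] \,\bigr\},
\quad 1 \le i \le m,\ 1 \le j \le n.
\end{aligned}
```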

Example:
s = ["hello", "", "world"]
t = ["hallo", "goodbye", "world"]
d = edit_distance(s, t)
d is now [1, 7, 0]
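The host-side C++ sketch below reproduces the example values with the classic two-row Levenshtein dynamic program. It is a CPU reference for checking results, not the nvtext implementation. The names levenshtein and edit_distance_reference are hypothetical, and the sketch compares bytes, so it agrees with a character-based comparison only for ASCII text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: Levenshtein distance with unit cost for insertion,
// deletion and substitution. Keeps only two rows of the (|a|+1) x (|b|+1)
// DP table. Compares bytes, so it matches a per-character comparison only for ASCII.
int levenshtein(std::string const& a, std::string const& b)
{
  std::vector<int> prev(b.size() + 1), curr(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) { prev[j] = static_cast<int>(j); }  // D(0, j) = j
  for (std::size_t i = 1; i <= a.size(); ++i) {
    curr[0] = static_cast<int>(i);  // D(i, 0) = i
    for (std::size_t j = 1; j <= b.size(); ++j) {
      int const cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
      curr[j] = std::min({prev[j] + 1,           // delete a[i-1]
                          curr[j - 1] + 1,       // insert b[j-1]
                          prev[j - 1] + cost});  // substitute or match
    }
    std::swap(prev, curr);  // row i becomes the previous row
  }
  return prev[b.size()];
}

// Hypothetical row-wise reference: output[i] = levenshtein(input[i], targets[i]).
// Assumes equal sizes and no null rows (see the rules that follow for those cases).
std::vector<int> edit_distance_reference(std::vector<std::string> const& input,
                                         std::vector<std::string> const& targets)
{
  assert(input.size() == targets.size());
  std::vector<int> output;
  output.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    output.push_back(levenshtein(input[i], targets[i]));
  }
  return output;
}
```

Calling edit_distance_reference({"hello", "", "world"}, {"hallo", "goodbye", "world"}) returns {1, 7, 0}, matching the example. Keeping two rows instead of the full table reduces memory from O(mn) to O(n) without changing the result.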

Null entries in either input or targets are treated as empty strings: the edit distance is computed as though each null entry were "".

targets.size() must equal input.size() unless targets.size() == 1, in which case every input string is compared against the single string targets[0].
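The null-handling and size rules can be modeled on the host as follows. This is a hypothetical sketch, not part of the nvtext API: nullable_strings stands in for a strings column (std::nullopt marks a null row), and resolve_pairs returns the pair of strings each output row would be computed from, throwing std::invalid_argument under the same conditions listed under Exceptions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for a nullable strings column.
using nullable_strings = std::vector<std::optional<std::string>>;

// Hypothetical helper returning the (input, target) pair used for each output row:
//  - a null entry is treated as an empty string;
//  - if targets has one row, every input row is paired with targets[0];
//  - otherwise targets must have exactly as many rows as input.
std::vector<std::pair<std::string, std::string>> resolve_pairs(nullable_strings const& input,
                                                              nullable_strings const& targets)
{
  if (targets.size() != input.size() && targets.size() != 1) {
    throw std::invalid_argument("targets.size() must equal input.size() or be 1");
  }
  if (targets.size() == 1 && !targets[0].has_value()) {
    throw std::invalid_argument("a single target string must not be null");
  }
  std::vector<std::pair<std::string, std::string>> pairs;
  pairs.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    auto const& target = (targets.size() == 1) ? targets[0] : targets[i];
    pairs.emplace_back(input[i].value_or(""), target.value_or(""));  // null -> ""
  }
  assert(pairs.size() == input.size());  // one pair per input row
  return pairs;
}
```

For example, with input = ["a", "b", null] and targets = ["x"], every row is paired with "x" and the null input row becomes "". Note the asymmetry the Exceptions list implies: a null row is accepted when targets has the same size as input, but a single null target is rejected.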

Exceptions
std::invalid_argument if targets.size() != input.size() and targets.size() != 1
std::invalid_argument if targets.size() == 1 and targets[0].is_null()
Parameters
input    Strings column of input strings
targets  Strings to compute the edit distance against input
stream   CUDA stream used for device memory operations and kernel launches
mr       Device memory resource used to allocate the returned column's device memory
Returns
New column of edit distance values, one per row of input