cudf.core.column.string.StringMethods.edit_distance_matrix

StringMethods.edit_distance_matrix() → SeriesOrIndex

Computes the pairwise edit distance between all strings in the series.

The series must contain more than 2 strings and must not contain nulls.

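Edit distance is measured with the Levenshtein algorithm: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other.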

Returns:
Series of ListDtype(int64)

Let N be the length of this series. The returned series contains N lists, each of size N, where the j-th number in the i-th row is the edit distance between the i-th string and the j-th string of this series. The matrix is symmetric, and its diagonal elements are 0.

Examples

>>> import cudf
>>> s = cudf.Series(['abc', 'bc', 'cba'])
>>> s.str.edit_distance_matrix()
0    [0, 1, 2]
1    [1, 0, 2]
2    [2, 2, 0]
dtype: list
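
The output above can be cross-checked on the CPU with a small pure-Python reference. This is only an illustrative sketch, not how cuDF computes the result: the helper names `levenshtein` and `edit_distance_matrix_reference` are hypothetical, and the sketch compares Python characters one at a time, so it is intended for simple ASCII inputs like the ones in the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping two rows.

    Hypothetical helper for illustration; not part of cuDF.
    """
    # prev[j] holds the distance between the empty prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # curr[0] is the distance between a[:i] and the empty string.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ca from a
                    curr[j - 1] + 1,     # insert cb into a
                    prev[j - 1] + cost,  # substitute (or match)
                )
            )
        prev = curr
    return prev[-1]


def edit_distance_matrix_reference(strings):
    """Build the N x N matrix of pairwise distances as nested lists.

    Hypothetical helper for illustration; not part of cuDF.
    """
    return [[levenshtein(a, b) for b in strings] for a in strings]


matrix = edit_distance_matrix_reference(['abc', 'bc', 'cba'])
for row in matrix:
    print(row)
```

Each printed row corresponds to one list in the returned series. The reference does O(N^2) string comparisons on the host, so it is only practical for verifying small inputs.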