nan_euclidean_distances

cuml.metrics.nan_euclidean_distances(X, Y=None, *, squared=False, missing_values=cp.nan, convert_dtype=True)

Calculate the Euclidean distances in the presence of missing values.

Compute the Euclidean distance between each pair of samples in X and Y, where Y=X is assumed if Y=None. When calculating the distance between a pair of samples, this formulation ignores any feature coordinate that is missing in either sample and scales up the weight of the remaining coordinates:

dist(x, y) = sqrt(weight * squared distance from present coordinates), where weight = (total number of coordinates) / (number of present coordinates)

For example, the distance between [3, na, na, 6] and [1, na, 4, 5] is:

\[\sqrt{\frac{4}{2}((3-1)^2 + (6-5)^2)}\]

If all the coordinates are missing or if there are no common present coordinates then NaN is returned for that pair.
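The weighting rule above can be checked by hand with a small host-side sketch. This is a NumPy illustration for a single pair of samples, not cuML's GPU implementation; the helper name `nan_euclidean_pair` is hypothetical and exists only for this example.

```python
import numpy as np

def nan_euclidean_pair(x, y, squared=False):
    """Illustrative host-side distance between two 1-D samples (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A coordinate counts as present only if it is non-missing in both samples.
    present = ~(np.isnan(x) | np.isnan(y))
    n_present = int(present.sum())
    if n_present == 0:
        # No common present coordinates: the distance is undefined.
        return float("nan")
    # weight = total number of coordinates / number of present coordinates
    weight = x.size / n_present
    sq_dist = weight * np.sum((x[present] - y[present]) ** 2)
    return float(sq_dist if squared else np.sqrt(sq_dist))

a = [3, np.nan, np.nan, 6]
b = [1, np.nan, 4, 5]
d = nan_euclidean_pair(a, b)  # sqrt(4/2 * ((3-1)**2 + (6-5)**2)) = sqrt(10)
no_overlap = nan_euclidean_pair([np.nan, 1.0], [2.0, np.nan])  # NaN
```

Only the first and last coordinates are present in both samples, so two of four coordinates contribute and the squared sum is scaled by 4/2.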

Parameters:
X : array-like (device or host) of shape (n_samples_X, n_features)

Acceptable formats: cuDF DataFrame, NumPy ndarray, Numba device ndarray, or any array compliant with the CUDA array interface, such as a CuPy array.

Y : array-like (device or host) of shape (n_samples_Y, n_features), default=None

A second feature array. If None, Y is assumed to be X. Acceptable formats: cuDF DataFrame, NumPy ndarray, Numba device ndarray, or any array compliant with the CUDA array interface, such as a CuPy array.

squared : bool, default=False

Return squared Euclidean distances.

missing_values : np.nan or int, default=np.nan

Representation of missing value.

convert_dtype : bool, default=True

When set to True, the method will, when necessary, convert X to a supported floating-point dtype and convert Y to match X’s dtype. This will increase memory used for the method.

Returns:
distances : array of shape (n_samples_X, n_samples_Y)

Returns the distances between the row vectors of X and the row vectors of Y.
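To make the parameters and the output shape concrete, the following is a rough NumPy sketch of the pairwise computation on the host. It assumes the semantics described on this page and is not cuML's implementation; the function name `nan_euclidean_reference` is hypothetical, and `convert_dtype` is left out because the sketch always works in float64.

```python
import numpy as np

def nan_euclidean_reference(X, Y=None, *, squared=False, missing_values=np.nan):
    """Host-side illustration of the pairwise distances (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    Y = X if Y is None else np.asarray(Y, dtype=float)

    def present_mask(A):
        # NaN needs isnan(); an integer sentinel can be compared directly.
        if isinstance(missing_values, float) and np.isnan(missing_values):
            return ~np.isnan(A)
        return A != missing_values

    # both[i, j, k] is True when feature k is present in X[i] and in Y[j].
    both = present_mask(X)[:, None, :] & present_mask(Y)[None, :, :]
    diff = np.where(both, X[:, None, :] - Y[None, :, :], 0.0)
    sq = np.sum(diff ** 2, axis=2)
    n_present = both.sum(axis=2)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Scale by total / present; pairs with no common coordinates become NaN.
        out = np.where(n_present > 0, sq * X.shape[1] / n_present, np.nan)
    return out if squared else np.sqrt(out)

X = [[3, np.nan, np.nan, 6], [np.nan, np.nan, np.nan, np.nan]]
Y = [[1, np.nan, 4, 5]]
D = nan_euclidean_reference(X, Y)                   # shape (2, 1)
D_sq = nan_euclidean_reference(X, Y, squared=True)  # squared distances
# An integer sentinel can stand in for missing entries instead of NaN.
D_int = nan_euclidean_reference([[3, -1, -1, 6]], [[1, -1, 4, 5]], missing_values=-1)
```

Row i and column j of the result hold the distance between X[i] and Y[j]. The second row of X has no present coordinates, so its entry is NaN.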