nan_euclidean_distances#
- cuml.metrics.nan_euclidean_distances(X, Y=None, *, squared=False, missing_values=cp.nan, convert_dtype=True)[source]#
Calculate the euclidean distances in the presence of missing values.
Compute the euclidean distance between each pair of samples in X and Y, where Y=X is assumed if Y=None. When calculating the distance between a pair of samples, this formulation ignores feature coordinates with a missing value in either sample and scales up the weight of the remaining coordinates:
dist(x,y) = sqrt(weight * sq. distance from present coordinates) where, weight = Total # of coordinates / # of present coordinates
For example, the distance between
[3, na, na, 6]and[1, na, 4, 5]is:\[\sqrt{\frac{4}{2}((3-1)^2 + (6-5)^2)}\]If all the coordinates are missing or if there are no common present coordinates then NaN is returned for that pair.
- Parameters:
- XDense matrix of shape (n_samples_X, n_features)
Acceptable formats: cuDF DataFrame, Pandas DataFrame, NumPy ndarray, cuda array interface compliant array like CuPy.
- YDense matrix of shape (n_samples_Y, n_features), default=None
Acceptable formats: cuDF DataFrame, Pandas DataFrame, NumPy ndarray, cuda array interface compliant array like CuPy.
- squaredbool, default=False
Return squared Euclidean distances.
- missing_valuesnp.nan or int, default=np.nan
Representation of missing value.
- Returns:
- distancesndarray of shape (n_samples_X, n_samples_Y)
Returns the distances between the row vectors of
Xand the row vectors ofY.