sparse_pairwise_distances

cuml.metrics.sparse_pairwise_distances(X, Y=None, metric='euclidean', convert_dtype=True, metric_arg=2, **kwds)

Compute the distance matrix from a vector array X and optional Y.

This method takes either one or two sparse vector arrays, and returns a dense distance matrix.

If Y is given (default is None), the returned matrix holds the pairwise distances between the rows of X and the rows of Y. Otherwise, it holds the pairwise distances between the rows of X.

Valid values for metric are:

  • From scikit-learn: ['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan'].

  • From scipy.spatial.distance: ['sqeuclidean', 'canberra', 'minkowski', 'jaccard', 'chebyshev', 'dice'].

    See the documentation for scipy.spatial.distance for details on these metrics.

  • ['inner_product', 'hellinger']
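As a rough guide to what a few of these metrics compute, the following CPU-only sketch evaluates them on plain Python lists. It is not cuML's implementation. The helper names (`sqeuclidean`, `cosine`, `canberra`, `pairwise`) are hypothetical, and the Canberra helper assumes the SciPy convention that a term whose denominator is zero contributes 0.

```python
import math

def sqeuclidean(u, v):
    # Sum of squared coordinate differences.
    return sum((a - b) ** 2 for a, b in zip(u, v))

def cosine(u, v):
    # One minus the cosine of the angle between u and v.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def canberra(u, v):
    # Sum of |a - b| / (|a| + |b|); terms with a zero denominator
    # are skipped (assumed convention, following SciPy).
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total

def pairwise(X, Y, dist):
    # Dense matrix of distances between every row of X and every row of Y.
    return [[dist(x, y) for y in Y] for x in X]

X = [[1.0, 2.0, 0.0], [0.0, 3.0, 1.0]]
Y = [[1.0, 0.0, 2.0]]

print(pairwise(X, Y, sqeuclidean))  # [[8.0], [11.0]]
print(pairwise(X, Y, canberra))
print(pairwise(X, X, cosine))
```

The inputs are the dense equivalents of the matrices used in the Examples section, so the printed values can be compared against the outputs shown there.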

Parameters:
X : array-like (device or host) of shape (n_samples_x, n_features)

Acceptable formats: SciPy or CuPy sparse array

Y : array-like (device or host) of shape (n_samples_y, n_features), optional

Acceptable formats: SciPy or CuPy sparse array

metric : {"cityblock", "cosine", "euclidean", "l1", "l2", "manhattan", "sqeuclidean", "canberra", "lp", "inner_product", "minkowski", "jaccard", "hellinger", "chebyshev", "linf", "dice"}

The metric to use when calculating distance between instances in a feature array.

convert_dtype : bool, optional (default = True)

When set to True, the method converts Y to the data type of X if the two differ. The conversion increases the memory used by the method.

metric_arg : float, optional (default = 2)

Additional metric-specific argument. For Minkowski, it is the order p of the p-norm to apply.
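To make the role of p concrete, here is a minimal pure-Python sketch of the Minkowski distance. The `minkowski` helper is hypothetical and only illustrates the formula; it does not call cuML.

```python
import math

def minkowski(u, v, p=2):
    # (sum_i |u_i - v_i| ** p) ** (1 / p)
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

u = [1.0, 2.0, 0.0]
v = [1.0, 0.0, 2.0]

d1 = minkowski(u, v, p=1)  # p = 1 reduces to the Manhattan distance
d2 = minkowski(u, v, p=2)  # p = 2 reduces to the Euclidean distance
d3 = minkowski(u, v, p=3)

print(d1)  # 4.0
print(math.isclose(d2, math.sqrt(8.0)))  # True
```

For a fixed pair of vectors, the distance does not increase as p grows, so `d1 >= d2 >= d3` here.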

Returns:
D : array of shape (n_samples_x, n_samples_x) or (n_samples_x, n_samples_y)

A dense distance matrix D such that D_{i, j} is the distance between the ith and jth vectors of the given matrix X, if Y is None. If Y is not None, then D_{i, j} is the distance between the ith array from X and the jth array from Y.
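The two output shapes can be illustrated with a small CPU-only sketch on plain lists. The helpers `manhattan` and `distance_matrix` are hypothetical stand-ins, not part of cuML, and the Manhattan metric is chosen only to keep the arithmetic easy to follow.

```python
def manhattan(u, v):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(u, v))

def distance_matrix(X, Y=None, dist=manhattan):
    # With Y omitted, the rows of X are compared against each other.
    if Y is None:
        Y = X
    return [[dist(x, y) for y in Y] for x in X]

X = [[1.0, 2.0, 0.0], [0.0, 3.0, 1.0]]
Y = [[1.0, 0.0, 2.0]]

D_xx = distance_matrix(X)     # shape (n_samples_x, n_samples_x)
D_xy = distance_matrix(X, Y)  # shape (n_samples_x, n_samples_y)

print(D_xx)  # [[0.0, 3.0], [3.0, 0.0]]
print(D_xy)  # [[4.0], [5.0]]
```

With Y omitted, the result is square and symmetric with a zero diagonal for this metric; with Y given, it has one column per row of Y.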

Examples

>>> import cupy as cp
>>> import cupyx
>>> from cuml.metrics import sparse_pairwise_distances

>>> X = cupyx.scipy.sparse.csr_matrix(cp.array([[1.0, 2.0, 0.0],
...                                             [0.0, 3.0, 1.0]]))
>>> Y = cupyx.scipy.sparse.csr_matrix(cp.array([[1.0, 0.0, 2.0]]))
>>> # Cosine Pairwise Distance, Single Input:
>>> sparse_pairwise_distances(X, metric='cosine')
array([[0.   , 0.151...],
       [0.151..., 0.   ]])

>>> # Squared euclidean Pairwise Distance, Multi-Input:
>>> sparse_pairwise_distances(X, Y, metric='sqeuclidean')
array([[ 8.],
       [11.]])

>>> # Canberra Pairwise Distance, Multi-Input:
>>> sparse_pairwise_distances(X, Y, metric='canberra')
array([[2.   ],
       [2.333...]])