sparse_pairwise_distances
- cuml.metrics.sparse_pairwise_distances(X, Y=None, metric='euclidean', convert_dtype=True, metric_arg=2, **kwds)
Compute the distance matrix from a vector array X and optional Y.
This method takes either one or two sparse vector arrays and returns a dense distance matrix.
If Y is given (default is None), then the returned matrix is the pairwise distance between the arrays from both X and Y.
Valid values for metric are:
- From scikit-learn: [‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’].
- From scipy.spatial.distance: [‘sqeuclidean’, ‘canberra’, ‘minkowski’, ‘jaccard’, ‘chebyshev’, ‘dice’]
See the documentation for scipy.spatial.distance for details on these metrics.
- [‘inner_product’, ‘hellinger’]
- Parameters:
- X : array-like (device or host) of shape (n_samples_x, n_features)
Acceptable formats: SciPy or CuPy sparse array
- Y : array-like (device or host) of shape (n_samples_y, n_features), optional
Acceptable formats: SciPy or CuPy sparse array
- metric : {“cityblock”, “cosine”, “euclidean”, “l1”, “l2”, “manhattan”, “sqeuclidean”, “canberra”, “lp”, “inner_product”, “minkowski”, “jaccard”, “hellinger”, “chebyshev”, “linf”, “dice”}
The metric to use when calculating distance between instances in a feature array.
- convert_dtype : bool, optional (default = True)
When set to True, the method will, when necessary, convert Y to be the same data type as X if they differ. This will increase memory used for the method.
- metric_arg : float, optional (default = 2)
Additional metric-specific argument. For Minkowski it’s the p-norm to apply.
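To make the role of metric_arg concrete, the sketch below implements the textbook Minkowski distance of order p in plain Python on dense lists. The helper name minkowski is hypothetical and the code is not cuML's implementation; it only illustrates the mathematical definition, under which p = 1 coincides with the Manhattan distance and p = 2 with the Euclidean distance.

```python
def minkowski(u, v, p=2.0):
    """Textbook Minkowski distance of order p (hypothetical helper, not cuML code)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


u = [1.0, 2.0, 0.0]
v = [1.0, 0.0, 2.0]

d1 = minkowski(u, v, p=1.0)  # order 1: sum of absolute differences (Manhattan)
d2 = minkowski(u, v, p=2.0)  # order 2: square root of summed squares (Euclidean)
d3 = minkowski(u, v, p=3.0)  # higher orders weight the largest differences more

print(d1, d2, d3)
```

With cuML, the corresponding call would presumably pass metric='minkowski' together with metric_arg set to the desired p.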
- Returns:
- D : array [n_samples_x, n_samples_x] or [n_samples_x, n_samples_y]
A dense distance matrix D such that D_{i, j} is the distance between the ith and jth vectors of the given matrix X, if Y is None. If Y is not None, then D_{i, j} is the distance between the ith array from X and the jth array from Y.
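The shape rule for D can be sketched with a small pure-Python stand-in. The names pairwise and cityblock are hypothetical, and the code works on dense lists rather than sparse arrays; it only shows how the result is square when Y is omitted and rectangular when Y is given.

```python
def cityblock(u, v):
    """Manhattan distance between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def pairwise(X, Y=None, dist=cityblock):
    """Return D with D[i][j] = dist(X[i], Y[j]); Y defaults to X (illustrative only)."""
    if Y is None:
        Y = X
    return [[dist(x, y) for y in Y] for x in X]


X = [[1.0, 2.0, 0.0],
     [0.0, 3.0, 1.0]]
Y = [[1.0, 0.0, 2.0]]

D_xx = pairwise(X)     # shape (n_samples_x, n_samples_x): [[0.0, 3.0], [3.0, 0.0]]
D_xy = pairwise(X, Y)  # shape (n_samples_x, n_samples_y): [[4.0], [5.0]]

print(D_xx)
print(D_xy)
```

For a true distance metric, the single-input result is symmetric with a zero diagonal, as D_xx shows.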
Examples
>>> import cupy as cp
>>> import cupyx
>>> from cuml.metrics import sparse_pairwise_distances
>>> X = cupyx.scipy.sparse.csr_matrix(cp.array([[1.0, 2.0, 0.0],
...                                              [0.0, 3.0, 1.0]]))
>>> Y = cupyx.scipy.sparse.csr_matrix(cp.array([[1.0, 0.0, 2.0]]))
>>> # Cosine Pairwise Distance, Single Input:
>>> sparse_pairwise_distances(X, metric='cosine')
array([[0.      , 0.151...],
       [0.151..., 0.      ]])
>>> # Squared euclidean Pairwise Distance, Multi-Input:
>>> sparse_pairwise_distances(X, Y, metric='sqeuclidean')
array([[ 8.],
       [11.]])
>>> # Canberra Pairwise Distance, Multi-Input:
>>> sparse_pairwise_distances(X, Y, metric='canberra')
array([[2.      ],
       [2.333...]])
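The values in the example can be cross-checked without a GPU. The sketch below recomputes them in plain Python from the standard definitions of the three metrics; the helper names are hypothetical, and the Canberra convention of treating a 0/0 term as 0 follows SciPy's documented behaviour and is assumed, not confirmed, to match cuML.

```python
import math


def sqeuclidean(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def cosine(u, v):
    """Cosine distance: 1 minus the cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def canberra(u, v):
    """Canberra distance; terms where both entries are zero contribute 0."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total


X = [[1.0, 2.0, 0.0],
     [0.0, 3.0, 1.0]]
Y = [[1.0, 0.0, 2.0]]

cos_xx = [[cosine(x, y) for y in X] for x in X]    # off-diagonal is about 0.151
sq_xy = [[sqeuclidean(x, y) for y in Y] for x in X]  # [[8.0], [11.0]]
can_xy = [[canberra(x, y) for y in Y] for x in X]    # about [[2.0], [2.333]]

print(cos_xx)
print(sq_xy)
print(can_xy)
```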