Attention
The vector search and clustering algorithms in RAFT are being migrated to a new library dedicated to vector search called cuVS. We will continue to support the vector search algorithms in RAFT during this move, but will no longer update them after the RAPIDS 24.06 (June) release. We plan to complete the migration by the RAPIDS 24.10 (October) release, and the algorithms will be removed from RAFT altogether in the 24.12 (December) release.
Distance
This page provides pylibraft API references for the publicly exposed elements of the pylibraft.distance package. RAFT's distance computations are highly optimized and support a wide range of distance measures.
- pylibraft.distance.pairwise_distance(X, Y, out=None, metric='euclidean', p=2.0, handle=None)
Compute pairwise distances between the rows of X and the rows of Y.
- Valid values for metric:
- ["euclidean", "l2", "l1", "cityblock", "inner_product", "chebyshev", "canberra", "lp", "hellinger", "jensenshannon", "kl_divergence", "russellrao", "minkowski", "correlation", "cosine"]
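As a rough guide to what several of these metric names compute, the following pure-Python sketch evaluates textbook definitions for a single pair of vectors. It is a CPU reference for illustration only: the helper names (`euclidean`, `l1`, `chebyshev`, `inner_product`, `cosine`, `canberra`) are hypothetical and not part of pylibraft, and the definitions are assumed to follow common (SciPy-style) conventions. RAFT's kernels may differ in details, for example in exactly how "cosine" is reported.

```python
import math

# Hypothetical CPU reference implementations of a few metrics from the list
# above, using textbook definitions for two equal-length vectors x and y.
# Illustrative only; these functions are not part of pylibraft.

def euclidean(x, y):
    # L2 distance: square root of the sum of squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def l1(x, y):
    # L1 / "cityblock" distance: sum of absolute differences
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    # L-infinity distance: largest absolute difference
    return max(abs(a - b) for a, b in zip(x, y))

def inner_product(x, y):
    # Dot product; a similarity (larger means closer), not a distance
    return sum(a * b for a, b in zip(x, y))

def cosine(x, y):
    # Cosine distance, assumed here to be 1 - cosine similarity
    norm_x = math.sqrt(inner_product(x, x))
    norm_y = math.sqrt(inner_product(y, y))
    return 1.0 - inner_product(x, y) / (norm_x * norm_y)

def canberra(x, y):
    # Sum of |a - b| / (|a| + |b|), skipping terms where both entries are zero
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom != 0:
            total += abs(a - b) / denom
    return total

x = [0.0, 3.0, 4.0]
y = [0.0, 0.0, 0.0]
print(euclidean(x, y), l1(x, y), chebyshev(x, y))  # 5.0 7.0 4.0
print(cosine([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal vectors)
```

Note that "inner_product" behaves like a similarity rather than a distance, so with that metric larger values indicate closer vectors.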
- Parameters:
- X: CUDA array interface compliant matrix of shape (m, k)
- Y: CUDA array interface compliant matrix of shape (n, k)
- out: Optional writable CUDA array interface matrix of shape (m, n)
- metric: string denoting the metric type (default="euclidean")
- p: metric parameter (currently used only for "minkowski")
- handle: Optional RAFT resource handle for reusing CUDA resources. If a handle isn't supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If a handle is supplied, you will need to explicitly synchronize yourself by calling handle.sync() before accessing the output.
- Returns:
- raft.device_ndarray containing pairwise distances
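The `p` parameter documented above matters only for "minkowski". A minimal pure-Python sketch of the textbook Minkowski distance shows how `p` interpolates between familiar metrics: `p=1` gives the L1 distance, `p=2` gives the Euclidean distance, and large `p` approaches the Chebyshev distance. The function `minkowski` below is a hypothetical CPU reference, not part of pylibraft, and the restriction `p >= 1` is an assumption of this sketch.

```python
def minkowski(x, y, p=2.0):
    # Hypothetical CPU reference: (sum_i |x_i - y_i|^p)^(1/p)
    # This sketch only accepts p >= 1, where the result is a true metric.
    if p < 1:
        raise ValueError("p must be >= 1 in this sketch")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x = [0.0, 3.0, 4.0]
y = [0.0, 0.0, 0.0]
print(minkowski(x, y, p=1.0))   # 7.0, same as L1 / cityblock
print(minkowski(x, y, p=2.0))   # 5.0, same as Euclidean
print(minkowski(x, y, p=50.0))  # close to 4.0, the Chebyshev distance
```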
Examples
To compute pairwise distances on cupy arrays:
>>> import cupy as cp
>>> from pylibraft.common import Handle
>>> from pylibraft.distance import pairwise_distance
>>> n_samples = 5000
>>> n_features = 50
>>> in1 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
>>> in2 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
A single RAFT handle can optionally be reused across pylibraft functions.
>>> handle = Handle()
>>> output = pairwise_distance(in1, in2, metric="euclidean", handle=handle)
pylibraft functions are often asynchronous, so the handle needs to be explicitly synchronized before the output is read:
>>> handle.sync()
It’s also possible to write to a pre-allocated output array:
>>> import cupy as cp
>>> from pylibraft.common import Handle
>>> from pylibraft.distance import pairwise_distance
>>> n_samples = 5000
>>> n_features = 50
>>> in1 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
>>> in2 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
>>> output = cp.empty((n_samples, n_samples), dtype=cp.float32)
A single RAFT handle can optionally be reused across pylibraft functions.
>>> handle = Handle()
>>> pairwise_distance(in1, in2, out=output,
...                   metric="euclidean", handle=handle)
array(...)
pylibraft functions are often asynchronous, so the handle needs to be explicitly synchronized before the output is read:
>>> handle.sync()
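The result has one row per row of X and one column per row of Y: entry (i, j) is the distance between X[i] and Y[j]. The following CPU sketch of that shape contract uses a hypothetical helper, `pairwise_reference`, which is not part of pylibraft. It works on plain Python lists rather than CUDA arrays and only implements the Euclidean metric, so it is suitable for checking small results by hand.

```python
import math

def pairwise_reference(X, Y):
    # Hypothetical CPU reference for metric="euclidean".
    # X has m rows and Y has n rows; every row must have the same length k.
    k = len(X[0])
    if any(len(row) != k for row in X) or any(len(row) != k for row in Y):
        raise ValueError("X and Y must have the same number of columns")
    # Entry [i][j] is the distance between X[i] and Y[j], so the result is m x n
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(x_row, y_row))) for y_row in Y]
        for x_row in X
    ]

X = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]  # m = 3, k = 2
Y = [[0.0, 0.0], [3.0, 0.0]]              # n = 2, k = 2
D = pairwise_reference(X, Y)
print(len(D), len(D[0]))  # 3 2
print(D)  # [[0.0, 3.0], [5.0, 4.0], [1.0, 2.0]]
```

Because the output is a full m x n matrix, its size grows quickly: in the examples above, with n_samples = 5000, the float32 output holds 5000 x 5000 = 25,000,000 entries, or about 100 MB of device memory.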
- pylibraft.distance.fused_l2_nn_argmin(X, Y, out=None, sqrt=True, handle=None)
For each row of X, compute the index of its nearest neighbor among the rows of Y under the L2 distance (a 1-nearest-neighbor argmin).
- Parameters:
- X: CUDA array interface compliant matrix of shape (m, k)
- Y: CUDA array interface compliant matrix of shape (n, k)
- out: Optional writable CUDA array interface matrix of shape (m, 1)
- handle: Optional RAFT resource handle for reusing CUDA resources. If a handle isn't supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If a handle is supplied, you will need to explicitly synchronize yourself by calling handle.sync() before accessing the output.
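Conceptually, the result matches computing the full (m, n) L2 distance matrix and taking the argmin of each row, but a fused approach can avoid materializing that matrix. The pure-Python sketch below illustrates the idea under that assumption; it is not RAFT's CUDA kernel, and `fused_argmin_reference` is a hypothetical name. It keeps a running minimum per row of X while streaming over Y, and it compares squared distances: since the square root is monotonic, skipping it should not change which index is smallest, apart from possible ties.

```python
def fused_argmin_reference(X, Y):
    # Hypothetical CPU sketch of a 1-nearest-neighbor argmin under L2.
    # For each row of X, return the index of the closest row of Y,
    # tracking a running minimum instead of storing an m x n matrix.
    result = []
    for x_row in X:
        best_index = -1
        best_dist = float("inf")
        for j, y_row in enumerate(Y):
            # Squared distance: skipping sqrt does not change the argmin
            d = sum((a - b) ** 2 for a, b in zip(x_row, y_row))
            if d < best_dist:  # strict "<" keeps the first index on ties
                best_dist = d
                best_index = j
        result.append([best_index])  # shape (m, 1), like `out`
    return result

X = [[0.1, 0.2], [0.9, 0.8], [0.4, 0.7]]
centroids = [[0.0, 0.0], [1.0, 1.0]]
print(fused_argmin_reference(X, centroids))  # [[0], [1], [1]]
```

With Y holding a small number of centroids, as in the n_clusters example, this amounts to assigning each sample to its nearest centroid. The tie-breaking rule here is a choice of this sketch, not a documented property of pylibraft.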
Examples
To compute the 1-nearest neighbors argmin:
>>> import cupy as cp
>>> from pylibraft.common import Handle
>>> from pylibraft.distance import fused_l2_nn_argmin
>>> n_samples = 5000
>>> n_clusters = 5
>>> n_features = 50
>>> in1 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
>>> in2 = cp.random.random_sample((n_clusters, n_features),
...                               dtype=cp.float32)
>>> # A single RAFT handle can optionally be reused across
>>> # pylibraft functions.
>>> handle = Handle()
>>> output = fused_l2_nn_argmin(in1, in2, handle=handle)
>>> # pylibraft functions are often asynchronous, so the
>>> # handle needs to be explicitly synchronized
>>> handle.sync()
The output can also be written to a preallocated array:
>>> import cupy as cp
>>> from pylibraft.common import Handle
>>> from pylibraft.distance import fused_l2_nn_argmin
>>> n_samples = 5000
>>> n_clusters = 5
>>> n_features = 50
>>> in1 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
>>> in2 = cp.random.random_sample((n_clusters, n_features),
...                               dtype=cp.float32)
>>> output = cp.empty((n_samples, 1), dtype=cp.int32)
>>> # A single RAFT handle can optionally be reused across
>>> # pylibraft functions.
>>> handle = Handle()
>>> fused_l2_nn_argmin(in1, in2, out=output, handle=handle)
array(...)
>>> # pylibraft functions are often asynchronous, so the
>>> # handle needs to be explicitly synchronized
>>> handle.sync()