Attention

The vector search and clustering algorithms in RAFT are being migrated to a new library dedicated to vector search called cuVS. We will continue to support the vector search algorithms in RAFT during this move, but will no longer update them after the RAPIDS 24.06 (June) release. We plan to complete the migration by the RAPIDS 24.08 (August) release.

Distance

This page provides the pylibraft API reference for the publicly exposed elements of the pylibraft.distance package. RAFT’s distance computations are highly optimized and support a wide range of distance measures.

pylibraft.distance.pairwise_distance(X, Y, out=None, metric='euclidean', p=2.0, handle=None)

Compute pairwise distances between X and Y

Valid values for metric:
[“euclidean”, “l2”, “l1”, “cityblock”, “inner_product”, “chebyshev”, “canberra”, “lp”, “hellinger”, “jensenshannon”, “kl_divergence”, “russellrao”, “minkowski”, “correlation”, “cosine”]

Parameters:
X : CUDA array interface compliant matrix shape (m, k)
Y : CUDA array interface compliant matrix shape (n, k)
out : Optional writable CUDA array interface matrix shape (m, n)
metric : string denoting the metric type (default=“euclidean”)
p : metric parameter (currently used only for “minkowski”)
handle : Optional RAFT resource handle for reusing CUDA resources.

If a handle isn’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If a handle is supplied, you will need to explicitly synchronize yourself by calling handle.sync() before accessing the output.

Returns:
raft.device_ndarray containing pairwise distances
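
For intuition about what the returned (m, n) matrix holds, the following is a minimal CPU sketch in NumPy, not the RAFT implementation. It assumes the textbook definitions of a few of the metrics listed above: “euclidean” as the unsquared L2 distance, “l1” as the sum of absolute differences, “chebyshev” as the maximum absolute difference, and “minkowski” as the order-p norm of the difference. The function name pairwise_distance_reference is hypothetical and exists only for this illustration.

```python
import numpy as np


def pairwise_distance_reference(X, Y, metric="euclidean", p=2.0):
    """CPU reference (illustrative only): distances between rows of X and Y."""
    # diff[i, j, :] = |X[i, :] - Y[j, :]|, shape (m, n, k)
    diff = np.abs(X[:, None, :] - Y[None, :, :])
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "l1":
        return diff.sum(axis=-1)
    if metric == "chebyshev":
        return diff.max(axis=-1)
    if metric == "minkowski":
        return (diff ** p).sum(axis=-1) ** (1.0 / p)
    raise ValueError(f"metric not covered by this sketch: {metric}")


rng = np.random.default_rng(0)
X = rng.random((4, 3))  # m = 4, k = 3
Y = rng.random((5, 3))  # n = 5, k = 3

D = pairwise_distance_reference(X, Y)
print(D.shape)  # (4, 5)
```

Under these definitions, the “minkowski” branch reduces to “euclidean” for p=2.0 and to “l1” for p=1.0, which is a convenient sanity check when comparing against GPU results.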

Examples

To compute pairwise distances on cupy arrays:

>>> import cupy as cp
>>> from pylibraft.common import Handle
>>> from pylibraft.distance import pairwise_distance
>>> n_samples = 5000
>>> n_features = 50
>>> in1 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
>>> in2 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)

A single RAFT handle can optionally be reused across pylibraft functions.

>>> handle = Handle()
>>> output = pairwise_distance(in1, in2, metric="euclidean", handle=handle)

pylibraft functions are often asynchronous, so the handle needs to be explicitly synchronized:

>>> handle.sync()

It’s also possible to write to a pre-allocated output array:

>>> import cupy as cp
>>> from pylibraft.common import Handle
>>> from pylibraft.distance import pairwise_distance
>>> n_samples = 5000
>>> n_features = 50
>>> in1 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
>>> in2 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
>>> output = cp.empty((n_samples, n_samples), dtype=cp.float32)

A single RAFT handle can optionally be reused across pylibraft functions.

>>> handle = Handle()
>>> pairwise_distance(in1, in2, out=output,
...                   metric="euclidean", handle=handle)
array(...)

pylibraft functions are often asynchronous, so the handle needs to be explicitly synchronized:

>>> handle.sync()

pylibraft.distance.fused_l2_nn_argmin(X, Y, out=None, sqrt=True, handle=None)

Compute the 1-nearest neighbors between X and Y using the L2 distance

Parameters:
X : CUDA array interface compliant matrix shape (m, k)
Y : CUDA array interface compliant matrix shape (n, k)
out : Optional writable CUDA array interface matrix shape (m, 1)
handle : Optional RAFT resource handle for reusing CUDA resources.

If a handle isn’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If a handle is supplied, you will need to explicitly synchronize yourself by calling handle.sync() before accessing the output.
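
As a rough illustration of what this function computes, here is a NumPy sketch: for each row of X, it finds the index of the closest row of Y. This is a CPU reference under the standard definition of L2 distance, not RAFT's fused kernel, and the helper name l2_nn_argmin_reference is hypothetical. Because the square root is monotonic, taking it or not does not change which index is closest, so the sketch works directly with squared distances.

```python
import numpy as np


def l2_nn_argmin_reference(X, Y):
    """CPU reference (illustrative only): nearest row of Y for each row of X."""
    # Squared L2 distances between every row of X and every row of Y, shape (m, n).
    sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    # Index of the smallest distance in each row, shape (m,).
    return sq_dist.argmin(axis=1).astype(np.int32)


# Two well-separated "cluster centers" and three points near them.
Y = np.array([[0.0, 0.0], [10.0, 10.0]])
X = np.array([[0.5, -0.2], [9.0, 11.0], [1.0, 1.0]])

nearest = l2_nn_argmin_reference(X, Y)
print(nearest)  # [0 1 0]
```

Reshaping the result with nearest.reshape(-1, 1) gives the (m, 1) shape listed in the parameters above.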

Examples

To compute the 1-nearest neighbors argmin:

>>> import cupy as cp
>>> from pylibraft.common import Handle
>>> from pylibraft.distance import fused_l2_nn_argmin
>>> n_samples = 5000
>>> n_clusters = 5
>>> n_features = 50
>>> in1 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
>>> in2 = cp.random.random_sample((n_clusters, n_features),
...                               dtype=cp.float32)
>>> # A single RAFT handle can optionally be reused across
>>> # pylibraft functions.
>>> handle = Handle()
>>> output = fused_l2_nn_argmin(in1, in2, handle=handle)
>>> # pylibraft functions are often asynchronous so the
>>> # handle needs to be explicitly synchronized
>>> handle.sync()

The output can also be computed in-place on a preallocated array:

>>> import cupy as cp
>>> from pylibraft.common import Handle
>>> from pylibraft.distance import fused_l2_nn_argmin
>>> n_samples = 5000
>>> n_clusters = 5
>>> n_features = 50
>>> in1 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
>>> in2 = cp.random.random_sample((n_clusters, n_features),
...                               dtype=cp.float32)
>>> output = cp.empty((n_samples, 1), dtype=cp.int32)
>>> # A single RAFT handle can optionally be reused across
>>> # pylibraft functions.
>>> handle = Handle()
>>> fused_l2_nn_argmin(in1, in2, out=output, handle=handle)
array(...)
>>> # pylibraft functions are often asynchronous so the
>>> # handle needs to be explicitly synchronized
>>> handle.sync()