All-neighbors KNN

All-neighbors builds an approximate all-neighbors k-NN graph: given a full dataset, it finds the nearest neighbors of every vector in that dataset.
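To make the output of an all-neighbors build concrete, here is a minimal CPU reference in NumPy. It is an exact brute-force sketch and not the cuVS implementation; the function name `all_neighbors_reference` is hypothetical, the metric is fixed to squared Euclidean, and each point is counted as its own nearest neighbor, which is an assumption about the convention rather than something stated on this page.

```python
import numpy as np


def all_neighbors_reference(dataset: np.ndarray, k: int):
    """Exact all-neighbors k-NN graph on the CPU (squared Euclidean distance)."""
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.einsum("ij,ij->i", dataset, dataset)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * dataset @ dataset.T
    np.maximum(dists, 0.0, out=dists)  # clamp tiny negatives caused by rounding
    # Sort every row by distance; column 0 holds the closest point
    indices = np.argsort(dists, axis=1, kind="stable")[:, :k]
    distances = np.take_along_axis(dists, indices, axis=1)
    return indices, distances
```

Both returned arrays have shape [num_rows x k], matching the layout that build documents for its indices and distances outputs. The full distance matrix costs O(num_rows²) memory, so this is only useful as a correctness check on small inputs.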

Parameters

class cuvs.neighbors.all_neighbors.AllNeighborsParams(
algo='nn_descent',
*,
overlap_factor=2,
n_clusters=1,
metric='sqeuclidean',
ivf_pq_params=None,
nn_descent_params=None,
)

Parameters for all-neighbors k-NN graph building.

Parameters:
algo : str or cuvsAllNeighborsAlgo

Algorithm to use for local k-NN graph building. Options: "brute_force", "ivf_pq", "nn_descent"

overlap_factor : int, default=2

Number of clusters each point is assigned to (must be < n_clusters)

n_clusters : int, default=1

Number of clusters/batches to partition the dataset into (> overlap_factor). Use n_clusters>1 to distribute the work across GPUs.

metric : str or cuvsDistanceType, default="sqeuclidean"

Distance metric to use for graph construction

ivf_pq_params : cuvs.neighbors.ivf_pq.IndexParams, optional

IVF-PQ specific parameters (used when algo="ivf_pq")

nn_descent_params : cuvs.neighbors.nn_descent.IndexParams, optional

NN-Descent specific parameters (used when algo="nn_descent")
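The relationship between n_clusters and overlap_factor can be illustrated with a small NumPy sketch. This page does not say how clusters are formed or how points are assigned to them, so the rule used here (each point goes to its overlap_factor closest centroids) is an assumption, and `assign_to_clusters` and its `centroids` argument are hypothetical names introduced only for illustration.

```python
import numpy as np


def assign_to_clusters(dataset, centroids, overlap_factor):
    """Assign each point to its `overlap_factor` closest centroids (assumed rule)."""
    n_clusters = centroids.shape[0]
    # Mirror the documented constraint: overlap_factor must be < n_clusters
    if not overlap_factor < n_clusters:
        raise ValueError("overlap_factor must be < n_clusters")
    # Squared distance from every point to every centroid: [n_points, n_clusters]
    diff = dataset[:, None, :] - centroids[None, :, :]
    dists = np.einsum("ijk,ijk->ij", diff, diff)
    # Cluster ids per point, nearest centroid first: [n_points, overlap_factor]
    return np.argsort(dists, axis=1, kind="stable")[:, :overlap_factor]
```

Because every point lands in more than one cluster, neighbors that fall on the far side of a cluster boundary can still be found when the per-cluster graphs are combined. A larger overlap_factor means more work per point in exchange for better coverage near boundaries.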

Attributes:
algo

Algorithm used for local k-NN graph building.

metric

Distance metric used for graph construction.

n_clusters

Number of clusters/batches to partition the dataset into.

overlap_factor

Number of clusters each point is assigned to.

Methods

get_handle(self)

Get a pointer to the underlying C object.

Build

cuvs.neighbors.all_neighbors.build(
dataset,
k,
params,
*,
indices=None,
distances=None,
core_distances=None,
alpha=1.0,
resources=None,
)

Build an approximate all-neighbors k-NN graph: given a full dataset, find the k nearest neighbors of every vector in that dataset.

Parameters:
dataset : array_like

Training dataset to build the k-NN graph for. Can be provided on host (for multi-GPU build) or device (for single-GPU build). Host vs device location is automatically detected. Supported dtype: float32

k : int

Number of nearest neighbors to find for each point

params : AllNeighborsParams

Parameters object containing all build settings including algorithm choice and algorithm-specific parameters.

indices : array_like, optional

Optional output buffer for indices [num_rows x k] on device (int64). If not provided, will be allocated automatically.

distances : array_like, optional

Optional output buffer for distances [num_rows x k] on device (float32)

core_distances : array_like, optional

Optional output buffer for core distances [num_rows] on device (float32). Requires distances parameter to be provided.

alpha : float, default=1.0

Mutual-reachability scaling; used only when core_distances is provided

resources : Resources or MultiGpuResources, optional

CUDA resources to use for the operation. If not provided, a default Resources object will be created. Use MultiGpuResources to enable multi-GPU execution across multiple devices.
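The core_distances and alpha parameters refer to mutual-reachability distances. The NumPy sketch below shows one common definition, taken from the HDBSCAN convention: the core distance of a point is the distance to its k-th neighbor, and the mutual-reachability distance is max(core(a), core(b), d(a, b) / alpha). Whether build uses exactly this formula is not stated on this page, so treat it as an assumption; `mutual_reachability` is a hypothetical helper, and it expects each row of `distances` to be sorted in ascending order.

```python
import numpy as np


def mutual_reachability(indices, distances, alpha=1.0):
    """Turn k-NN distances into mutual-reachability distances (HDBSCAN convention)."""
    # Core distance of a point: distance to its k-th (farthest listed) neighbor
    core = distances[:, -1]
    # Scale the raw distances by alpha before comparing with core distances
    scaled = distances / alpha
    # d_mreach(a, b) = max(core(a), core(b), d(a, b) / alpha)
    mreach = np.maximum(scaled, np.maximum(core[:, None], core[indices]))
    return core, mreach
```

With alpha=1.0 the raw distance is used unchanged. Under this convention, values of alpha below 1 inflate the raw distances relative to the core distances.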

Returns:
indices : array_like

k-NN indices for each point [num_rows x k], always on device. If indices buffer was provided, returns the same array filled with results.

distances : array_like or None

k-NN distances if distances buffer was provided, None otherwise

core_distances : array_like or None

Core distances if core_distances buffer was provided, None otherwise
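A single-GPU call might look like the sketch below. It uses only the signatures documented above, plus CuPy for device arrays, which is an assumption: any array type that build accepts as a device array_like would do. The wrapper name `build_knn_graph` and the dataset sizes are hypothetical, and the example has not been verified against a particular cuVS release. Results are read from the caller-owned buffers, relying on the documented behavior that provided buffers are filled in place.

```python
def build_knn_graph(n_rows=10_000, n_cols=64, k=16):
    """Sketch: single-GPU all-neighbors build with caller-owned output buffers."""
    # Imported lazily so this module can be loaded on a machine without a GPU
    import cupy as cp
    from cuvs.neighbors import all_neighbors

    # A device-resident float32 dataset selects the single-GPU path
    dataset = cp.random.random_sample((n_rows, n_cols), dtype=cp.float32)

    # Default algorithm and metric, spelled out for clarity
    params = all_neighbors.AllNeighborsParams(
        algo="nn_descent",
        metric="sqeuclidean",
    )

    # Preallocated outputs with the documented shapes and dtypes
    indices = cp.empty((n_rows, k), dtype=cp.int64)
    distances = cp.empty((n_rows, k), dtype=cp.float32)

    # build fills the provided buffers in place
    all_neighbors.build(dataset, k, params, indices=indices, distances=distances)
    return indices, distances
```

For a multi-GPU build, the description above suggests keeping the dataset on host, setting n_clusters > 1 in the params, and passing a MultiGpuResources object as resources.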