All-neighbors KNN
The all-neighbors API builds an approximate all-neighbors k-NN graph: given a full dataset, it finds the nearest neighbors of every training vector in that dataset.
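To make the shape of the result concrete, here is a minimal pure-Python reference of what an all-neighbors k-NN graph contains. It is an exact O(n²) illustration under the squared Euclidean metric, not the cuVS implementation. The function names are hypothetical, and the sketch keeps each point in its own neighbor list at distance 0; whether the library does the same is not stated on this page.

```python
def sqeuclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def all_neighbors_reference(dataset, k):
    """Exact all-neighbors k-NN graph: one row of k neighbors per input row.

    Hypothetical helper for illustration only. Every row of `dataset` is used
    both as a query and as a candidate neighbor.
    """
    indices, distances = [], []
    for query in dataset:
        # Score every row against the query, then keep the k closest.
        scored = sorted(
            (sqeuclidean(query, other), j) for j, other in enumerate(dataset)
        )
        nearest = scored[:k]
        indices.append([j for _, j in nearest])
        distances.append([d for d, _ in nearest])
    return indices, distances


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
indices, distances = all_neighbors_reference(points, k=2)
print(indices)    # [num_rows x k] neighbor ids
print(distances)  # [num_rows x k] squared distances
```

Both outputs have one row per input vector and k columns, which matches the [num_rows x k] layout used by the build function documented on this page.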
Parameters
- class cuvs.neighbors.all_neighbors.AllNeighborsParams(algo='nn_descent', *, overlap_factor=2, n_clusters=1, metric='sqeuclidean', ivf_pq_params=None, nn_descent_params=None)
Parameters for all-neighbors k-NN graph building.
- Parameters:
- algo : str or cuvsAllNeighborsAlgo, default="nn_descent"
Algorithm to use for local k-NN graph building. Options: "brute_force", "ivf_pq", "nn_descent".
- overlap_factor : int, default=2
Number of clusters each point is assigned to (must be < n_clusters when n_clusters > 1).
- n_clusters : int, default=1
Number of clusters/batches to partition the dataset into (must be > overlap_factor when greater than 1). Use n_clusters > 1 to distribute the work across GPUs.
- metric : str or cuvsDistanceType, default="sqeuclidean"
Distance metric to use for graph construction.
- ivf_pq_params : cuvs.neighbors.ivf_pq.IndexParams, optional
IVF-PQ specific parameters (used when algo="ivf_pq").
- nn_descent_params : cuvs.neighbors.nn_descent.IndexParams, optional
NN-Descent specific parameters (used when algo="nn_descent").
- Attributes:
- algo
Algorithm used for local k-NN graph building.
- metric
Distance metric used for graph construction.
- n_clusters
Number of clusters/batches to partition the dataset into.
- overlap_factor
Number of clusters each point is assigned to.
- Methods:
- get_handle(self)
Get a pointer to the underlying C object.
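As a minimal sketch of how these parameters fit together, the helper below collects the keyword arguments and checks the documented options before handing them to AllNeighborsParams. The helper names (all_neighbors_kwargs, make_params) are hypothetical and handle string algorithm names only. Applying the overlap_factor < n_clusters rule only when n_clusters > 1 is an assumption, made because the defaults (overlap_factor=2, n_clusters=1) would otherwise conflict. The cuVS import is deferred so the checks can run on a machine without a GPU.

```python
VALID_ALGOS = ("brute_force", "ivf_pq", "nn_descent")


def all_neighbors_kwargs(algo="nn_descent", *, overlap_factor=2, n_clusters=1,
                         metric="sqeuclidean"):
    """Collect AllNeighborsParams keyword arguments after basic checks.

    Hypothetical helper, not part of cuVS.
    """
    if algo not in VALID_ALGOS:
        raise ValueError(f"algo must be one of {VALID_ALGOS}, got {algo!r}")
    # Assumption: the overlap_factor < n_clusters rule only applies once the
    # dataset is actually partitioned (n_clusters > 1).
    if n_clusters > 1 and overlap_factor >= n_clusters:
        raise ValueError("overlap_factor must be smaller than n_clusters")
    return {
        "algo": algo,
        "overlap_factor": overlap_factor,
        "n_clusters": n_clusters,
        "metric": metric,
    }


def make_params(**kwargs):
    """Create the real parameter object; needs cuVS and a CUDA-capable GPU."""
    from cuvs.neighbors import all_neighbors  # deferred import on purpose

    return all_neighbors.AllNeighborsParams(**all_neighbors_kwargs(**kwargs))


# Default single-partition settings, then a batched configuration.
single = all_neighbors_kwargs()
batched = all_neighbors_kwargs("nn_descent", n_clusters=4, overlap_factor=2)
print(single)
print(batched)
```

On a machine with cuVS installed, make_params(n_clusters=4, overlap_factor=2) would return an AllNeighborsParams object carrying the same settings.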
Build
- cuvs.neighbors.all_neighbors.build(dataset, k, params, *, indices=None, distances=None, core_distances=None, alpha=1.0, resources=None)
Build an approximate all-neighbors k-NN graph: for every training vector in the dataset, find its k nearest neighbors among the vectors of that same dataset.
- Parameters:
- dataset : array_like
Training dataset to build the k-NN graph for. Can be provided on host (for multi-GPU build) or on device (for single-GPU build); the location is detected automatically. Supported dtype: float32.
- k : int
Number of nearest neighbors to find for each point.
- params : AllNeighborsParams
Parameters object containing all build settings, including the algorithm choice and algorithm-specific parameters.
- indices : array_like, optional
Optional output buffer for indices [num_rows x k] on device (int64). If not provided, it is allocated automatically.
- distances : array_like, optional
Optional output buffer for distances [num_rows x k] on device (float32).
- core_distances : array_like, optional
Optional output buffer for core distances [num_rows] on device (float32). Requires the distances parameter to be provided.
- alpha : float, default=1.0
Mutual-reachability scaling; used only when core_distances is provided.
- resources : Resources or MultiGpuResources, optional
CUDA resources to use for the operation. If not provided, a default Resources object is created. Use MultiGpuResources to run across multiple GPUs.
- Returns:
- indices : array_like
k-NN indices for each point [num_rows x k], always on device. If an indices buffer was provided, the same array is returned, filled with the results.
- distances : array_like or None
k-NN distances if a distances buffer was provided, None otherwise.
- core_distances : array_like or None
Core distances if a core_distances buffer was provided, None otherwise.
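The buffer rules above can be sketched as follows. output_shapes is a hypothetical helper that derives the documented shapes and dtypes and enforces that core_distances requires distances. build_knn_graph is a single-GPU sketch that assumes CuPy arrays are accepted as device buffers and that the provided buffers are filled in place, as the Returns description suggests. It needs CuPy, cuVS and a CUDA-capable GPU, so its imports are deferred.

```python
def output_shapes(num_rows, k, *, want_distances=True,
                  want_core_distances=False):
    """Return {name: (shape, dtype)} for the optional output buffers.

    Hypothetical helper, not part of cuVS.
    """
    if want_core_distances and not want_distances:
        raise ValueError("core_distances requires a distances buffer")
    shapes = {"indices": ((num_rows, k), "int64")}
    if want_distances:
        shapes["distances"] = ((num_rows, k), "float32")
    if want_core_distances:
        shapes["core_distances"] = ((num_rows,), "float32")
    return shapes


def build_knn_graph(dataset, k, algo="nn_descent"):
    """Single-GPU sketch; requires CuPy, cuVS and a CUDA-capable GPU."""
    import cupy as cp  # deferred: only needed on a GPU machine
    from cuvs.neighbors import all_neighbors

    # A device-resident float32 dataset selects the single-GPU path.
    dataset = cp.asarray(dataset, dtype=cp.float32)
    buffers = {
        name: cp.empty(shape, dtype=dtype)
        for name, (shape, dtype) in output_shapes(dataset.shape[0], k).items()
    }
    params = all_neighbors.AllNeighborsParams(algo=algo)
    # Assumption: the preallocated buffers are filled in place by build().
    all_neighbors.build(
        dataset, k, params,
        indices=buffers["indices"],
        distances=buffers["distances"],
    )
    return buffers["indices"], buffers["distances"]


# The shape helper runs anywhere, including machines without a GPU.
shapes = output_shapes(1000, 10, want_core_distances=True)
print(shapes)
```

Reading the results from the preallocated buffers avoids depending on the exact structure of the return value.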