NN-Descent

Index build parameters

class cuvs.neighbors.nn_descent.IndexParams(
metric=None,
*,
metric_arg=None,
graph_degree=None,
intermediate_graph_degree=None,
max_iterations=None,
termination_threshold=None,
n_clusters=None,
)

Parameters for building an NN-Descent index.

Parameters:
metric : str, default = "sqeuclidean"

String denoting the metric type.

graph_degree : int

For an input dataset of shape (N, D), sets the number of neighbors kept per point in the final all-neighbors knn graph, which has shape (N, graph_degree).

intermediate_graph_degree : int

Internally, nn-descent builds an all-neighbors knn graph of shape (N, intermediate_graph_degree) before selecting the final graph_degree neighbors. It is recommended that intermediate_graph_degree >= 1.5 * graph_degree.

max_iterations : int

The number of iterations for which nn-descent refines the graph. More iterations produce a higher-quality graph at the cost of a longer build time.

termination_threshold : float

The delta at which nn-descent terminates its iterations.

Attributes:
graph_degree
intermediate_graph_degree
max_iterations
metric
metric_arg
n_clusters
termination_threshold
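The recommendation intermediate_graph_degree >= 1.5 * graph_degree can be applied mechanically when assembling the parameters. The sketch below is illustrative only: suggest_intermediate_degree, make_param_kwargs and make_index_params are hypothetical helpers, not part of cuVS, and the max_iterations and termination_threshold values are placeholders rather than documented defaults.

```python
import math


def suggest_intermediate_degree(graph_degree, factor=1.5):
    """Smallest integer >= factor * graph_degree (hypothetical helper)."""
    if graph_degree <= 0:
        raise ValueError("graph_degree must be positive")
    return math.ceil(factor * graph_degree)


def make_param_kwargs(graph_degree, max_iterations=20,
                      termination_threshold=1e-4):
    """Collect keyword arguments for IndexParams.

    max_iterations and termination_threshold are illustrative values,
    not defaults taken from the documentation above.
    """
    return {
        "metric": "sqeuclidean",
        "graph_degree": graph_degree,
        "intermediate_graph_degree": suggest_intermediate_degree(graph_degree),
        "max_iterations": max_iterations,
        "termination_threshold": termination_threshold,
    }


def make_index_params(graph_degree):
    """Build IndexParams; cuVS is imported lazily so the helpers above
    can be used on a machine without it."""
    from cuvs.neighbors import nn_descent
    return nn_descent.IndexParams(**make_param_kwargs(graph_degree))
```

With cuVS installed, make_index_params(64) would request a final graph of 64 neighbors per point and an intermediate graph of 96.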

Index

class cuvs.neighbors.nn_descent.Index

NN-Descent index object. This object stores the trained NN-Descent index, which can be used to get the NN-Descent graph and distances after building.

Attributes:
graph
trained
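To make the (N, graph_degree) layout of the graph concrete, the following pure-Python sketch builds an exact knn graph by brute force on a tiny dataset and measures how much of it an approximate graph recovers. It is not how cuVS computes the graph. brute_force_knn_graph and graph_recall are hypothetical helpers, and excluding each point from its own neighbor list is an assumption; whether the cuVS graph lists a point as its own neighbor is not stated here, so adjust that line if needed.

```python
import heapq


def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn_graph(points, k):
    """Exact knn graph: row i lists the k nearest other points to point i.

    Assumption: a point is not counted as its own neighbor.
    Cost is O(N^2 * D), so this is only suitable for small sanity checks.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    graph = []
    for i, p in enumerate(points):
        candidates = ((sq_euclidean(p, q), j)
                      for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in heapq.nsmallest(k, candidates)])
    return graph


def graph_recall(approx_graph, exact_graph):
    """Fraction of exact neighbors that also appear in the approximate graph."""
    hits = 0
    total = 0
    for approx_row, exact_row in zip(approx_graph, exact_graph):
        exact_set = {int(j) for j in exact_row}
        hits += len(exact_set & {int(j) for j in approx_row})
        total += len(exact_set)
    return hits / total


points = [(0.0,), (1.0,), (3.0,), (7.0,)]
exact = brute_force_knn_graph(points, k=2)
```

Rows copied to host memory from index.graph could be passed as approx_graph to estimate recall on a small sample of the dataset.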

Index build

cuvs.neighbors.nn_descent.build(IndexParams index_params, dataset, graph=None, resources=None)

Build a KNN graph from the dataset.

Parameters:
index_params : cuvs.neighbors.nn_descent.IndexParams
dataset : Array interface compliant matrix, on either host or device memory

Supported dtypes: [float, int8, uint8]

graph : Optional host matrix for storing the output graph
resources : Optional cuVS Resource handle for reusing CUDA resources.

If resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you must synchronize explicitly by calling resources.sync() before accessing the output.

Returns:
index : cuvs.neighbors.nn_descent.Index

Examples

>>> import cupy as cp
>>> from cuvs.neighbors import nn_descent
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> k = 10
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> build_params = nn_descent.IndexParams(metric="sqeuclidean")
>>> index = nn_descent.build(build_params, dataset)
>>> graph = index.graph
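When a resources handle is passed to build, the synchronization described under the resources parameter becomes the caller's responsibility. A minimal sketch of that pattern follows, assuming the handle is created with cuvs.common.Resources; build_with_shared_resources is a hypothetical wrapper, not a cuVS function, and the imports are deferred so that defining it does not require a GPU.

```python
def build_with_shared_resources(dataset, **param_kwargs):
    """Build an NN-Descent graph with a caller-managed resources handle.

    Hypothetical wrapper: cuVS is imported lazily, and param_kwargs are
    forwarded unchanged to IndexParams.
    """
    from cuvs.common import Resources
    from cuvs.neighbors import nn_descent

    resources = Resources()
    params = nn_descent.IndexParams(**param_kwargs)
    index = nn_descent.build(params, dataset, resources=resources)

    # Because a handle was supplied, synchronize before reading the output.
    resources.sync()
    return index.graph
```

With the dataset from the example above, build_with_shared_resources(dataset, metric="sqeuclidean") would return the graph only after the handle has been synchronized. The same handle could be reused across several builds to avoid reallocating CUDA resources each time.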