HNSW#

This is a wrapper for hnswlib that loads a CAGRA index as an immutable HNSW index. The loaded HNSW index is only compatible with cuVS and can be searched using the wrapper functions.

Index search parameters#

class cuvs.neighbors.hnsw.SearchParams(ef=200, *, num_threads=0)#

HNSW search parameters

Parameters:
ef: int, default = 200

Maximum size of the candidate list used during search.

num_threads: int, default = 0

Number of CPU threads used to increase search parallelism. When set to 0, the number of threads is automatically determined using OpenMP’s omp_get_max_threads().

Attributes:
ef
num_threads
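To illustrate what a candidate list bounded by ef does, here is a minimal pure-Python sketch of a best-first search over a single-layer neighbor graph. It is not the cuVS or hnswlib implementation; the function name search_base_layer, the toy graph, and the vectors are all hypothetical and chosen only for illustration.

```python
import heapq
import math


def search_base_layer(graph, vectors, query, entry, k, ef):
    """Best-first search on one graph layer, keeping at most `ef` results."""

    def dist(i):
        return math.dist(vectors[i], query)

    visited = {entry}
    d0 = dist(entry)
    frontier = [(d0, entry)]  # min-heap: closest unexpanded node first
    best = [(-d0, entry)]     # max-heap via negated distance, size <= ef
    while frontier:
        d, node = heapq.heappop(frontier)
        # Stop once the closest unexpanded node is farther than the
        # worst entry of a full result list.
        if len(best) >= ef and d > -best[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the farthest result
    ranked = sorted((-neg_d, n) for neg_d, n in best)
    return [n for _, n in ranked[:k]]


# Toy 1-D data (hypothetical): node 1 is a dead end, node 3 is the true
# nearest neighbor of the query and is reachable only through node 2.
vectors = [(0.0,), (7.0,), (4.0,), (10.0,)]
graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
query = (10.0,)

narrow = search_base_layer(graph, vectors, query, entry=0, k=1, ef=1)
wide = search_base_layer(graph, vectors, query, entry=0, k=1, ef=2)
```

On this toy graph, ef=1 stops at the dead-end node 1, while ef=2 keeps node 2 in the candidate list long enough to reach node 3. A larger ef generally trades search time for recall.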

Index#

class cuvs.neighbors.hnsw.Index#

HNSW index object. This object stores the trained HNSW index state, which can be used to perform nearest neighbor searches.

Attributes:
trained

Index Conversion#

cuvs.neighbors.hnsw.from_cagra(Index index, temporary_index_path=None, resources=None)#

Returns an hnsw base-layer-only index from a CAGRA index.

NOTE: This method uses the filesystem: it writes the CAGRA index to /tmp/<random_number>.bin, or to temporary_index_path if that parameter is not None, reads the file back as an hnsw index, and then deletes the temporary file. The returned index is immutable and can only be searched by the hnsw wrapper in cuVS, because the format is not compatible with the original hnswlib library. "Base-layer-only" means that the hnsw index is created without the additional upper layers that hnswlib uses for hierarchical search; the search runs on the base layer alone.

Saving / loading the index is experimental. The serialization format is subject to change.
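The write, read back, then delete pattern described in the note can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the cuVS code: round_trip is a hypothetical helper, the payload is arbitrary bytes rather than a real index, and tempfile.mkstemp stands in for the /tmp/<random_number>.bin naming.

```python
import os
import tempfile


def round_trip(payload, temporary_index_path=None):
    """Write payload to a temporary file, read it back, then delete the file.

    Returns the bytes read and the path that was used.
    """
    path = temporary_index_path
    if path is None:
        # Assumption: let the OS pick a unique name in the temp directory.
        fd, path = tempfile.mkstemp(suffix=".bin")
        os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(payload)
        with open(path, "rb") as f:
            return f.read(), path
    finally:
        # The temporary file is removed even if reading fails.
        if os.path.exists(path):
            os.remove(path)


data, used_path = round_trip(b"serialized index bytes")
```

One practical consequence of this pattern is that the chosen location must be writable and have enough free space to hold the serialized index for the duration of the call.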

Parameters:
index: Index

Trained CAGRA index.

temporary_index_path: string, default = None

Path to save the temporary index file. If None, the temporary file will be saved in /tmp/<random_number>.bin.

resources: Optional cuVS Resource handle for reusing CUDA resources.

If resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you will need to synchronize explicitly by calling resources.sync() before accessing the output.

Examples

>>> import cupy as cp
>>> from cuvs.neighbors import cagra
>>> from cuvs.neighbors import hnsw
>>> n_samples = 50000
>>> n_features = 50
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> # Build index
>>> index = cagra.build(cagra.IndexParams(), dataset)
>>> # Convert the CAGRA index to a base-layer-only hnsw index
>>> hnsw_index = hnsw.from_cagra(index)