IVF-PQ#
Index build parameters#
- class cuvs.neighbors.ivf_pq.IndexParams(n_lists=1024, *, metric='sqeuclidean', metric_arg=2.0, kmeans_n_iters=20, kmeans_trainset_fraction=0.5, pq_bits=8, pq_dim=0, codebook_kind='subspace', force_random_rotation=False, add_data_on_build=True, conservative_memory_allocation=False, max_train_points_per_pq_code=256, codes_layout='interleaved')
Parameters for building an IVF-PQ index for nearest neighbor search.
- Parameters:
- n_lists: int, default = 1024
The number of clusters used in the coarse quantizer.
- metric: str, default = “sqeuclidean”
String denoting the metric type. Valid values: [“sqeuclidean”, “inner_product”, “euclidean”, “cosine”], where:
sqeuclidean is the Euclidean distance without the square root operation, i.e. distance(a, b) = sum_i (a_i - b_i)^2,
euclidean is the Euclidean distance, i.e. distance(a, b) = sqrt(sum_i (a_i - b_i)^2),
inner_product is defined as distance(a, b) = sum_i a_i * b_i,
cosine is defined as distance(a, b) = 1 - sum_i a_i * b_i / (||a||_2 * ||b||_2).
- kmeans_n_iters: int, default = 20
The number of k-means iterations used to find the cluster centers during index building.
- kmeans_trainset_fraction: float, default = 0.5
If kmeans_trainset_fraction is less than 1, the dataset is subsampled and only n_samples * kmeans_trainset_fraction rows are used for training.
- pq_bits: int, default = 8
The bit length of each vector element after quantization.
- pq_dim: int, default = 0
The dimensionality of the vector after product quantization. When zero, an optimal value is selected using a heuristic. Note that pq_dim * pq_bits must be a multiple of 8. Hint: a smaller ‘pq_dim’ results in a smaller index size and better search performance, but lower recall. If ‘pq_bits’ is 8, ‘pq_dim’ can be set to any number, but multiples of 8 are desirable for good performance. If ‘pq_bits’ is not 8, ‘pq_dim’ should be a multiple of 8. For good performance, it is desirable that ‘pq_dim’ is a multiple of 32. Ideally, ‘pq_dim’ should also be a divisor of the dataset dim.
- codebook_kind: string, default = “subspace”
Valid values: [“subspace”, “cluster”]
- force_random_rotation: bool, default = False
Apply a random rotation matrix on the input data and queries even if dim % pq_dim == 0. Note: if dim is not a multiple of pq_dim, a random rotation is always applied to the input data and queries to transform the working space from dim to rot_dim, which may be slightly larger than the original space and is a multiple of pq_dim (rot_dim % pq_dim == 0). However, this transform is not necessary when dim is a multiple of pq_dim (dim == rot_dim, hence there is no need to add “extra” data columns / features). By default, if dim == rot_dim, the rotation transform is initialized with the identity matrix. When force_random_rotation == True, a random orthogonal transform matrix is generated regardless of the values of dim and pq_dim.
- add_data_on_build: bool, default = True
After training the coarse and fine quantizers, we will populate the index with the dataset if add_data_on_build == True, otherwise the index is left empty, and the extend method can be used to add new vectors to the index.
- conservative_memory_allocation: bool, default = False
By default, the algorithm allocates more space than necessary for individual clusters (list_data). This amortizes the cost of memory allocation and reduces the number of data copies during repeated calls to extend (extending the database). To disable this behavior and use as little GPU memory for the database as possible, set this flag to True.
- max_train_points_per_pq_code: int, default = 256
The max number of data points to use per PQ code during PQ codebook training. Using more data points per PQ code may increase the quality of PQ codebook but may also increase the build time. The parameter is applied to both PQ codebook generation methods, i.e., PER_SUBSPACE and PER_CLUSTER. In both cases, we will use pq_book_size * max_train_points_per_pq_code training points to train each codebook.
- codes_layout: string, default = “interleaved”
Memory layout of the IVF-PQ list data. Valid values: [“flat”, “interleaved”]
flat: Codes are stored contiguously, one vector’s codes after another.
interleaved: Codes are interleaved for optimized search performance. This is the default and recommended for search workloads.
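The four metric definitions listed under the metric parameter can be written out directly. The following is a minimal pure-Python sketch for checking the formulas on small vectors; the function names are illustrative and are not part of cuVS, which evaluates these distances on the GPU.

```python
import math


def sqeuclidean(a, b):
    # Squared Euclidean distance: sum_i (a_i - b_i)^2
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    # Euclidean distance: square root of the squared distance
    return math.sqrt(sqeuclidean(a, b))


def inner_product(a, b):
    # Inner product: sum_i a_i * b_i
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    # Cosine distance: 1 - <a, b> / (||a||_2 * ||b||_2)
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


a = [1.0, 2.0, 2.0]
b = [2.0, 0.0, 2.0]
print(sqeuclidean(a, b))    # 1 + 4 + 0 = 5.0
print(euclidean(a, b))      # sqrt(5)
print(inner_product(a, b))  # 2 + 0 + 4 = 6.0
print(cosine(a, b))         # 1 - 6 / (3 * sqrt(8))
```

Note that for inner_product a larger value means a closer match, whereas for the other three metrics a smaller value does.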
- Attributes:
- add_data_on_build
- codebook_kind
- codes_layout
- conservative_memory_allocation
- force_random_rotation
- kmeans_n_iters
- kmeans_trainset_fraction
- max_train_points_per_pq_code
- metric
- metric_arg
- n_lists
- pq_bits
- pq_dim
Methods
get_handle(self)
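The constraints on pq_dim and pq_bits described above can be checked before building an index. The sketch below is a hypothetical helper, not a cuVS function. It assumes that pq_len is the smallest integer such that pq_len * pq_dim >= dim, that the codebook size pq_book_size equals 2 ** pq_bits, and that the per-vector code size ignores list padding and stored indices.

```python
def pq_layout(dim, pq_dim, pq_bits):
    """Derive PQ layout quantities from the build parameters (illustrative)."""
    if (pq_dim * pq_bits) % 8 != 0:
        raise ValueError("pq_dim * pq_bits must be a multiple of 8")
    # Assumption: rot_dim is the smallest multiple of pq_dim that is >= dim.
    pq_len = -(-dim // pq_dim)  # ceiling division
    rot_dim = pq_len * pq_dim
    return {
        "pq_len": pq_len,
        "rot_dim": rot_dim,
        # A random rotation is always applied when dim != rot_dim.
        "rotation_required": rot_dim != dim,
        # Bytes of PQ codes per vector, excluding padding and indices.
        "code_bytes": pq_dim * pq_bits // 8,
    }


def pq_training_points(pq_bits, max_train_points_per_pq_code=256):
    # Assumption: pq_book_size is 2 ** pq_bits (one entry per code value).
    pq_book_size = 1 << pq_bits
    return pq_book_size * max_train_points_per_pq_code


print(pq_layout(dim=50, pq_dim=32, pq_bits=8))
print(pq_layout(dim=64, pq_dim=32, pq_bits=8))
print(pq_training_points(pq_bits=8))
```

With dim=50 and pq_dim=32 the working space grows to rot_dim=64, so a rotation is required; with dim=64 it is not, unless force_random_rotation is set.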
Index search parameters#
- class cuvs.neighbors.ivf_pq.SearchParams(n_probes=20, *, lut_dtype=np.float32, internal_distance_dtype=np.float32, coarse_search_dtype=np.float32, max_internal_batch_size=4096)
Supplemental parameters for searching an IVF-PQ index
- Parameters:
- n_probes: int, default = 20
The number of clusters to search.
- lut_dtype: default = np.float32
Data type of the lookup table that is created dynamically at search time. Low-precision types reduce the amount of shared memory required at search time, so fast shared-memory kernels can be used even for datasets with large dimensionality. Note that recall is slightly degraded when a low-precision type is selected. Possible values: [np.float32, np.float16, np.uint8]
- internal_distance_dtype: default = np.float32
Storage data type for distance/similarity computation. Possible values [np.float32, np.float16]
- coarse_search_dtype: default = np.float32
[Experimental] The data type to use as the GEMM element type when searching the clusters to probe. Possible values: [np.float32, np.float16, np.int8].
Legacy default: np.float32
Recommended for performance: np.float16 (half)
Experimental/low-precision: np.int8
- max_internal_batch_size: default = 4096
Set the internal batch size to improve GPU utilization at the cost of larger memory footprint.
- Attributes:
- coarse_search_dtype
- internal_distance_dtype
- lut_dtype
- max_internal_batch_size
- n_probes
Methods
get_handle(self)
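The effect of lut_dtype on memory can be estimated with simple arithmetic. The sketch below assumes the lookup table holds one entry per (PQ subspace, code value) pair, i.e. pq_dim * 2 ** pq_bits entries; this is the usual product-quantization layout, and the memory actually used by the cuVS kernels may differ. The helper name and the dtype strings are illustrative.

```python
# Bytes per element for the lut_dtype values listed above.
ITEMSIZE = {"float32": 4, "float16": 2, "uint8": 1}


def lut_bytes(pq_dim, pq_bits, lut_dtype="float32"):
    # Assumption: one table entry per (subspace, code value) pair.
    return pq_dim * (1 << pq_bits) * ITEMSIZE[lut_dtype]


for dtype in ("float32", "float16", "uint8"):
    print(dtype, lut_bytes(pq_dim=64, pq_bits=8, lut_dtype=dtype))
```

Under these assumptions, switching from float32 to float16 halves the table and uint8 quarters it, which is why lower-precision tables are more likely to fit in shared memory.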
Index#
- class cuvs.neighbors.ivf_pq.Index#
IvfPq index object. This object stores the trained IvfPq index state which can be used to perform nearest neighbors searches.
- Attributes:
- centers: Get the cluster centers corresponding to the lists in the original space
- centers_padded: Get the padded cluster centers [n_lists, dim_ext] where dim_ext = round_up(dim + 1, 8).
- centers_rot: Get the rotated cluster centers [n_lists, rot_dim]
- dim: Dimensionality of the cluster centers
- list_sizes: Get the sizes of each list
- n_lists: The number of inverted lists (clusters)
- pq_bits: The bit length of an encoded vector element after compression by PQ.
- pq_centers: Get the PQ cluster centers
- pq_dim: The dimensionality of an encoded vector after compression by PQ
- pq_len: The dimensionality of a subspace, i.e. the number of vector components mapped to a subspace
- rotation_matrix: Get the rotation matrix [rot_dim, dim]
- trained
Methods
list_data(self, label[, n_rows, offset, ...]): Gets unpacked list data for a single list (cluster)
list_indices(self, label[, n_rows]): Gets indices for a single cluster (list)
lists(self[, resources]): Iterates through the pq-encoded list data
- centers#
Get the cluster centers corresponding to the lists in the original space
- centers_padded#
Get the padded cluster centers [n_lists, dim_ext] where dim_ext = round_up(dim + 1, 8). This returns contiguous data suitable for build_precomputed.
- centers_rot#
Get the rotated cluster centers [n_lists, rot_dim] where rot_dim = pq_len * pq_dim
- dim#
dimensionality of the cluster centers
- list_data(self, label, n_rows=0, offset=0, out_codes=None, resources=None)
Gets unpacked list data for a single list (cluster)
- Parameters:
- label: int
The cluster to get data for.
- n_rows: int
The number of rows to return for the cluster (0 returns all rows).
- offset: int
The row at which to start reading data.
- out_codes: CUDA array interface compliant matrix, optional
Optional output buffer for the unpacked codes. Created if None.
- resources: Optional cuVS Resource handle for reusing CUDA resources.
If Resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you will need to explicitly synchronize yourself by calling resources.sync() before accessing the output.
- list_indices(self, label, n_rows=0)[source]#
Gets indices for a single cluster (list)
- Parameters:
- label: int
The cluster to get data for
- n_rows: int, optional
Number of rows in the list
- list_sizes#
Get the sizes of each list
- lists(self, resources=None)[source]#
Iterates through the pq-encoded list data
This function returns an iterator over each list, with each value being the pq-encoded data for the entire list
- Parameters:
- resources: Optional cuVS Resource handle for reusing CUDA resources.
If Resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you will need to explicitly synchronize yourself by calling resources.sync() before accessing the output.
- n_lists#
The number of inverted lists (clusters)
- pq_bits#
The bit length of an encoded vector element after compression by PQ.
- pq_centers#
Get the PQ cluster centers
- pq_dim#
The dimensionality of an encoded vector after compression by PQ
- pq_len#
The dimensionality of a subspace, i.e. the number of vector components mapped to a subspace
- rotation_matrix#
Get the rotation matrix [rot_dim, dim] Transform matrix (original space -> rotated padded space)
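The padded width dim_ext = round_up(dim + 1, 8) reported for centers_padded can be reproduced with integer arithmetic. This is a minimal sketch; round_up and padded_center_dim are illustrative helpers, not cuVS functions.

```python
def round_up(value, multiple):
    # Smallest multiple of `multiple` that is >= value.
    return -(-value // multiple) * multiple


def padded_center_dim(dim):
    # dim_ext = round_up(dim + 1, 8), as stated for centers_padded.
    return round_up(dim + 1, 8)


print(padded_center_dim(50))  # 51 rounded up to 56
print(padded_center_dim(64))  # 65 rounded up to 72
```

Because of the extra column, dim_ext is strictly larger than dim even when dim is already a multiple of 8.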
Index build#
- cuvs.neighbors.ivf_pq.build(IndexParams index_params, dataset, resources=None)[source]#
Build the IvfPq index from the dataset for efficient search.
The input dataset array can be either a CUDA array interface compliant matrix or an array interface compliant matrix in host memory.
- Parameters:
- index_params: cuvs.neighbors.ivf_pq.IndexParams
Parameters on how to build the index
- dataset: Array interface compliant matrix shape (n_samples, dim)
Supported dtype [float32, float16, int8, uint8]
- resources: Optional cuVS Resource handle for reusing CUDA resources.
If Resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you will need to explicitly synchronize yourself by calling resources.sync() before accessing the output.
- Returns:
- index: cuvs.neighbors.ivf_pq.Index
Examples
>>> import cupy as cp
>>> from cuvs.neighbors import ivf_pq
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> k = 10
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> build_params = ivf_pq.IndexParams(metric="sqeuclidean")
>>> index = ivf_pq.build(build_params, dataset)
>>> distances, neighbors = ivf_pq.search(ivf_pq.SearchParams(),
...                                      index, dataset,
...                                      k)
>>> distances = cp.asarray(distances)
>>> neighbors = cp.asarray(neighbors)
Index search#
- cuvs.neighbors.ivf_pq.search(SearchParams search_params, Index index, queries, k, neighbors=None, distances=None, resources=None)
Find the k nearest neighbors for each query.
- Parameters:
- search_params: cuvs.neighbors.ivf_pq.SearchParams
Parameters on how to search the index
- index: cuvs.neighbors.ivf_pq.Index
Trained IvfPq index.
- queries: CUDA array interface compliant matrix shape (n_queries, dim)
Supported dtype [float, int8, uint8]
- k: int
The number of neighbors.
- neighbors: Optional CUDA array interface compliant matrix shape (n_queries, k), dtype int64_t.
If supplied, neighbor indices will be written here in-place. (default None)
- distances: Optional CUDA array interface compliant matrix shape (n_queries, k).
If supplied, the distances to the neighbors will be written here in-place. (default None)
- resources: Optional cuVS Resource handle for reusing CUDA resources.
If Resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you will need to explicitly synchronize yourself by calling resources.sync() before accessing the output.
Examples
>>> import cupy as cp
>>> from cuvs.neighbors import ivf_pq
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> # Build the index
>>> index = ivf_pq.build(ivf_pq.IndexParams(), dataset)
>>>
>>> # Search using the built index
>>> queries = cp.random.random_sample((n_queries, n_features),
...                                   dtype=cp.float32)
>>> k = 10
>>> search_params = ivf_pq.SearchParams(n_probes=20)
>>>
>>> distances, neighbors = ivf_pq.search(search_params, index, queries,
...                                      k)
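Several parameters on this page trade recall for speed or memory (pq_dim, lut_dtype, n_probes), so it can be useful to measure recall against exact results. The sketch below shows one common way to compute it. It is not part of cuVS: recall_at_k is a hypothetical helper, and both inputs are assumed to be neighbor indices already copied to host memory as nested lists, with the exact neighbors obtained separately (for example from a brute-force search).

```python
def recall_at_k(found, expected):
    """Fraction of exact neighbors that the approximate search recovered."""
    hits = 0
    total = 0
    for found_row, expected_row in zip(found, expected):
        # Order within a row does not matter, only membership.
        hits += len(set(found_row) & set(expected_row))
        total += len(expected_row)
    return hits / total


# Toy data: two queries, k = 3.
approx = [[0, 5, 9], [3, 4, 8]]
exact = [[0, 5, 7], [3, 4, 8]]
print(recall_at_k(approx, exact))  # 5 of 6 exact neighbors were found
```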
Index save#
- cuvs.neighbors.ivf_pq.save(filename, Index index, bool include_dataset=True, resources=None)[source]#
Saves the index to a file.
Saving / loading the index is experimental. The serialization format is subject to change.
- Parameters:
- filename: string
Name of the file.
- index: Index
Trained IVF-PQ index.
- resources: Optional cuVS Resource handle for reusing CUDA resources.
If Resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you will need to explicitly synchronize yourself by calling resources.sync() before accessing the output.
Examples
>>> import cupy as cp
>>> from cuvs.neighbors import ivf_pq
>>> n_samples = 50000
>>> n_features = 50
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> # Build index
>>> index = ivf_pq.build(ivf_pq.IndexParams(), dataset)
>>> # Serialize and deserialize the ivf_pq index built
>>> ivf_pq.save("my_index.bin", index)
>>> index_loaded = ivf_pq.load("my_index.bin")
Index load#
- cuvs.neighbors.ivf_pq.load(filename, resources=None)[source]#
Loads index from file.
Saving / loading the index is experimental. The serialization format is subject to change, therefore loading an index saved with a previous version of cuvs is not guaranteed to work.
- Parameters:
- filename: string
Name of the file.
- resources: Optional cuVS Resource handle for reusing CUDA resources.
If Resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you will need to explicitly synchronize yourself by calling resources.sync() before accessing the output.
- Returns:
- index: Index
Index extend#
- cuvs.neighbors.ivf_pq.extend(Index index, new_vectors, new_indices, resources=None)[source]#
Extend an existing index with new vectors.
The input array can be either a CUDA array interface compliant matrix or an array interface compliant matrix in host memory.
- Parameters:
- index: ivf_pq.Index
Trained ivf_pq object.
- new_vectors: array interface compliant matrix shape (n_samples, dim)
Supported dtype [float, int8, uint8]
- new_indices: array interface compliant vector shape (n_samples)
Supported dtype [int64]
- resources: Optional cuVS Resource handle for reusing CUDA resources.
If Resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you will need to explicitly synchronize yourself by calling resources.sync() before accessing the output.
- Returns:
- index: cuvs.neighbors.ivf_pq.Index
Examples
>>> import cupy as cp
>>> from cuvs.neighbors import ivf_pq
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> index = ivf_pq.build(ivf_pq.IndexParams(), dataset)
>>> n_rows = 100
>>> more_data = cp.random.random_sample((n_rows, n_features),
...                                     dtype=cp.float32)
>>> indices = n_samples + cp.arange(n_rows, dtype=cp.int64)
>>> index = ivf_pq.extend(index, more_data, indices)
>>> # Search using the built index
>>> queries = cp.random.random_sample((n_queries, n_features),
...                                   dtype=cp.float32)
>>> distances, neighbors = ivf_pq.search(ivf_pq.SearchParams(),
...                                      index, queries,
...                                      k=10)