IVF-PQ#
The IVF-PQ method is an ANN algorithm. Like IVF-Flat, IVF-PQ splits the points into a number of clusters (also specified by a parameter called n_lists) and searches the closest clusters to compute the nearest neighbors (also specified by a parameter called n_probes), but it shrinks the sizes of the vectors using a technique called product quantization.
#include <cuvs/neighbors/ivf_pq.h>
Index build parameters#
-
enum cuvsIvfPqCodebookGen#
A type for specifying how PQ codebooks are created.
Values:
-
enumerator CUVS_IVF_PQ_CODEBOOK_GEN_PER_SUBSPACE#
-
enumerator CUVS_IVF_PQ_CODEBOOK_GEN_PER_CLUSTER#
-
enumerator CUVS_IVF_PQ_CODEBOOK_GEN_PER_SUBSPACE#
-
enum cuvsIvfPqListLayout#
A type for specifying the memory layout of IVF-PQ list data.
Values:
-
enumerator CUVS_IVF_PQ_LIST_LAYOUT_FLAT#
-
enumerator CUVS_IVF_PQ_LIST_LAYOUT_INTERLEAVED#
-
enumerator CUVS_IVF_PQ_LIST_LAYOUT_FLAT#
-
typedef struct cuvsIvfPqIndexParams *cuvsIvfPqIndexParams_t#
- cuvsError_t cuvsIvfPqIndexParamsCreate(
- cuvsIvfPqIndexParams_t *index_params
Allocate IVF-PQ Index params, and populate with default values.
- Parameters:
index_params – [in] cuvsIvfPqIndexParams_t to allocate
- Returns:
- cuvsError_t cuvsIvfPqIndexParamsDestroy(
- cuvsIvfPqIndexParams_t index_params
De-allocate IVF-PQ Index params.
- Parameters:
index_params – [in]
- Returns:
-
struct cuvsIvfPqIndexParams#
- #include <ivf_pq.h>
Supplemental parameters to build IVF-PQ Index.
Public Members
-
cuvsDistanceType metric#
Distance type.
-
float metric_arg#
The argument used by some distance metrics.
-
bool add_data_on_build#
Whether to add the dataset content to the index, i.e.:
truemeans the index is filled with the dataset vectors and ready to search after callingbuild.falsemeansbuildonly trains the underlying model (e.g. quantizer or clustering), but the index is left empty; you’d need to callextendon the index afterwards to populate it.
-
uint32_t n_lists#
The number of inverted lists (clusters)
Hint: the number of vectors per cluster (
n_rows/n_lists) should be approximately 1,000 to 10,000.
-
uint32_t kmeans_n_iters#
The number of iterations searching for kmeans centers (index building).
-
double kmeans_trainset_fraction#
The fraction of data to use during iterative kmeans building.
-
uint32_t pq_bits#
The bit length of the vector element after compression by PQ.
Possible values: [4, 5, 6, 7, 8].
Hint: the smaller the ‘pq_bits’, the smaller the index size and the better the search performance, but the lower the recall.
-
uint32_t pq_dim#
The dimensionality of the vector after compression by PQ. When zero, an optimal value is selected using a heuristic.
NB:
pq_dim * pq_bitsmust be a multiple of 8.Hint: a smaller ‘pq_dim’ results in a smaller index size and better search performance, but lower recall. If ‘pq_bits’ is 8, ‘pq_dim’ can be set to any number, but multiple of 8 are desirable for good performance. If ‘pq_bits’ is not 8, ‘pq_dim’ should be a multiple of 8. For good performance, it is desirable that ‘pq_dim’ is a multiple of 32. Ideally, ‘pq_dim’ should be also a divisor of the dataset dim.
-
enum cuvsIvfPqCodebookGen codebook_kind#
How PQ codebooks are created.
-
bool force_random_rotation#
Apply a random rotation matrix on the input data and queries even if
dim % pq_dim == 0.Note: if
dimis not multiple ofpq_dim, a random rotation is always applied to the input data and queries to transform the working space fromdimtorot_dim, which may be slightly larger than the original space and and is a multiple ofpq_dim(rot_dim % pq_dim == 0). However, this transform is not necessary whendimis multiple ofpq_dim(dim == rot_dim, hence no need in adding “extra” data columns / features).By default, if
dim == rot_dim, the rotation transform is initialized with the identity matrix. Whenforce_random_rotation == true, a random orthogonal transform matrix is generated regardless of the values ofdimandpq_dim.
-
bool conservative_memory_allocation#
By default, the algorithm allocates more space than necessary for individual clusters (
list_data). This allows to amortize the cost of memory allocation and reduce the number of data copies during repeated calls toextend(extending the database).The alternative is the conservative allocation behavior; when enabled, the algorithm always allocates the minimum amount of memory required to store the given number of records. Set this flag to
trueif you prefer to use as little GPU memory for the database as possible.
-
uint32_t max_train_points_per_pq_code#
The max number of data points to use per PQ code during PQ codebook training. Using more data points per PQ code may increase the quality of PQ codebook but may also increase the build time. The parameter is applied to both PQ codebook generation methods, i.e., PER_SUBSPACE and PER_CLUSTER. In both cases, we will use
pq_book_size * max_train_points_per_pq_codetraining points to train each codebook.
-
enum cuvsIvfPqListLayout codes_layout#
Memory layout of the IVF-PQ list data.
CUVS_IVF_PQ_LIST_LAYOUT_FLAT: Codes are stored contiguously, one vector’s codes after another.
CUVS_IVF_PQ_LIST_LAYOUT_INTERLEAVED: Codes are interleaved for optimized search performance. This is the default and recommended for search workloads.
-
cuvsDistanceType metric#
Index search parameters#
-
typedef struct cuvsIvfPqSearchParams *cuvsIvfPqSearchParams_t#
- cuvsError_t cuvsIvfPqSearchParamsCreate(
- cuvsIvfPqSearchParams_t *params
Allocate IVF-PQ search params, and populate with default values.
- Parameters:
params – [in] cuvsIvfPqSearchParams_t to allocate
- Returns:
- cuvsError_t cuvsIvfPqSearchParamsDestroy(
- cuvsIvfPqSearchParams_t params
De-allocate IVF-PQ search params.
- Parameters:
params – [in]
- Returns:
-
struct cuvsIvfPqSearchParams#
- #include <ivf_pq.h>
Supplemental parameters to search IVF-PQ index.
Public Members
-
uint32_t n_probes#
The number of clusters to search.
-
cudaDataType_t lut_dtype#
Data type of look up table to be created dynamically at search time.
Possible values: [CUDA_R_32F, CUDA_R_16F, CUDA_R_8U]
The use of low-precision types reduces the amount of shared memory required at search time, so fast shared memory kernels can be used even for datasets with large dimansionality. Note that the recall is slightly degraded when low-precision type is selected.
-
cudaDataType_t internal_distance_dtype#
Storage data type for distance/similarity computed at search time.
Possible values: [CUDA_R_16F, CUDA_R_32F]
If the performance limiter at search time is device memory access, selecting FP16 will improve performance slightly.
-
cudaDataType_t coarse_search_dtype#
The data type to use as the GEMM element type when searching the clusters to probe.
Possible values: [CUDA_R_8I, CUDA_R_16F, CUDA_R_32F].
Legacy default: CUDA_R_32F (float)
Recommended for performance: CUDA_R_16F (half)
Experimental/low-precision: CUDA_R_8I (int8_t) (WARNING: int8_t variant degrades recall unless data is normalized and low-dimensional)
-
uint32_t max_internal_batch_size#
Set the internal batch size to improve GPU utilization at the cost of larger memory footprint.
-
double preferred_shmem_carveout#
Preferred fraction of SM’s unified memory / L1 cache to be used as shared memory.
Possible values: [0.0 - 1.0] as a fraction of the
sharedMemPerMultiprocessor.One wants to increase the carveout to make sure a good GPU occupancy for the main search kernel, but not to keep it too high to leave some memory to be used as L1 cache. Note, this value is interpreted only as a hint. Moreover, a GPU usually allows only a fixed set of cache configurations, so the provided value is rounded up to the nearest configuration. Refer to the NVIDIA tuning guide for the target GPU architecture.
Note, this is a low-level tuning parameter that can have drastic negative effects on the search performance if tweaked incorrectly.
-
uint32_t n_probes#
Index#
-
typedef cuvsIvfPqIndex *cuvsIvfPqIndex_t#
-
cuvsError_t cuvsIvfPqIndexCreate(cuvsIvfPqIndex_t *index)#
Allocate IVF-PQ index.
- Parameters:
index – [in] cuvsIvfPqIndex_t to allocate
- Returns:
-
cuvsError_t cuvsIvfPqIndexDestroy(cuvsIvfPqIndex_t index)#
De-allocate IVF-PQ index.
- Parameters:
index – [in] cuvsIvfPqIndex_t to de-allocate
- cuvsError_t cuvsIvfPqIndexGetNLists(
- cuvsIvfPqIndex_t index,
- int64_t *n_lists
Get the number of clusters/inverted lists
- cuvsError_t cuvsIvfPqIndexGetDim(
- cuvsIvfPqIndex_t index,
- int64_t *dim
Get the dimensionality
- cuvsError_t cuvsIvfPqIndexGetSize(
- cuvsIvfPqIndex_t index,
- int64_t *size
Get the size of the index
- cuvsError_t cuvsIvfPqIndexGetPqDim(
- cuvsIvfPqIndex_t index,
- int64_t *pq_dim
Get the dimensionality of an encoded vector after compression by PQ.
- cuvsError_t cuvsIvfPqIndexGetPqBits(
- cuvsIvfPqIndex_t index,
- int64_t *pq_bits
Get the bit length of an encoded vector element after compression by PQ.
- cuvsError_t cuvsIvfPqIndexGetPqLen(
- cuvsIvfPqIndex_t index,
- int64_t *pq_len
Get the Dimensionality of a subspace, i.e. the number of vector components mapped to a subspace
- cuvsError_t cuvsIvfPqIndexGetCenters(
- cuvsIvfPqIndex_t index,
- DLManagedTensor *centers
Get the cluster centers corresponding to the lists in the original space.
- Parameters:
index – [in] cuvsIvfPqIndex_t Built Ivf-Pq index
centers – [out] Output tensor that will be populated with a non-owning view of the data
- Returns:
- cuvsError_t cuvsIvfPqIndexGetCentersPadded(
- cuvsIvfPqIndex_t index,
- DLManagedTensor *centers
Get the padded cluster centers [n_lists, dim_ext] where dim_ext = round_up(dim + 1, 8)
This returns the full padded centers as a contiguous array, suitable for use with cuvsIvfPqBuildPrecomputed.
- Parameters:
index – [in] cuvsIvfPqIndex_t Built Ivf-Pq index
centers – [out] Output tensor that will be populated with a non-owning view of the data
- Returns:
- cuvsError_t cuvsIvfPqIndexGetPqCenters(
- cuvsIvfPqIndex_t index,
- DLManagedTensor *pq_centers
Get the PQ cluster centers.
CUVS_IVF_PQ_CODEBOOK_GEN_PER_SUBSPACE: [pq_dim , pq_len, pq_book_size]
CUVS_IVF_PQ_CODEBOOK_GEN_PER_CLUSTER: [n_lists, pq_len, pq_book_size]
- Parameters:
index – [in] cuvsIvfPqIndex_t Built Ivf-Pq index
pq_centers – [out] Output tensor that will be populated with a non-owning view of the data
- Returns:
- cuvsError_t cuvsIvfPqIndexGetCentersRot(
- cuvsIvfPqIndex_t index,
- DLManagedTensor *centers_rot
Get the rotated cluster centers [n_lists, rot_dim] where rot_dim = pq_len * pq_dim.
- Parameters:
index – [in] cuvsIvfPqIndex_t Built Ivf-Pq index
centers_rot – [out] Output tensor that will be populated with a non-owning view of the data
- Returns:
- cuvsError_t cuvsIvfPqIndexGetRotationMatrix(
- cuvsIvfPqIndex_t index,
- DLManagedTensor *rotation_matrix
Get the rotation matrix [rot_dim, dim] Transform matrix (original space -> rotated padded space)
- Parameters:
index – [in] cuvsIvfPqIndex_t Built Ivf-Pq index
rotation_matrix – [out] Output tensor that will be populated with a non-owning view of the data
- Returns:
- cuvsError_t cuvsIvfPqIndexGetListSizes(
- cuvsIvfPqIndex_t index,
- DLManagedTensor *list_sizes
Get the sizes of each list.
- Parameters:
index – [in] cuvsIvfPqIndex_t Built Ivf-Pq index
list_sizes – [out] Output tensor that will be populated with a non-owning view of the data
- Returns:
- cuvsError_t cuvsIvfPqIndexUnpackContiguousListData(
- cuvsResources_t res,
- cuvsIvfPqIndex_t index,
- DLManagedTensor *out_codes,
- uint32_t label,
- uint32_t offset
Unpack
n_rowsconsecutive PQ encoded vectors of a single list (cluster) in the compressed index starting at givenoffset, not expanded to one code per byte. Each code in the output buffer occupies ceildiv(index.pq_dim() * index.pq_bits(), 8) bytes.- Parameters:
res – [in] raft resource
index – [in] cuvsIvfPqIndex_t Built Ivf-Pq index
out_codes – [out] the destination buffer [n_rows, ceildiv(index.pq_dim() * index.pq_bits(), 8)]. The length
n_rowsdefines how many records to unpack, offset + n_rows must be smaller than or equal to the list size. This DLManagedTensor must already point to allocated device memorylabel – [in] The id of the list (cluster) to decode.
offset – [in] How many records in the list to skip.
- cuvsError_t cuvsIvfPqIndexGetListIndices(
- cuvsIvfPqIndex_t index,
- uint32_t label,
- DLManagedTensor *out_labels
Get the indices of each vector in a ivf-pq list.
- Parameters:
index – [in] cuvsIvfPqIndex_t Built Ivf-Pq index
label – [in] The id of the list (cluster) to decode.
out_labels – [out] output tensor that will be populated with a non-owning view of the data
- Returns:
-
struct cuvsIvfPqIndex#
- #include <ivf_pq.h>
Struct to hold address of cuvs::neighbors::ivf_pq::index and its active trained dtype.
Index build#
- cuvsError_t cuvsIvfPqBuild(
- cuvsResources_t res,
- cuvsIvfPqIndexParams_t params,
- DLManagedTensor *dataset,
- cuvsIvfPqIndex_t index
Build a IVF-PQ index with a
DLManagedTensorwhich has underlyingDLDeviceTypeequal tokDLCUDA,kDLCUDAHost,kDLCUDAManaged, orkDLCPU. Also, acceptable underlying types are:kDLDataType.code == kDLFloatandkDLDataType.bits = 32kDLDataType.code == kDLFloatandkDLDataType.bits = 16kDLDataType.code == kDLIntandkDLDataType.bits = 8kDLDataType.code == kDLUIntandkDLDataType.bits = 8
#include <cuvs/core/c_api.h> #include <cuvs/neighbors/ivf_pq.h> // Create cuvsResources_t cuvsResources_t res; cuvsError_t res_create_status = cuvsResourcesCreate(&res); // Assume a populated `DLManagedTensor` type here DLManagedTensor dataset; // Create default index params cuvsIvfPqIndexParams_t index_params; cuvsError_t params_create_status = cuvsIvfPqIndexParamsCreate(&index_params); // Create IVF-PQ index cuvsIvfPqIndex_t index; cuvsError_t index_create_status = cuvsIvfPqIndexCreate(&index); // Build the IVF-PQ Index cuvsError_t build_status = cuvsIvfPqBuild(res, index_params, &dataset, index); // de-allocate `index_params`, `index` and `res` cuvsError_t params_destroy_status = cuvsIvfPqIndexParamsDestroy(index_params); cuvsError_t index_destroy_status = cuvsIvfPqIndexDestroy(index); cuvsError_t res_destroy_status = cuvsResourcesDestroy(res);
- Parameters:
res – [in] cuvsResources_t opaque C handle
params – [in] cuvsIvfPqIndexParams_t used to build IVF-PQ index
dataset – [in] DLManagedTensor* training dataset
index – [out] cuvsIvfPqIndex_t Newly built IVF-PQ index
- Returns:
- cuvsError_t cuvsIvfPqBuildPrecomputed(
- cuvsResources_t res,
- cuvsIvfPqIndexParams_t params,
- uint32_t dim,
- DLManagedTensor *pq_centers,
- DLManagedTensor *centers,
- DLManagedTensor *centers_rot,
- DLManagedTensor *rotation_matrix,
- cuvsIvfPqIndex_t index
Build a view-type IVF-PQ index from device memory precomputed centroids and codebook.
This function creates a non-owning index that stores a reference to the provided device data. All parameters must be provided with correct extents. The caller is responsible for ensuring the lifetime of the input data exceeds the lifetime of the returned index.
The index_params must be consistent with the provided matrices. Specifically:
index_params.codebook_kind determines the expected shape of pq_centers
index_params.metric will be stored in the index
index_params.conservative_memory_allocation will be stored in the index The function will verify consistency between index_params, dim, and the matrix extents.
- Parameters:
res – [in] cuvsResources_t opaque C handle
params – [in] cuvsIvfPqIndexParams_t used to configure the index (must be consistent with matrices)
dim – [in] dimensionality of the input data
pq_centers – [in] PQ codebook on device memory with required shape:
codebook_kind CUVS_IVF_PQ_CODEBOOK_GEN_PER_SUBSPACE: [pq_dim, pq_len, pq_book_size]
codebook_kind CUVS_IVF_PQ_CODEBOOK_GEN_PER_CLUSTER: [n_lists, pq_len, pq_book_size]
centers – [in] Cluster centers in the original space [n_lists, dim_ext] where dim_ext = round_up(dim + 1, 8)
centers_rot – [in] Rotated cluster centers [n_lists, rot_dim] where rot_dim = pq_len * pq_dim
rotation_matrix – [in] Transform matrix (original space -> rotated padded space) [rot_dim, dim]
index – [out] cuvsIvfPqIndex_t Newly built view-type IVF-PQ index
- Returns:
Index search#
- cuvsError_t cuvsIvfPqSearch(
- cuvsResources_t res,
- cuvsIvfPqSearchParams_t search_params,
- cuvsIvfPqIndex_t index,
- DLManagedTensor *queries,
- DLManagedTensor *neighbors,
- DLManagedTensor *distances
Search a IVF-PQ index with a
DLManagedTensorwhich has underlyingDLDeviceTypeequal tokDLCUDA,kDLCUDAHost,kDLCUDAManaged. It is also important to note that the IVF-PQ Index must have been built with the same type ofqueries, such thatindex.dtype.code == queries.dl_tensor.dtype.codeTypes for input are:queries:kDLDataType.code == kDLFloatandkDLDataType.bits = 32orkDLDataType.bits = 16neighbors:kDLDataType.code == kDLUIntandkDLDataType.bits = 32distances:kDLDataType.code == kDLFloatandkDLDataType.bits = 32
#include <cuvs/core/c_api.h> #include <cuvs/neighbors/ivf_pq.h> // Create cuvsResources_t cuvsResources_t res; cuvsError_t res_create_status = cuvsResourcesCreate(&res); // Assume a populated `DLManagedTensor` type here DLManagedTensor dataset; DLManagedTensor queries; DLManagedTensor neighbors; // Create default search params cuvsIvfPqSearchParams_t search_params; cuvsError_t params_create_status = cuvsIvfPqSearchParamsCreate(&search_params); // Search the `index` built using `cuvsIvfPqBuild` cuvsError_t search_status = cuvsIvfPqSearch(res, search_params, index, &queries, &neighbors, &distances); // de-allocate `search_params` and `res` cuvsError_t params_destroy_status = cuvsIvfPqSearchParamsDestroy(search_params); cuvsError_t res_destroy_status = cuvsResourcesDestroy(res);
- Parameters:
res – [in] cuvsResources_t opaque C handle
search_params – [in] cuvsIvfPqSearchParams_t used to search IVF-PQ index
index – [in] cuvsIvfPqIndex which has been returned by
cuvsIvfPqBuildqueries – [in] DLManagedTensor* queries dataset to search
neighbors – [out] DLManagedTensor* output
kneighbors for queriesdistances – [out] DLManagedTensor* output
kdistances for queries
Index serialize#
- cuvsError_t cuvsIvfPqSerialize(
- cuvsResources_t res,
- const char *filename,
- cuvsIvfPqIndex_t index
Save the index to file.
Experimental, both the API and the serialization format are subject to change.
#include <cuvs/neighbors/ivf_pq.h> // Create cuvsResources_t cuvsResources_t res; cuvsError_t res_create_status = cuvsResourcesCreate(&res); // create an index with `cuvsIvfPqBuild` cuvsIvfPqSerialize(res, "/path/to/index", index, true);
- Parameters:
res – [in] cuvsResources_t opaque C handle
filename – [in] the file name for saving the index
index – [in] IVF-PQ index
- cuvsError_t cuvsIvfPqDeserialize(
- cuvsResources_t res,
- const char *filename,
- cuvsIvfPqIndex_t index
Load index from file.
Experimental, both the API and the serialization format are subject to change.
- Parameters:
res – [in] cuvsResources_t opaque C handle
filename – [in] the name of the file that stores the index
index – [out] IVF-PQ index loaded disk