Vamana#

VAMANA is the underlying graph construction algorithm used to construct indexes for the DiskANN vector search solution. DiskANN and the Vamana algorithm are described in detail in the published paper, and a highly optimized open-source repository includes many features for index construction and search. In cuVS, we provide a version of the Vamana algorithm optimized for GPU architectures to accelreate graph construction to build DiskANN idnexes. At a high level, the Vamana algorithm operates as follows:

    1. Starting with an empty graph, select a medoid vector from the D-dimension vector dataset and insert it into the graph.

    1. Iteratively insert batches of dataset vectors into the graph, connecting each inserted vector to neighbors based on a graph traversal.

    1. For each batch, create reverse edges and prune unnecessary edges.

There are many algorithmic details that are outlined in the paper, and many GPU-specific optimizations are included in this implementation.

The current implementation of DiskANN in cuVS only includes the ‘in-memory’ graph construction and a serialization step that writes the index to a file. This index file can be then used by the open-source DiskANN library to perform efficient search. Additional DiskANN functionality, including GPU-accelerated search and ‘ssd’ index build are planned for future cuVS releases.

[ C++ API | Python API ]

Interoperability with CPU DiskANN#

The ‘vamana::serialize’ API calls writes the index to a file with a format that is compatible with the open-source DiskANN repositoriy. This allows cuVS to be used to accelerate index construction while leveraging the efficient CPU-based search currently available.

Configuration parameters#

Build parameters#

Name

Default

Description

graph_degree

32

The maximum degre of the final Vamana graph. The internal representation of the graph includes this many edges for every node, but serialize will compress the graph into a ‘CSR’ format with, potentially, fewer edges.

visited_size

64

Maximum number of visited nodes saved during each traversal to insert a new node. This corresponds to the ‘L’ parameter in the paper.

vamana_iters

1

Number of iterations ran to improve the graph. Each iteration involves inserting every vector in the dataset.

alpha

1.2

Alpha parameter that defines how aggressively to prune edges.

max_fraction

0.06

Maximum fraction of the dataset that will be inserted as a single batch. Larger max batch size decreases graph quality but improves speed.

batch_base

2

Base of growth rate of batch sizes. Insertion batch sizes increase exponentially based on this parameter until max_fraction is reached.

queue_size

127

Size of the candidate queue structure used during graph traversal. Must be (2^x)-1 for some x, and must be > visited_size.

Tuning Considerations#

The 2 hyper-parameters that are most often tuned are graph_degree and visited_size. The time needed to create a graph increases dramatically when increasing graph_degree, in particular. However, larger graphs may be needed to achieve very high recall search, especially for large datasets.

Memory footprint#

Vamana builds a graph that is stored in device memory. However, in order to serialize the index and write it to a file for later use, it must be moved into host memory. If the include_dataset parameter is also set, then the dataset must be resident in host memory when calling serialize as well.

Device memory usage#

The built index represents the graph as fixed degree, storing a total of \(graph\_degree * n\_index\_vectors\) edges. Graph construction also requires the dataset be in device memory (or it copies it to device during build). In addition, device memory is used during construction to sort and create the reverse edges. Thus, the amount of device memory needed depends on the dataset itself, but it is bounded by a maximum sum of:

  • vector dataset: \(n\_index\_vectors * n\_dims * sizeof(T)\)

  • output graph: \(graph\_degree * n\_index\_vectors * sizeof(IdxT)\)

  • scratch memory: \(n\_index\_vectors * max\_fraction * (2 + graph\_degree) * sizeof(IdxT)\)

Reduction in scratch device memory requirements are planned for upcoming releases of cuVS.