Interoperability#

DLPack (C)#

Approximate nearest neighbor (ANN) indexes can be built and searched through a C API. [DLPack v0.8](https://github.com/dmlc/dlpack), a tensor interface framework, is used as the standard for exchanging tensors with our C API.

Representing a tensor with DLPack is simple, as it is a POD struct that stores information about the tensor at runtime. At the moment, the DLManagedTensor struct from DLPack v0.8 is compatible with our C API; however, we will soon upgrade to DLManagedTensorVersioned from DLPack v1.0, as it will help us maintain ABI and API compatibility.
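
For reference, DLPack v1.0 carries the same DLTensor payload inside a DLManagedTensorVersioned struct, which adds an explicit version and a flags field; that extra metadata is what enables the ABI checks mentioned above. Below is a minimal sketch of populating it (struct and macro names as defined in dlpack.h v1.0; this versioned form is not yet accepted by the cuVS C API and is shown only for illustration):

#include <dlpack/dlpack.h>

// DLPack v1.0: same DLTensor payload, plus version and flags metadata
DLManagedTensorVersioned tensor_v1;
tensor_v1.version.major = DLPACK_MAJOR_VERSION;
tensor_v1.version.minor = DLPACK_MINOR_VERSION;
tensor_v1.flags         = 0;       // e.g. DLPACK_FLAG_BITMASK_READ_ONLY for read-only data
tensor_v1.manager_ctx   = nullptr; // no owning object for this stack-allocated example
tensor_v1.deleter       = nullptr; // nothing to release when the consumer is done
// tensor_v1.dl_tensor is filled in exactly as in the DLManagedTensor example below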

Here’s an example of how to represent device memory using DLManagedTensor:

#include <cuvs/core/c_api.h>
#include <cuda_runtime.h>
#include <dlpack/dlpack.h>

// Create data representation in host memory (2 rows, 1 column)
float dataset[2][1] = {{0.2}, {0.1}};

// Create a cuVS resources handle, needed by the RMM allocation helpers
cuvsResources_t res;
cuvsResourcesCreate(&res);

// copy data to device memory
float *dataset_dev;
cuvsRMMAlloc(res, (void**) &dataset_dev, sizeof(float) * 2 * 1);
cudaMemcpy(dataset_dev, dataset, sizeof(float) * 2 * 1, cudaMemcpyDefault);

// Use DLPack for representing the device data as a tensor
DLManagedTensor dataset_tensor;
dataset_tensor.dl_tensor.data               = dataset_dev;
dataset_tensor.dl_tensor.device.device_type = kDLCUDA;
dataset_tensor.dl_tensor.device.device_id   = 0;
dataset_tensor.dl_tensor.ndim               = 2;
dataset_tensor.dl_tensor.dtype.code         = kDLFloat;
dataset_tensor.dl_tensor.dtype.bits         = 32;
dataset_tensor.dl_tensor.dtype.lanes        = 1;
int64_t dataset_shape[2]                    = {2, 1};
dataset_tensor.dl_tensor.shape              = dataset_shape;
dataset_tensor.dl_tensor.strides            = nullptr; // contiguous row-major layout
dataset_tensor.dl_tensor.byte_offset        = 0;

// free memory after use
cuvsRMMFree(res, dataset_dev, sizeof(float) * 2 * 1);
cuvsResourcesDestroy(res);
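
Host memory is described the same way; only the DLDevice changes and no device copy is needed. A minimal sketch, reusing the dataset array and dataset_shape from the example above:

// Represent the original host array as a CPU tensor (no device copy needed)
DLManagedTensor host_tensor;
host_tensor.dl_tensor.data               = dataset;       // host pointer from above
host_tensor.dl_tensor.device.device_type = kDLCPU;
host_tensor.dl_tensor.device.device_id   = 0;
host_tensor.dl_tensor.ndim               = 2;
host_tensor.dl_tensor.dtype.code         = kDLFloat;
host_tensor.dl_tensor.dtype.bits         = 32;
host_tensor.dl_tensor.dtype.lanes        = 1;
host_tensor.dl_tensor.shape              = dataset_shape; // same {2, 1} shape
host_tensor.dl_tensor.strides            = nullptr;       // contiguous row-major layout
host_tensor.dl_tensor.byte_offset        = 0;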

Please refer to the cuVS C API documentation to learn more.

Multi-dimensional span (C++)#

cuVS is built on top of the GPU-accelerated machine learning and data mining primitives in the RAFT library. Most of the C++ APIs in cuVS accept the mdspan multi-dimensional array view for representing data in higher dimensions, similar to the ndarray in the NumPy Python library. RAFT also contains the corresponding owning mdarray structure, which simplifies the allocation and management of multi-dimensional data in both host and device (GPU) memory.

The mdarray is an owning object that forms a convenience layer over RMM and can be constructed in RAFT using a number of different helper functions:

#include <raft/core/device_mdarray.hpp>
#include <raft/core/device_resources.hpp>

// resources handle that manages the CUDA stream and other resources
raft::device_resources handle;

int n_rows = 10;
int n_cols = 10;

auto scalar = raft::make_device_scalar<float>(handle, 1.0);
auto vector = raft::make_device_vector<float>(handle, n_cols);
auto matrix = raft::make_device_matrix<float>(handle, n_rows, n_cols);

The mdspan is a lightweight non-owning view that can wrap around any pointer, maintaining shape, layout, and indexing information for accessing elements.

We can construct mdspan instances directly from the above mdarray instances:

// Scalar mdspan on device
auto scalar_view = scalar.view();

// Vector mdspan on device
auto vector_view = vector.view();

// Matrix mdspan on device
auto matrix_view = matrix.view();

Since the mdspan is just a lightweight wrapper, we can also construct it from the underlying data handles in the mdarray instances above. We use the extents to get information about the shape of the mdarray or mdspan.

#include <raft/core/device_mdspan.hpp>

auto scalar_view = raft::make_device_scalar_view(scalar.data_handle());
auto vector_view = raft::make_device_vector_view(vector.data_handle(), vector.extent(0));
auto matrix_view = raft::make_device_matrix_view(matrix.data_handle(), matrix.extent(0), matrix.extent(1));
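
Because the view is non-owning, the same factory functions can wrap a raw device pointer that was not allocated through an mdarray. A short sketch, assuming an RMM rmm::device_uvector buffer and reusing the handle, n_rows, and n_cols from the earlier snippet:

#include <rmm/device_uvector.hpp>

// Wrap a raw device allocation in a row-major matrix view
rmm::device_uvector<float> buffer(n_rows * n_cols, handle.get_stream());
auto raw_view = raft::make_device_matrix_view<float, int>(buffer.data(), n_rows, n_cols);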

Of course, RAFT’s mdspan/mdarray APIs aren’t just limited to the device. You can also create host variants:

#include <raft/core/device_resources.hpp>
#include <raft/core/host_mdarray.hpp>
#include <raft/core/host_mdspan.hpp>

// the same resources handle type used in the device examples
raft::device_resources handle;

int n_rows = 10;
int n_cols = 10;

auto scalar = raft::make_host_scalar<float>(handle, 1.0);
auto vector = raft::make_host_vector<float>(handle, n_cols);
auto matrix = raft::make_host_matrix<float>(handle, n_rows, n_cols);

auto scalar_view = raft::make_host_scalar_view(scalar.data_handle());
auto vector_view = raft::make_host_vector_view(vector.data_handle(), vector.extent(0));
auto matrix_view = raft::make_host_matrix_view(matrix.data_handle(), matrix.extent(0), matrix.extent(1));
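
Since an mdspan carries shape and layout information, elements can be accessed through it with multi-dimensional indexing, which RAFT's mdspan exposes via operator(). On host views this works in ordinary CPU code; a small sketch using the host matrix view from above:

// Fill the host matrix through its view using two-dimensional indexing
for (int i = 0; i < n_rows; i++) {
  for (int j = 0; j < n_cols; j++) {
    matrix_view(i, j) = static_cast<float>(i * n_cols + j);
  }
}

// Reading elements works the same way
float first = matrix_view(0, 0);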

Please refer to RAFT’s mdspan documentation to learn more.

CUDA array interface (Python)#