NN-Descent#
The NN-descent method is an ANN algorithm that directly approximates a k-nearest neighbors graph by randomly sampling points to compute distances and using neighbors of neighbors distances to reduce distance computations.
#include <cuvs/neighbors/nn_descent.hpp>
namespace cuvs::neighbors::nn_descent
Index build parameters#
-
struct index_params : public cuvs::neighbors::index_params#
- #include <nn_descent.hpp>
Parameters used to build an nn-descent index.
graph_degree
: For an input dataset of dimensions (N, D), determines the final dimensions of the all-neighbors knn graph which turns out to be of dimensions (N, graph_degree)intermediate_graph_degree
: Internally, nn-descent builds an all-neighbors knn graph of dimensions (N, intermediate_graph_degree) before selecting the finalgraph_degree
neighbors. It’s recommended thatintermediate_graph_degree
>= 1.5 * graph_degreemax_iterations
: The number of iterations that nn-descent will refine the graph for. More iterations produce a better quality graph at cost of performancetermination_threshold
: The delta at which nn-descent will terminate its iterations
Index#
-
template<typename IdxT>
struct index : public cuvs::neighbors::index# - #include <nn_descent.hpp>
nn-descent Build an nn-descent index The index contains an all-neighbors graph of the input dataset stored in host memory of dimensions (n_rows, n_cols)
Reference: Hui Wang, Wan-Lei Zhao, Xiangxiang Zeng, and Jianye Yang. 2021. Fast k-NN Graph Construction by GPU based NN-Descent. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management (CIKM ‘21). Association for Computing Machinery, New York, NY, USA, 1929–1938. https://doi.org/10.1145/3459637.3482344
- Template Parameters:
IdxT – dtype to be used for constructing knn-graph
Index build#
-
size_t graph_degree = 64#
-
size_t intermediate_graph_degree = 128#
-
size_t max_iterations = 20#
-
float termination_threshold = 0.0001#
-
bool return_distances = true#
-
size_t n_clusters = 1#
-
index_params(size_t graph_degree = 64, cuvs::distance::DistanceType metric = cuvs::distance::DistanceType::L2Expanded)#
Construct NN descent parameters for a specific kNN graph degree.
- Parameters:
graph_degree – output graph degree
metric – distance metric to use
-
inline index(raft::resources const &res, int64_t n_rows, int64_t n_cols, bool return_distances = false, cuvs::distance::DistanceType metric = cuvs::distance::DistanceType::L2Expanded)#
Construct a new index object.
This constructor creates an nn-descent index which is a knn-graph in host memory. The type of the knn-graph is a dense raft::host_matrix and dimensions are (n_rows, n_cols).
- Parameters:
res – raft::resources is an object mangaging resources
n_rows – number of rows in knn-graph
n_cols – number of cols in knn-graph
return_distances – whether to return distances
metric – distance metric to use
-
inline index(raft::resources const &res, raft::host_matrix_view<IdxT, int64_t, raft::row_major> graph_view, std::optional<raft::device_matrix_view<float, int64_t, row_major>> distances_view = std::nullopt, cuvs::distance::DistanceType metric = cuvs::distance::DistanceType::L2Expanded)#
Construct a new index object.
This constructor creates an nn-descent index using a user allocated host memory knn-graph. The type of the knn-graph is a dense raft::host_matrix and dimensions are (n_rows, n_cols).
- Parameters:
res – raft::resources is an object mangaging resources
graph_view – raft::host_matrix_view<IdxT, int64_t, raft::row_major> for storing knn-graph
distances_view – optional raft::device_matrix_view<float, int64_t, row_major> for storing distances
metric – distance metric to use
-
inline cuvs::distance::DistanceType metric() const noexcept#
Distance metric used for clustering.
-
inline IdxT size() const noexcept#
Total length of the index (number of vectors).
-
inline uint32_t graph_degree() const noexcept
Graph degree
-
inline raft::host_matrix_view<IdxT, int64_t, raft::row_major> graph() noexcept#
neighborhood graph [size, graph-degree]
-
inline std::optional<device_matrix_view<float, int64_t, row_major>> distances() noexcept#
neighborhood graph distances [size, graph-degree]
- index &=delete operator= (const index &)
- index &=default operator= (index &&)
-
~index() = default#
-
cuvs::neighbors::nn_descent::index<uint32_t> build(raft::resources const &res, index_params const ¶ms, raft::device_matrix_view<const float, int64_t, raft::row_major> dataset, std::optional<raft::host_matrix_view<uint32_t, int64_t, raft::row_major>> graph = std::nullopt)#
Build nn-descent Index with dataset in device memory.
The following distance metrics are supported:
L2
Usage example:
using namespace cuvs::neighbors; // use default index parameters nn_descent::index_params index_params; // create and fill the index from a [N, D] raft::device_matrix_view dataset auto index = nn_descent::build(res, index_params, dataset); // index.graph() provides a raft::host_matrix_view of an // all-neighbors knn graph of dimensions [N, k] of the input // dataset
- Parameters:
res – [in] raft::resources is an object mangaging resources
params – [in] an instance of nn_descent::index_params that are parameters to run the nn-descent algorithm
dataset – [in] raft::device_matrix_view input dataset expected to be located in device memory
graph – [in] optional raft::host_matrix_view<uint32_t, int64_t, raft::row_major> for owning the output graph
- Returns:
index<IdxT> index containing all-neighbors knn graph in host memory
-
cuvs::neighbors::nn_descent::index<uint32_t> build(raft::resources const &res, index_params const ¶ms, raft::host_matrix_view<const float, int64_t, raft::row_major> dataset, std::optional<raft::host_matrix_view<uint32_t, int64_t, raft::row_major>> graph = std::nullopt)#
Build nn-descent Index with dataset in host memory.
The following distance metrics are supported:
L2
Usage example:
using namespace cuvs::neighbors::experimental; // use default index parameters nn_descent::index_params index_params; // create and fill the index from a [N, D] raft::host_matrix_view dataset auto index = cagra::build(res, index_params, dataset); // index.graph() provides a raft::host_matrix_view of an // all-neighbors knn graph of dimensions [N, k] of the input // dataset
- Template Parameters:
T – data-type of the input dataset
IdxT – data-type for the output index
- Parameters:
res – raft::resources is an object mangaging resources
params – [in] an instance of nn_descent::index_params that are parameters to run the nn-descent algorithm
dataset – [in] raft::host_matrix_view input dataset expected to be located in host memory
graph – [in] optional raft::host_matrix_view<uint32_t, int64_t, raft::row_major> for owning the output graph
- Returns:
index<IdxT> index containing all-neighbors knn graph in host memory
-
cuvs::neighbors::nn_descent::index<uint32_t> build(raft::resources const &res, index_params const ¶ms, raft::device_matrix_view<const half, int64_t, raft::row_major> dataset, std::optional<raft::host_matrix_view<uint32_t, int64_t, raft::row_major>> graph = std::nullopt)#
Build nn-descent Index with dataset in device memory.
The following distance metrics are supported:
L2
Usage example:
using namespace cuvs::neighbors; // use default index parameters nn_descent::index_params index_params; // create and fill the index from a [N, D] raft::device_matrix_view dataset auto index = nn_descent::build(res, index_params, dataset); // index.graph() provides a raft::host_matrix_view of an // all-neighbors knn graph of dimensions [N, k] of the input // dataset
- Parameters:
res – [in] raft::resources is an object mangaging resources
params – [in] an instance of nn_descent::index_params that are parameters to run the nn-descent algorithm
dataset – [in] raft::device_matrix_view input dataset expected to be located in device memory
graph – [in] optional raft::host_matrix_view<uint32_t, int64_t, raft::row_major> for owning the output graph
- Returns:
index<IdxT> index containing all-neighbors knn graph in host memory
-
cuvs::neighbors::nn_descent::index<uint32_t> build(raft::resources const &res, index_params const ¶ms, raft::host_matrix_view<const half, int64_t, raft::row_major> dataset, std::optional<raft::host_matrix_view<uint32_t, int64_t, raft::row_major>> graph = std::nullopt)#
Build nn-descent Index with dataset in host memory.
The following distance metrics are supported:
L2
Usage example:
using namespace cuvs::neighbors::experimental; // use default index parameters nn_descent::index_params index_params; // create and fill the index from a [N, D] raft::host_matrix_view dataset auto index = cagra::build(res, index_params, dataset); // index.graph() provides a raft::host_matrix_view of an // all-neighbors knn graph of dimensions [N, k] of the input // dataset
- Template Parameters:
T – data-type of the input dataset
IdxT – data-type for the output index
- Parameters:
res – raft::resources is an object mangaging resources
params – [in] an instance of nn_descent::index_params that are parameters to run the nn-descent algorithm
dataset – [in] raft::host_matrix_view input dataset expected to be located in host memory
graph – [in] optional raft::host_matrix_view<uint32_t, int64_t, raft::row_major> for owning the output graph
- Returns:
index<IdxT> index containing all-neighbors knn graph in host memory
-
cuvs::neighbors::nn_descent::index<uint32_t> build(raft::resources const &res, index_params const ¶ms, raft::device_matrix_view<const int8_t, int64_t, raft::row_major> dataset, std::optional<raft::host_matrix_view<uint32_t, int64_t, raft::row_major>> graph = std::nullopt)#
Build nn-descent Index with dataset in device memory.
The following distance metrics are supported:
L2
Usage example:
using namespace cuvs::neighbors; // use default index parameters nn_descent::index_params index_params; // create and fill the index from a [N, D] raft::device_matrix_view dataset auto index = nn_descent::build(res, index_params, dataset); // index.graph() provides a raft::host_matrix_view of an // all-neighbors knn graph of dimensions [N, k] of the input // dataset
- Parameters:
res – [in] raft::resources is an object mangaging resources
params – [in] an instance of nn_descent::index_params that are parameters to run the nn-descent algorithm
dataset – [in] raft::device_matrix_view input dataset expected to be located in device memory
graph – [in] optional raft::host_matrix_view<uint32_t, int64_t, raft::row_major> for owning the output graph
- Returns:
index<IdxT> index containing all-neighbors knn graph in host memory
-
cuvs::neighbors::nn_descent::index<uint32_t> build(raft::resources const &res, index_params const ¶ms, raft::host_matrix_view<const int8_t, int64_t, raft::row_major> dataset, std::optional<raft::host_matrix_view<uint32_t, int64_t, raft::row_major>> graph = std::nullopt)#
Build nn-descent Index with dataset in host memory.
The following distance metrics are supported:
L2
Usage example:
using namespace cuvs::neighbors::experimental; // use default index parameters nn_descent::index_params index_params; // create and fill the index from a [N, D] raft::host_matrix_view dataset auto index = cagra::build(res, index_params, dataset); // index.graph() provides a raft::host_matrix_view of an // all-neighbors knn graph of dimensions [N, k] of the input // dataset
- Template Parameters:
T – data-type of the input dataset
IdxT – data-type for the output index
- Parameters:
res – raft::resources is an object mangaging resources
params – [in] an instance of nn_descent::index_params that are parameters to run the nn-descent algorithm
dataset – [in] raft::host_matrix_view input dataset expected to be located in host memory
graph – [in] optional raft::host_matrix_view<uint32_t, int64_t, raft::row_major> for owning the output graph
- Returns:
index<IdxT> index containing all-neighbors knn graph in host memory
-
cuvs::neighbors::nn_descent::index<uint32_t> build(raft::resources const &res, index_params const ¶ms, raft::device_matrix_view<const uint8_t, int64_t, raft::row_major> dataset, std::optional<raft::host_matrix_view<uint32_t, int64_t, raft::row_major>> graph = std::nullopt)#
Build nn-descent Index with dataset in device memory.
The following distance metrics are supported:
L2
Usage example:
using namespace cuvs::neighbors; // use default index parameters nn_descent::index_params index_params; // create and fill the index from a [N, D] raft::device_matrix_view dataset auto index = nn_descent::build(res, index_params, dataset); // index.graph() provides a raft::host_matrix_view of an // all-neighbors knn graph of dimensions [N, k] of the input // dataset
- Parameters:
res – [in] raft::resources is an object mangaging resources
params – [in] an instance of nn_descent::index_params that are parameters to run the nn-descent algorithm
dataset – [in] raft::device_matrix_view input dataset expected to be located in device memory
graph – [in] optional raft::host_matrix_view<uint32_t, int64_t, raft::row_major> for owning the output graph
- Returns:
index<IdxT> index containing all-neighbors knn graph in host memory
-
cuvs::neighbors::nn_descent::index<uint32_t> build(raft::resources const &res, index_params const ¶ms, raft::host_matrix_view<const uint8_t, int64_t, raft::row_major> dataset, std::optional<raft::host_matrix_view<uint32_t, int64_t, raft::row_major>> graph = std::nullopt)#
Build nn-descent Index with dataset in host memory.
The following distance metrics are supported:
L2
Usage example:
using namespace cuvs::neighbors::experimental; // use default index parameters nn_descent::index_params index_params; // create and fill the index from a [N, D] raft::host_matrix_view dataset auto index = cagra::build(res, index_params, dataset); // index.graph() provides a raft::host_matrix_view of an // all-neighbors knn graph of dimensions [N, k] of the input // dataset
- Template Parameters:
T – data-type of the input dataset
IdxT – data-type for the output index
- Parameters:
res – raft::resources is an object mangaging resources
params – [in] an instance of nn_descent::index_params that are parameters to run the nn-descent algorithm
dataset – [in] raft::host_matrix_view input dataset expected to be located in host memory
graph – [in] optional raft::host_matrix_view<uint32_t, int64_t, raft::row_major> for owning the output graph
- Returns:
index<IdxT> index containing all-neighbors knn graph in host memory
-
bool has_enough_device_memory(raft::resources const &res, raft::matrix_extent<int64_t> dataset, size_t idx_size = 4)#
Test if we have enough GPU memory to run NN descent algorithm.
- Parameters:
res –
dataset – shape of the dataset
idx_size – the size of index type in bytes
- Returns:
true if enough GPU memory can be allocated
- Returns:
false otherwise