All-neighbors#
All-neighbors allows building an approximate all-neighbors knn graph. Given a full dataset, it finds nearest neighbors for all the training vectors in the dataset.
#include <cuvs/neighbors/all_neighbors.hpp>
namespace cuvs::neighbors::all_neighbors
All neighbors knn graph build parameters#
-
using GraphBuildParams = std::variant<graph_build_params::ivf_pq_params, graph_build_params::nn_descent_params>#
-
struct all_neighbors_params#
- #include <all_neighbors.hpp>
Parameters used to build an all-neighbors graph (find nearest neighbors for all the training vectors). For scalability, the all-neighbors graph construction algorithm partitions a set of training vectors into overlapping clusters, computes a local knn graph on each cluster, and merges the local graphs into a single global graph. Device memory usage and accuracy can be configured by changing the
overlap_factor
andn_clusters
. The algorithm used to build each local graph is also configurable.Public Members
-
GraphBuildParams graph_build_params#
Parameters for knn graph building algorithm Approximate nearest neighbors methods are used to build the knn graph. Currently supported options are ‘IVF-PQ’ and ‘NN Descent’. IVF-PQ is more accurate, but slower compared to NN Descent.
Set ivf_pq_params, or nn_descent_params to select the graph build algorithm and control their parameters.
all_neighbors::index_params params; // 1. Choose IVF-PQ algorithm params.graph_build_params = all_neighbors::graph_build_params::ivf_pq_params{}; // 2. Choose NN Descent algorithm for kNN graph construction params.graph_build_params = all_neighbors::graph_build_params::nn_descent_params{};
-
size_t overlap_factor = 2#
Number of nearest clusters each data point will be assigned to in the batching algorithm. Start with
overlap_factor = 2
and gradually increase (2->3->4 …) for better accuracy at the cost of device memory usage.
-
size_t n_clusters = 1#
Number of total clusters (aka batches) to split the data into. If set to 1, algorithm creates an all-neighbors graph without batching. Start with
n_clusters = 4
and increase (4 → 8 → 16…) for less device memory usage at the cost of accuracy. This is independent fromoverlap_factor
as long asoverlap_factor
<n_clusters
.The ratio of
overlap_factor / n_clusters
determines device memory usage. Approximately(overlap_factor / n_clusters) * num_rows_in_entire_data
number of rows will be put on device memory at once. E.g. between(overlap_factor / n_clusters)
= 2/10 and 2/20, the latter will use less device memory.Larger
overlap_factor
results in better accuracy of the final all-neighbors knn graph. E.g. While using similar device memory,(overlap_factor / n_clusters)
= 4/20 will have better accuracy than 2/10 at the cost of performance.
-
cuvs::distance::DistanceType metric = cuvs::distance::DistanceType::L2Expanded#
Metric used.
-
GraphBuildParams graph_build_params#
Build#
- void build(
- const raft::resources &handle,
- const all_neighbors_params ¶ms,
- raft::host_matrix_view<const float, int64_t, row_major> dataset,
- raft::device_matrix_view<int64_t, int64_t, row_major> indices,
- std::optional<raft::device_matrix_view<float, int64_t, row_major>> distances = std::nullopt
Builds an approximate all-neighbors knn graph (find nearest neighbors for all the training vectors)
Usage example:
using namespace cuvs::neighbors; // use default index parameters all_neighbors::all_neighbors_params params; auto indices = raft::make_device_matrix<int64_t, int64_t>(handle, n_row, k); auto distances = raft::make_device_matrix<float, int64_t>(handle, n_row, k); all_neighbors::build(res, params, dataset, indices.view(), distances.view());
- Parameters:
handle – [in] raft::resources is an object mangaging resources
params – [in] an instance of all_neighbors::all_neighbors_params that are parameters to build all-neighbors knn graph
dataset – [in] raft::host_matrix_view input dataset expected to be located in host memory
indices – [out] nearest neighbor indices of shape [n_row x k]
distances – [out] nearest neighbor distances [n_row x k]
- void build(
- const raft::resources &handle,
- const all_neighbors_params ¶ms,
- raft::device_matrix_view<const float, int64_t, row_major> dataset,
- raft::device_matrix_view<int64_t, int64_t, row_major> indices,
- std::optional<raft::device_matrix_view<float, int64_t, row_major>> distances = std::nullopt
Builds an approximate all-neighbors knn graph (find nearest neighbors for all the training vectors) params.n_clusters should be 1 for data on device. To use a larger params.n_clusters for efficient device memory usage, put data on host RAM.
Usage example:
using namespace cuvs::neighbors; // use default index parameters all_neighbors::all_neighbors_params params; auto indices = raft::make_device_matrix<int64_t, int64_t>(handle, n_row, k); auto distances = raft::make_device_matrix<float, int64_t>(handle, n_row, k); all_neighbors::build(res, params, dataset, indices.view(), distances.view());
- Parameters:
handle – [in] raft::resources is an object mangaging resources
params – [in] an instance of all_neighbors::all_neighbors_params that are parameters to build all-neighbors knn graph
dataset – [in] raft::device_matrix_view input dataset expected to be located in device memory
indices – [out] nearest neighbor indices of shape [n_row x k]
distances – [out] nearest neighbor distances [n_row x k]