Attention

The vector search and clustering algorithms in RAFT are being migrated to a new library dedicated to vector search called cuVS. We will continue to support the vector search algorithms in RAFT during this move, but will no longer update them after the RAPIDS 24.06 (June) release. We plan to complete the migration by RAPIDS 24.08 (August) release.

Sparse Neighbors#

template<typename value_idx, typename value_t>
using raft::sparse::neighbors::FixConnectivitiesRedOp = detail::FixConnectivitiesRedOp<value_idx, value_t>#
template<typename value_idx>
value_idx raft::sparse::neighbors::get_n_components(value_idx *colors, size_t n_rows, cudaStream_t stream)#

Gets the number of unique components from array of colors or labels. This does not assume the components are drawn from a monotonically increasing set.

Template Parameters:

value_idx

Parameters:
  • colors[in] array of components

  • n_rows[in] size of components array

  • stream[in] cuda stream for which to order cuda operations

Returns:

total number of components

template<typename value_idx, typename value_t, typename red_op>
void raft::sparse::neighbors::cross_component_nn(raft::resources const &handle, raft::sparse::COO<value_t, value_idx> &out, const value_t *X, const value_idx *orig_colors, size_t n_rows, size_t n_cols, red_op reduction_op, size_t row_batch_size = 0, size_t col_batch_size = 0, raft::distance::DistanceType metric = raft::distance::DistanceType::L2SqrtExpanded)#

Connects the components of an otherwise unconnected knn graph by computing a 1-nn to neighboring components of each data point (e.g. component(nn) != component(self)) and reducing the results to include the set of smallest destination components for each source component. The result will not necessarily contain n_components^2 - n_components number of elements because many components will likely not be contained in the neighborhoods of 1-nns.

Template Parameters:
  • value_idx

  • value_t

Parameters:
  • handle[in] raft handle

  • out[out] output edge list containing nearest cross-component edges.

  • X[in] original (row-major) dense matrix for which knn graph should be constructed.

  • orig_colors[in] array containing component number for each row of X

  • n_rows[in] number of rows in X

  • n_cols[in] number of cols in X

  • reduction_op[in] reduction operation for computing nearest neighbors. The reduction operation must have gather and scatter functions defined

  • row_batch_size[in] the batch size for computing nearest neighbors. This parameter controls the number of samples for which the nearest neighbors are computed at once. Therefore, it affects the memory consumption mainly by reducing the size of the adjacency matrix for masked nearest neighbors computation

  • col_batch_size[in] the input data is sorted and ‘unsorted’ based on color. An additional scratch space buffer of shape (n_rows, col_batch_size) is created for this. Usually, this parameter affects the memory consumption more drastically than the row_batch_size with a marginal increase in compute time as the col_batch_size is reduced

  • metric[in] distance metric

template<typename value_idx = int, typename value_t = float, int TPB_X = 32>
void raft::sparse::neighbors::brute_force_knn(const value_idx *idxIndptr, const value_idx *idxIndices, const value_t *idxData, size_t idxNNZ, int n_idx_rows, int n_idx_cols, const value_idx *queryIndptr, const value_idx *queryIndices, const value_t *queryData, size_t queryNNZ, int n_query_rows, int n_query_cols, value_idx *output_indices, value_t *output_dists, int k, raft::resources const &handle, size_t batch_size_index = 2 << 14, size_t batch_size_query = 2 << 14, raft::distance::DistanceType metric = raft::distance::DistanceType::L2Expanded, float metricArg = 0)#

Search the sparse kNN for the k-nearest neighbors of a set of sparse query vectors using some distance implementation

Parameters:
  • idxIndptr[in] csr indptr of the index matrix (size n_idx_rows + 1)

  • idxIndices[in] csr column indices array of the index matrix (size n_idx_nnz)

  • idxData[in] csr data array of the index matrix (size idxNNZ)

  • idxNNZ[in] number of non-zeros for sparse index matrix

  • n_idx_rows[in] number of data samples in index matrix

  • n_idx_cols[in]

  • queryIndptr[in] csr indptr of the query matrix (size n_query_rows + 1)

  • queryIndices[in] csr indices array of the query matrix (size queryNNZ)

  • queryData[in] csr data array of the query matrix (size queryNNZ)

  • queryNNZ[in] number of non-zeros for sparse query matrix

  • n_query_rows[in] number of data samples in query matrix

  • n_query_cols[in] number of features in query matrix

  • output_indices[out] dense matrix for output indices (size n_query_rows * k)

  • output_dists[out] dense matrix for output distances (size n_query_rows * k)

  • k[in] the number of neighbors to query

  • handle[in] CUDA resource::get_cuda_stream(handle) to order operations with respect to

  • batch_size_index[in] maximum number of rows to use from index matrix per batch

  • batch_size_query[in] maximum number of rows to use from query matrix per batch

  • metric[in] distance metric/measure to use

  • metricArg[in] potential argument for metric (currently unused)

template<typename value_idx = int, typename value_t = float>
void raft::sparse::neighbors::knn_graph(raft::resources const &handle, const value_t *X, std::size_t m, std::size_t n, raft::distance::DistanceType metric, raft::sparse::COO<value_t, value_idx> &out, int c = 15)#

Constructs a (symmetrized) knn graph edge list from dense input vectors.

Note: The resulting KNN graph is not guaranteed to be connected.

Template Parameters:
  • value_idx

  • value_t

Parameters:
  • handle[in] raft handle

  • X[in] dense matrix of input data samples and observations

  • m[in] number of data samples (rows) in X

  • n[in] number of observations (columns) in X

  • metric[in] distance metric to use when constructing neighborhoods

  • out[out] output edge list

  • c