Cluster#

Params#

#include <cuvs/cluster/agglomerative.hpp>

namespace cuvs::cluster::agglomerative

enum Linkage#

Determines the method for computing the minimum spanning tree (MST)

Values:

enumerator PAIRWISE#

Use a pairwise distance matrix as input to the mst. This is very fast and the best option for fairly small datasets (~50k data points)

enumerator KNN_GRAPH#

Construct a KNN graph as input to the mst and provide additional edges if the mst does not converge. This is slower but scales to very large datasets.

Single-linkage#

include <cuvs/cluster/agglomerative.hpp>

namespace cuvs::cluster::agglomerative

void single_linkage(
raft::resources const &handle,
raft::device_matrix_view<const float, int, raft::row_major> X,
raft::device_matrix_view<int, int, raft::row_major> dendrogram,
raft::device_vector_view<int, int> labels,
cuvs::distance::DistanceType metric,
size_t n_clusters,
cuvs::cluster::agglomerative::Linkage linkage = cuvs::cluster::agglomerative::Linkage::KNN_GRAPH,
std::optional<int> c = std::make_optional<int>(DEFAULT_CONST_C),
)#

Single-linkage clustering, capable of constructing a KNN graph to scale the algorithm beyond the n^2 memory consumption of implementations that use the fully-connected graph of pairwise distances by connecting a knn graph when k is not large enough to connect it.

Parameters:
  • handle[in] raft handle

  • X[in] dense input matrix in row-major layout

  • dendrogram[out] output dendrogram (size [n_rows - 1] * 2)

  • labels[out] output labels vector (size n_rows)

  • metric[in] distance metrix to use when constructing connectivities graph

  • n_clusters[in] number of clusters to assign data samples

  • linkage[in] strategy for constructing the linkage. PAIRWISE uses more memory but can be faster for smaller datasets. KNN_GRAPH allows the memory usage to be controlled (using parameter c) at the expense of potentially additional minimum spanning tree iterations.

  • c[in] a constant used when constructing linkage from knn graph. Allows the indirect control of k. The algorithm will set k = log(n) + c