Cluster#
Params#
#include <cuvs/cluster/agglomerative.hpp>
namespace cuvs::cluster::agglomerative
-
enum Linkage#
Determines the method for computing the minimum spanning tree (MST).
Values:
-
enumerator PAIRWISE#
Use a pairwise distance matrix as input to the MST. This is very fast and is the best option for fairly small datasets (~50k data points).
-
enumerator KNN_GRAPH#
Construct a KNN graph as input to the MST and provide additional edges if the MST does not converge. This is slower but scales to very large datasets.
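The trade-off between the two linkage modes is mostly about memory. The following is a rough back-of-the-envelope sketch, not part of the cuVS API: `pairwise_bytes` and `knn_graph_bytes` are hypothetical helpers, and the KNN graph layout (one `float` distance plus one 32-bit index per edge) is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Bytes for a dense n_rows x n_rows matrix of float distances,
// which is what a pairwise-distance input to the MST requires.
std::uint64_t pairwise_bytes(std::uint64_t n_rows)
{
  return n_rows * n_rows * sizeof(float);
}

// Bytes for a KNN graph with k neighbors per row.
// ASSUMPTION: each edge stores one float distance and one 32-bit index.
std::uint64_t knn_graph_bytes(std::uint64_t n_rows, std::uint64_t k)
{
  assert(k <= n_rows);
  return n_rows * k * (sizeof(float) + sizeof(std::int32_t));
}
```

Under these assumptions, 50,000 rows need 10,000,000,000 bytes (about 10 GB) for the dense matrix, while a KNN graph with k = 30 needs 12,000,000 bytes (about 12 MB).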
Single-linkage#
#include <cuvs/cluster/agglomerative.hpp>
namespace cuvs::cluster::agglomerative
-
void single_linkage(raft::resources const &handle, raft::device_matrix_view<const float, int, raft::row_major> X, raft::device_matrix_view<int, int, raft::row_major> dendrogram, raft::device_vector_view<int, int> labels, cuvs::distance::DistanceType metric, size_t n_clusters, cuvs::cluster::agglomerative::Linkage linkage = cuvs::cluster::agglomerative::Linkage::KNN_GRAPH, std::optional<int> c = std::make_optional<int>(DEFAULT_CONST_C))#
Single-linkage clustering that can build its minimum spanning tree from a KNN graph instead of the fully connected graph of pairwise distances. This lets the algorithm scale beyond the n^2 memory consumption of pairwise implementations. When k is not large enough to yield a connected KNN graph, additional edges are added to connect it.
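To make the relationship between single-linkage clustering and the minimum spanning tree concrete, here is a small CPU reference sketch. It is not the cuVS implementation and does not call cuVS: it builds the MST with Prim's algorithm over dense Euclidean distances (the idea behind the pairwise mode), then merges the shortest MST edges until `n_clusters` components remain. The names `Edge`, `DisjointSets`, `mst_pairwise`, and `single_linkage_labels` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

struct Edge { int u; int v; float w; };

// Minimal union-find with path halving.
struct DisjointSets {
  std::vector<int> parent;
  explicit DisjointSets(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
  int find(int x) {
    while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
    return x;
  }
  void unite(int a, int b) { parent[find(a)] = find(b); }
};

// Prim's algorithm on the implicit complete graph of Euclidean distances.
// Returns the n - 1 edges of the MST.
std::vector<Edge> mst_pairwise(const std::vector<std::vector<float>>& X)
{
  const int n = static_cast<int>(X.size());
  std::vector<float> best(n, std::numeric_limits<float>::max());
  std::vector<int> from(n, -1);
  std::vector<bool> in_tree(n, false);
  std::vector<Edge> edges;
  if (n == 0) { return edges; }
  int cur = 0;
  in_tree[0] = true;
  for (int step = 1; step < n; ++step) {
    int next = -1;
    for (int j = 0; j < n; ++j) {
      if (in_tree[j]) { continue; }
      float d2 = 0.f;
      for (std::size_t f = 0; f < X[j].size(); ++f) {
        const float diff = X[cur][f] - X[j][f];
        d2 += diff * diff;
      }
      const float d = std::sqrt(d2);
      // Keep the cheapest known connection from the tree to point j.
      if (d < best[j]) { best[j] = d; from[j] = cur; }
      if (next < 0 || best[j] < best[next]) { next = j; }
    }
    edges.push_back({from[next], next, best[next]});
    in_tree[next] = true;
    cur = next;
  }
  return edges;
}

// Flat single-linkage labels: merging the n - n_clusters shortest MST edges
// leaves exactly n_clusters connected components.
std::vector<int> single_linkage_labels(const std::vector<std::vector<float>>& X, int n_clusters)
{
  const int n = static_cast<int>(X.size());
  assert(n_clusters >= 1 && n_clusters <= n);
  std::vector<Edge> edges = mst_pairwise(X);
  std::sort(edges.begin(), edges.end(),
            [](const Edge& a, const Edge& b) { return a.w < b.w; });
  DisjointSets sets(n);
  for (int i = 0; i < n - n_clusters; ++i) { sets.unite(edges[i].u, edges[i].v); }
  // Relabel component roots to 0 .. n_clusters - 1 in order of first appearance.
  std::vector<int> root_to_label(n, -1), labels(n);
  int next_label = 0;
  for (int i = 0; i < n; ++i) {
    const int r = sets.find(i);
    if (root_to_label[r] < 0) { root_to_label[r] = next_label++; }
    labels[i] = root_to_label[r];
  }
  return labels;
}
```

For the five 2-D points (0,0), (0,1), (10,10), (10,11), (0,2) and `n_clusters = 2`, this sketch groups the first, second, and fifth points together and the third and fourth together. The dense distance computation is what costs n^2 work and memory in a real pairwise implementation; the KNN graph mode avoids it by starting from a sparse graph.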
- Parameters:
handle – [in] raft handle
X – [in] dense input matrix in row-major layout
dendrogram – [out] output dendrogram (size [n_rows - 1] * 2)
labels – [out] output labels vector (size n_rows)
metric – [in] distance metric to use when constructing the connectivities graph
n_clusters – [in] number of clusters to assign data samples
linkage – [in] strategy for constructing the linkage. PAIRWISE uses more memory but can be faster for smaller datasets. KNN_GRAPH allows the memory usage to be controlled (using parameter c) at the expense of potentially additional minimum spanning tree iterations.
c – [in] a constant used when constructing the linkage from the KNN graph. It allows indirect control of k: the algorithm sets k = log(n) + c.
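As a worked illustration of how `c` influences `k`, the sketch below evaluates k = log(n) + c for a given number of rows. This page does not state the base of the logarithm or any clamping, so the base-2 logarithm and the clamp to the range [2, n_rows] are assumptions, and `floor_log2` and `knn_k` are hypothetical helpers rather than cuVS functions.

```cpp
#include <algorithm>
#include <cassert>

// floor(log2(n)) for n >= 1, computed with integer shifts to avoid
// floating-point rounding.
int floor_log2(int n)
{
  assert(n >= 1);
  int result = 0;
  while (n > 1) { n >>= 1; ++result; }
  return result;
}

// k = log(n) + c.
// ASSUMPTIONS: base-2 logarithm, rounded down, and k clamped to [2, n_rows].
int knn_k(int n_rows, int c)
{
  return std::min(n_rows, std::max(2, floor_log2(n_rows) + c));
}
```

With an illustrative `c = 15` (chosen for the example, not taken from this page), 1,024 rows give k = 25 and 1,000,000 rows give k = 34 under these assumptions. Because k grows only logarithmically with the number of rows, raising `c` is the direct way to make the KNN graph denser and more likely to be connected, at the cost of more memory.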