K-Means#
Parameters#
#include <cuvs/cluster/kmeans.hpp>
namespace cuvs::cluster::kmeans
-
enum class kmeans_type#
Type of k-means algorithm.
Values:
-
enumerator KMeans#
-
enumerator KMeansBalanced#
-
struct params : public cuvs::cluster::kmeans::base_params#
- #include <kmeans.hpp>
Simple object to specify hyper-parameters to the kmeans algorithm.
Public Members
-
int n_clusters = 8#
The number of clusters to form as well as the number of centroids to generate (default:8).
-
InitMethod init = KMeansPlusPlus#
Method for initialization, defaults to k-means++:
InitMethod::KMeansPlusPlus (k-means++): Use scalable k-means++ algorithm to select the initial cluster centers.
InitMethod::Random (random): Choose ‘n_clusters’ observations (rows) at random from the input data for the initial centroids.
InitMethod::Array (ndarray): Use ‘centroids’ as initial cluster centers.
-
int max_iter = 300#
Maximum number of iterations of the k-means algorithm for a single run.
-
double tol = 1e-4#
Relative tolerance with regard to inertia used to declare convergence.
-
rapids_logger::level_enum verbosity = rapids_logger::level_enum::info#
Verbosity level.
-
raft::random::RngState rng_state = {0}#
Seed to the random number generator.
-
int n_init = 1#
Number of times the k-means algorithm will be run with different seeds.
-
double oversampling_factor = 2.0#
Oversampling factor for use in the k-means|| algorithm
-
int batch_samples = 1 << 15#
batch_samples and batch_centroids tile the 1-NN (nearest-centroid) computation, which helps control the memory footprint. The default tile is [batch_samples x n_clusters]; that is, when batch_centroids is 0 the centroids are not tiled.
NB: These parameters are unrelated to streaming_batch_size, which controls how many samples to transfer from host to device per batch when processing out-of-core data.
-
int batch_centroids = 0#
If 0, batch_centroids is set to n_clusters.
-
bool inertia_check = false#
If true, check inertia during iterations for early convergence.
-
int64_t streaming_batch_size = 0#
Number of samples to process per GPU batch when fitting with host data. When set to 0 (the default), all n_samples are processed at once. Only used by the batched (host-data) code path; the device-data overloads ignore it.
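To make the tiling described for batch_samples and batch_centroids concrete, here is a minimal CPU sketch. It is not cuVS code: the function name `nearest_centroid_tiled` and the plain `std::vector` layout are assumptions for illustration. It only shows that walking the sample-by-centroid distance matrix one tile at a time gives the same labels as an untiled search while keeping a single tile of distances live.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference (not part of cuVS): assign each row of X to its
// nearest centroid while visiting the distance matrix one tile at a time.
// X and centroids are row-major with n_features columns.
std::vector<int> nearest_centroid_tiled(const std::vector<float>& X,
                                        const std::vector<float>& centroids,
                                        std::size_t n_features,
                                        std::size_t batch_samples,
                                        std::size_t batch_centroids)
{
  assert(n_features > 0 && batch_samples > 0);
  assert(X.size() % n_features == 0 && centroids.size() % n_features == 0);
  const std::size_t n_samples  = X.size() / n_features;
  const std::size_t n_clusters = centroids.size() / n_features;
  // Documented convention: batch_centroids == 0 means "do not tile the centroids".
  if (batch_centroids == 0) batch_centroids = n_clusters;

  std::vector<int> labels(n_samples, -1);
  std::vector<float> best(n_samples, std::numeric_limits<float>::max());

  for (std::size_t s0 = 0; s0 < n_samples; s0 += batch_samples) {
    const std::size_t s1 = std::min(s0 + batch_samples, n_samples);
    for (std::size_t c0 = 0; c0 < n_clusters; c0 += batch_centroids) {
      const std::size_t c1 = std::min(c0 + batch_centroids, n_clusters);
      // Only a (s1 - s0) x (c1 - c0) tile of distances is needed at this point.
      for (std::size_t i = s0; i < s1; ++i) {
        for (std::size_t c = c0; c < c1; ++c) {
          float d = 0.0f;
          for (std::size_t f = 0; f < n_features; ++f) {
            const float diff = X[i * n_features + f] - centroids[c * n_features + f];
            d += diff * diff;
          }
          // Keep the running minimum across centroid tiles.
          if (d < best[i]) {
            best[i]   = d;
            labels[i] = static_cast<int>(c);
          }
        }
      }
    }
  }
  return labels;
}
```

Calling it with batch_centroids = 0 reproduces the default [batch_samples x n_clusters] tile; smaller values trade extra passes for a smaller working set.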
-
struct balanced_params : public cuvs::cluster::kmeans::base_params#
- #include <kmeans.hpp>
Simple object to specify hyper-parameters to the balanced k-means algorithm.
The following metrics are currently supported in k-means balanced:
CosineExpanded
InnerProduct
L2Expanded
L2SqrtExpanded
Public Members
-
uint32_t n_iters = 20#
Number of training iterations
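The metric names listed above follow the usual "expanded" naming, which is commonly understood as evaluating the squared L2 distance through the identity ||x - c||^2 = ||x||^2 + ||c||^2 - 2<x, c>. The sketch below illustrates that reading on the CPU. The helper names are hypothetical, and it is not a description of how cuVS computes these metrics internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU helpers for illustration; none of these are cuVS functions.
double dot_product(const std::vector<double>& x, const std::vector<double>& c)
{
  assert(x.size() == c.size());
  double dot = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) dot += x[i] * c[i];
  return dot;
}

// Squared L2 distance via the expanded form ||x||^2 + ||c||^2 - 2<x, c>.
double l2_expanded(const std::vector<double>& x, const std::vector<double>& c)
{
  const double d = dot_product(x, x) + dot_product(c, c) - 2.0 * dot_product(x, c);
  return std::max(0.0, d);  // clamp tiny negative values caused by rounding
}

// Same as above, followed by a square root.
double l2_sqrt_expanded(const std::vector<double>& x, const std::vector<double>& c)
{
  return std::sqrt(l2_expanded(x, c));
}

// Cosine distance: 1 - <x, c> / (||x|| * ||c||). Both vectors must be non-zero.
double cosine_expanded(const std::vector<double>& x, const std::vector<double>& c)
{
  const double nx = std::sqrt(dot_product(x, x));
  const double nc = std::sqrt(dot_product(c, c));
  assert(nx > 0.0 && nc > 0.0);
  return 1.0 - dot_product(x, c) / (nx * nc);
}
```

The expanded form is attractive on GPUs because the cross term <x, c> for all sample and centroid pairs is a single matrix multiplication, while the norms are computed once per row.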
K-means#
#include <cuvs/cluster/kmeans.hpp>
namespace cuvs::cluster::kmeans
- void fit(
- raft::resources const &handle,
- const cuvs::cluster::kmeans::params &params,
- raft::host_matrix_view<const float, int64_t> X,
- std::optional<raft::host_vector_view<const float, int64_t>> sample_weight,
- raft::device_matrix_view<float, int64_t> centroids,
- raft::host_scalar_view<float> inertia,
- raft::host_scalar_view<int64_t> n_iter
Find clusters with k-means algorithm using batched processing of host data.
TODO: Evaluate replacing the extent type with int64_t. Reference issue: https://github.com/rapidsai/cuvs/issues/1961
This overload supports out-of-core computation where the dataset resides on the host. Data is processed in GPU-sized batches, streaming from host to device. The batch size is controlled by params.streaming_batch_size.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
params.n_clusters = 100;
params.streaming_batch_size = 100000;
float inertia;
int64_t n_iter;

// Data on host
std::vector<float> h_X(n_samples * n_features);
auto X = raft::make_host_matrix_view<const float, int64_t>(h_X.data(), n_samples, n_features);

// Centroids on device
auto centroids = raft::make_device_matrix<float, int64_t>(handle, params.n_clusters, n_features);

kmeans::fit(handle, params, X, std::nullopt, centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model. Batch size is read from params.streaming_batch_size.
X – [in] Training instances on HOST memory. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight – [in] Optional weights for each observation in X (on host). [len = n_samples]
centroids – [inout] [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The generated centroids from the kmeans algorithm are stored at the address pointed by ‘centroids’. [dim = n_clusters x n_features]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
n_iter – [out] Number of iterations run.
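As a rough illustration of how a host dataset can be cut into batches, the sketch below plans the row ranges that would be streamed to the device for a given streaming_batch_size. The helper `plan_streaming_batches` is hypothetical and not part of cuVS. The only behavior taken from this page is that 0 means "process all samples at once".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not a cuVS function): returns the (offset, length) row
// ranges a host matrix with n_samples rows would be split into.
std::vector<std::pair<std::int64_t, std::int64_t>> plan_streaming_batches(
  std::int64_t n_samples, std::int64_t streaming_batch_size)
{
  assert(n_samples >= 0 && streaming_batch_size >= 0);
  // Documented convention: 0 means "use n_samples", i.e. a single batch.
  const std::int64_t batch = (streaming_batch_size == 0) ? n_samples : streaming_batch_size;

  std::vector<std::pair<std::int64_t, std::int64_t>> ranges;
  for (std::int64_t offset = 0; offset < n_samples; offset += batch) {
    // The last batch may be shorter than the others.
    ranges.emplace_back(offset, std::min(batch, n_samples - offset));
  }
  return ranges;
}
```

For example, 250000 rows with a batch size of 100000 yield three ranges, the last covering the remaining 50000 rows.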
- void fit(
- raft::resources const &handle,
- const cuvs::cluster::kmeans::params &params,
- raft::host_matrix_view<const double, int64_t> X,
- std::optional<raft::host_vector_view<const double, int64_t>> sample_weight,
- raft::device_matrix_view<double, int64_t> centroids,
- raft::host_scalar_view<double> inertia,
- raft::host_scalar_view<int64_t> n_iter
Find clusters with k-means algorithm using batched processing of host data.
- void fit(
- raft::resources const &handle,
- const cuvs::cluster::kmeans::params &params,
- raft::device_matrix_view<const float, int> X,
- std::optional<raft::device_vector_view<const float, int>> sample_weight,
- raft::device_matrix_view<float, int> centroids,
- raft::host_scalar_view<float> inertia,
- raft::host_scalar_view<int> n_iter
Find clusters with k-means algorithm. Initial centroids are chosen with k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with k-means++ algorithm.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int n_features = 15;
float inertia;  // receives the sum of squared distances
int n_iter;     // receives the number of iterations run
auto centroids = raft::make_device_matrix<float, int>(handle, params.n_clusters, n_features);

kmeans::fit(handle, params, X, std::nullopt, centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight – [in] Optional weights for each observation in X. [len = n_samples]
centroids – [inout] [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The generated centroids from the kmeans algorithm are stored at the address pointed by ‘centroids’. [dim = n_clusters x n_features]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
n_iter – [out] Number of iterations run.
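The inertia output is described above as the sum of squared distances of samples to their closest cluster center. A small CPU reference such as the following can serve as a sanity check on tiny inputs. The function name `inertia_reference` is an assumption, and the sketch ignores sample weights.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference (not a cuVS function): unweighted inertia of
// row-major X with respect to row-major centroids.
double inertia_reference(const std::vector<double>& X,
                         const std::vector<double>& centroids,
                         std::size_t n_features)
{
  assert(n_features > 0 && !centroids.empty());
  assert(X.size() % n_features == 0 && centroids.size() % n_features == 0);
  const std::size_t n_samples  = X.size() / n_features;
  const std::size_t n_clusters = centroids.size() / n_features;

  double inertia = 0.0;
  for (std::size_t i = 0; i < n_samples; ++i) {
    double best = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < n_clusters; ++c) {
      double d = 0.0;
      for (std::size_t f = 0; f < n_features; ++f) {
        const double diff = X[i * n_features + f] - centroids[c * n_features + f];
        d += diff * diff;
      }
      if (d < best) best = d;  // squared distance to the closest centroid so far
    }
    inertia += best;
  }
  return inertia;
}
```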
- void fit(
- raft::resources const &handle,
- const cuvs::cluster::kmeans::params &params,
- raft::device_matrix_view<const float, int64_t> X,
- std::optional<raft::device_vector_view<const float, int64_t>> sample_weight,
- raft::device_matrix_view<float, int64_t> centroids,
- raft::host_scalar_view<float> inertia,
- raft::host_scalar_view<int64_t> n_iter
Find clusters with k-means algorithm. Initial centroids are chosen with k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with k-means++ algorithm.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int64_t n_features = 15;
float inertia;
int64_t n_iter;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, params.n_clusters, n_features);

kmeans::fit(handle, params, X, std::nullopt, centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight – [in] Optional weights for each observation in X. [len = n_samples]
centroids – [inout] [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The generated centroids from the kmeans algorithm are stored at the address pointed by ‘centroids’. [dim = n_clusters x n_features]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
n_iter – [out] Number of iterations run.
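The description above mentions k-means++ initialization. For readers unfamiliar with it, here is a minimal CPU sketch of the classic sequential k-means++ seeding (D-squared sampling). This is the textbook variant, not the scalable k-means|| variant that the params documentation names, and not the cuVS implementation. The function name and the use of `std::mt19937` are assumptions. It also assumes the input contains at least n_clusters distinct points.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical CPU sketch of classic k-means++ seeding (not cuVS code).
// Returns the row indices of X chosen as initial centers.
std::vector<std::size_t> kmeans_plus_plus_seed(const std::vector<double>& X,
                                               std::size_t n_features,
                                               std::size_t n_clusters,
                                               unsigned seed)
{
  assert(n_features > 0 && X.size() % n_features == 0);
  const std::size_t n_samples = X.size() / n_features;
  assert(n_clusters >= 1 && n_clusters <= n_samples);

  std::mt19937 rng(seed);
  std::vector<std::size_t> chosen;

  // First center: uniform at random over all samples.
  std::uniform_int_distribution<std::size_t> uniform(0, n_samples - 1);
  chosen.push_back(uniform(rng));

  // d2[i] = squared distance from sample i to its nearest chosen center.
  std::vector<double> d2(n_samples, std::numeric_limits<double>::max());
  while (chosen.size() < n_clusters) {
    const std::size_t last = chosen.back();
    for (std::size_t i = 0; i < n_samples; ++i) {
      double d = 0.0;
      for (std::size_t f = 0; f < n_features; ++f) {
        const double diff = X[i * n_features + f] - X[last * n_features + f];
        d += diff * diff;
      }
      d2[i] = std::min(d2[i], d);
    }
    // Next center: sampled with probability proportional to d2, so points far
    // from every existing center are favored and chosen points have weight 0.
    std::discrete_distribution<std::size_t> pick(d2.begin(), d2.end());
    chosen.push_back(pick(rng));
  }
  return chosen;
}
```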
- void fit(
- raft::resources const &handle,
- const cuvs::cluster::kmeans::params &params,
- raft::device_matrix_view<const double, int> X,
- std::optional<raft::device_vector_view<const double, int>> sample_weight,
- raft::device_matrix_view<double, int> centroids,
- raft::host_scalar_view<double> inertia,
- raft::host_scalar_view<int> n_iter
Find clusters with k-means algorithm. Initial centroids are chosen with k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with k-means++ algorithm.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int n_features = 15;
double inertia;
int n_iter;
auto centroids = raft::make_device_matrix<double, int>(handle, params.n_clusters, n_features);

kmeans::fit(handle, params, X, std::nullopt, centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight – [in] Optional weights for each observation in X. [len = n_samples]
centroids – [inout] [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The generated centroids from the kmeans algorithm are stored at the address pointed by ‘centroids’. [dim = n_clusters x n_features]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
n_iter – [out] Number of iterations run.
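To show how max_iter, tol, inertia and n_iter relate, the following is a minimal one-dimensional Lloyd iteration on the CPU. It is a sketch under stated assumptions rather than the cuVS algorithm: the name `lloyd_1d` is hypothetical, empty clusters simply keep their previous center, and the stopping rule (relative change in inertia at or below tol) is one plausible reading of "relative tolerance with regard to inertia".

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Result of the hypothetical 1-D Lloyd sketch below (not a cuVS type).
struct lloyd_result {
  std::vector<double> centroids;
  double inertia;
  int n_iter;
};

// Hypothetical CPU sketch: plain Lloyd iterations on 1-D data.
lloyd_result lloyd_1d(const std::vector<double>& x,
                      std::vector<double> centroids,
                      int max_iter,
                      double tol)
{
  assert(!x.empty() && !centroids.empty() && max_iter >= 1);
  const std::size_t k = centroids.size();
  double prev_inertia = std::numeric_limits<double>::infinity();
  double inertia      = 0.0;
  int n_iter          = 0;

  for (int it = 1; it <= max_iter; ++it) {
    std::vector<double> sum(k, 0.0);
    std::vector<std::size_t> count(k, 0);
    inertia = 0.0;
    // Assignment step: nearest centroid for every sample.
    for (double v : x) {
      std::size_t best = 0;
      double best_d    = (v - centroids[0]) * (v - centroids[0]);
      for (std::size_t c = 1; c < k; ++c) {
        const double d = (v - centroids[c]) * (v - centroids[c]);
        if (d < best_d) {
          best_d = d;
          best   = c;
        }
      }
      sum[best] += v;
      ++count[best];
      inertia += best_d;
    }
    // Update step: mean of assigned samples; empty clusters keep their center.
    for (std::size_t c = 0; c < k; ++c) {
      if (count[c] > 0) centroids[c] = sum[c] / static_cast<double>(count[c]);
    }
    n_iter = it;
    // Assumed stopping rule: relative change in inertia at or below tol.
    if (std::isfinite(prev_inertia) && std::abs(prev_inertia - inertia) <= tol * prev_inertia) {
      break;
    }
    prev_inertia = inertia;
  }
  return {centroids, inertia, n_iter};
}
```

The inertia reported here is measured against the centroids used in the last assignment step, which matches the final centroids once the iteration has converged.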
- void fit(
- raft::resources const &handle,
- const cuvs::cluster::kmeans::params &params,
- raft::device_matrix_view<const double, int64_t> X,
- std::optional<raft::device_vector_view<const double, int64_t>> sample_weight,
- raft::device_matrix_view<double, int64_t> centroids,
- raft::host_scalar_view<double> inertia,
- raft::host_scalar_view<int64_t> n_iter
Find clusters with k-means algorithm. Initial centroids are chosen with k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with k-means++ algorithm.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int64_t n_features = 15;
double inertia;
int64_t n_iter;
auto centroids = raft::make_device_matrix<double, int64_t>(handle, params.n_clusters, n_features);

kmeans::fit(handle, params, X, std::nullopt, centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight – [in] Optional weights for each observation in X. [len = n_samples]
centroids – [inout] [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The generated centroids from the kmeans algorithm are stored at the address pointed by ‘centroids’. [dim = n_clusters x n_features]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
n_iter – [out] Number of iterations run.
- void fit(
- raft::resources const &handle,
- const cuvs::cluster::kmeans::params &params,
- raft::device_matrix_view<const int8_t, int> X,
- std::optional<raft::device_vector_view<const int8_t, int>> sample_weight,
- raft::device_matrix_view<int8_t, int> centroids,
- raft::host_scalar_view<int8_t> inertia,
- raft::host_scalar_view<int> n_iter
Find clusters with k-means algorithm. Initial centroids are chosen with k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with k-means++ algorithm.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int n_features = 15;
int8_t inertia;  // element type follows the signature above
int n_iter;
auto centroids = raft::make_device_matrix<int8_t, int>(handle, params.n_clusters, n_features);

kmeans::fit(handle, params, X, std::nullopt, centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight – [in] Optional weights for each observation in X. [len = n_samples]
centroids – [inout] [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The generated centroids from the kmeans algorithm are stored at the address pointed by ‘centroids’. [dim = n_clusters x n_features]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
n_iter – [out] Number of iterations run.
- void fit(
- const raft::resources &handle,
- cuvs::cluster::kmeans::balanced_params const &params,
- raft::device_matrix_view<const float, int64_t> X,
- raft::device_matrix_view<float, int64_t> centroids,
- std::optional<raft::host_scalar_view<float>> inertia = std::nullopt
Find balanced clusters with k-means algorithm.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15;
int64_t n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle, params, X, centroids.view());
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
centroids – [out] The generated centroids from the kmeans algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
- void fit(
- const raft::resources &handle,
- cuvs::cluster::kmeans::balanced_params const &params,
- raft::device_matrix_view<const int8_t, int64_t> X,
- raft::device_matrix_view<float, int64_t> centroids,
- std::optional<raft::host_scalar_view<float>> inertia = std::nullopt
Find balanced clusters with k-means algorithm.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle, params, X, centroids.view());
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
centroids – [out] The generated centroids from the kmeans algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
- void fit(
- const raft::resources &handle,
- cuvs::cluster::kmeans::balanced_params const &params,
- raft::device_matrix_view<const half, int64_t> X,
- raft::device_matrix_view<float, int64_t> centroids,
- std::optional<raft::host_scalar_view<float>> inertia = std::nullopt
Find balanced clusters with k-means algorithm.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle, params, X, centroids.view());
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
centroids – [out] The generated centroids from the kmeans algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
- void fit(
- const raft::resources &handle,
- cuvs::cluster::kmeans::balanced_params const &params,
- raft::device_matrix_view<const uint8_t, int64_t> X,
- raft::device_matrix_view<float, int64_t> centroids,
- std::optional<raft::host_scalar_view<float>> inertia = std::nullopt
Find balanced clusters with k-means algorithm.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle, params, X, centroids.view());
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
centroids – [out] The generated centroids from the kmeans algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
- void predict(
- raft::resources const &handle,
- const kmeans::params &params,
- raft::device_matrix_view<const float, int> X,
- std::optional<raft::device_vector_view<const float, int>> sample_weight,
- raft::device_matrix_view<const float, int> centroids,
- raft::device_vector_view<int, int> labels,
- bool normalize_weight,
- raft::host_scalar_view<float> inertia
Predict the closest cluster each sample in X belongs to.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int n_features = 15;
float inertia;
int n_iter;
auto centroids = raft::make_device_matrix<float, int>(handle, params.n_clusters, n_features);

kmeans::fit(handle, params, X, std::nullopt, centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));
...
auto labels = raft::make_device_vector<int, int>(handle, X.extent(0));

kmeans::predict(handle, params, X, std::nullopt, centroids.view(), labels.view(),
                false,  // normalize_weight
                raft::make_host_scalar_view(&inertia));
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] New data to predict. [dim = n_samples x n_features]
sample_weight – [in] Optional weights for each observation in X. [len = n_samples]
centroids – [in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
normalize_weight – [in] True if the weights should be normalized
labels – [out] Index of the cluster each sample in X belongs to. [len = n_samples]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
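This page does not say what normalizing the weights means. One common convention, and purely an assumption here, is to rescale the sample weights so that they sum to n_samples, which keeps the weighted inertia on the same scale as the unweighted one. The hypothetical helper below sketches that convention on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not a cuVS function). ASSUMPTION: "normalize" means
// rescaling the weights so that they sum to the number of samples.
std::vector<double> normalize_weights(const std::vector<double>& weights)
{
  assert(!weights.empty());
  const double sum = std::accumulate(weights.begin(), weights.end(), 0.0);
  assert(sum > 0.0);
  const double scale = static_cast<double>(weights.size()) / sum;

  std::vector<double> out(weights.size());
  for (std::size_t i = 0; i < weights.size(); ++i) {
    out[i] = weights[i] * scale;  // relative weights are preserved
  }
  return out;
}
```

With weights {1, 1, 2} the scale factor is 3/4, so the relative importance of each sample is unchanged while the total becomes 3.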
- void predict(
- raft::resources const &handle,
- const kmeans::params &params,
- raft::device_matrix_view<const float, int64_t> X,
- std::optional<raft::device_vector_view<const float, int64_t>> sample_weight,
- raft::device_matrix_view<const float, int64_t> centroids,
- raft::device_vector_view<int64_t, int64_t> labels,
- bool normalize_weight,
- raft::host_scalar_view<float> inertia
Predict the closest cluster each sample in X belongs to.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int64_t n_features = 15;
float inertia;
int64_t n_iter;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, params.n_clusters, n_features);

kmeans::fit(handle, params, X, std::nullopt, centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));
...
auto labels = raft::make_device_vector<int64_t, int64_t>(handle, X.extent(0));

kmeans::predict(handle, params, X, std::nullopt, centroids.view(), labels.view(),
                false,  // normalize_weight
                raft::make_host_scalar_view(&inertia));
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] New data to predict. [dim = n_samples x n_features]
sample_weight – [in] Optional weights for each observation in X. [len = n_samples]
centroids – [in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
normalize_weight – [in] True if the weights should be normalized
labels – [out] Index of the cluster each sample in X belongs to. [len = n_samples]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
- void predict(
- raft::resources const &handle,
- const kmeans::params &params,
- raft::device_matrix_view<const double, int> X,
- std::optional<raft::device_vector_view<const double, int>> sample_weight,
- raft::device_matrix_view<const double, int> centroids,
- raft::device_vector_view<int, int> labels,
- bool normalize_weight,
- raft::host_scalar_view<double> inertia
Predict the closest cluster each sample in X belongs to.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int n_features = 15;
double inertia;
int n_iter;
auto centroids = raft::make_device_matrix<double, int>(handle, params.n_clusters, n_features);

kmeans::fit(handle, params, X, std::nullopt, centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));
...
auto labels = raft::make_device_vector<int, int>(handle, X.extent(0));

kmeans::predict(handle, params, X, std::nullopt, centroids.view(), labels.view(),
                false,  // normalize_weight
                raft::make_host_scalar_view(&inertia));
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] New data to predict. [dim = n_samples x n_features]
sample_weight – [in] Optional weights for each observation in X. [len = n_samples]
centroids – [in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
normalize_weight – [in] True if the weights should be normalized
labels – [out] Index of the cluster each sample in X belongs to. [len = n_samples]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
- void predict(
- raft::resources const &handle,
- const kmeans::params &params,
- raft::device_matrix_view<const double, int64_t> X,
- std::optional<raft::device_vector_view<const double, int64_t>> sample_weight,
- raft::device_matrix_view<const double, int64_t> centroids,
- raft::device_vector_view<int64_t, int64_t> labels,
- bool normalize_weight,
- raft::host_scalar_view<double> inertia
Predict the closest cluster each sample in X belongs to.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int64_t n_features = 15;
double inertia;
int64_t n_iter;
auto centroids = raft::make_device_matrix<double, int64_t>(handle, params.n_clusters, n_features);

kmeans::fit(handle, params, X, std::nullopt, centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));
...
auto labels = raft::make_device_vector<int64_t, int64_t>(handle, X.extent(0));

kmeans::predict(handle, params, X, std::nullopt, centroids.view(), labels.view(),
                false,  // normalize_weight
                raft::make_host_scalar_view(&inertia));
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] New data to predict. [dim = n_samples x n_features]
sample_weight – [in] Optional weights for each observation in X. [len = n_samples]
centroids – [in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
normalize_weight – [in] True if the weights should be normalized
labels – [out] Index of the cluster each sample in X belongs to. [len = n_samples]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
- void predict(
- const raft::resources &handle,
- cuvs::cluster::kmeans::balanced_params const &params,
- raft::device_matrix_view<const int8_t, int64_t> X,
- raft::device_matrix_view<const float, int64_t> centroids,
- raft::device_vector_view<uint32_t, int64_t> labels
Predict the closest cluster each sample in X belongs to.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle, params, X, centroids.view());
...
auto labels = raft::make_device_vector<uint32_t, int64_t>(handle, X.extent(0));

kmeans::predict(handle, params, X, centroids.view(), labels.view());
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] New data to predict. [dim = n_samples x n_features]
centroids – [in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
labels – [out] Index of the cluster each sample in X belongs to. [len = n_samples]
- void predict(
- const raft::resources &handle,
- cuvs::cluster::kmeans::balanced_params const &params,
- raft::device_matrix_view<const int8_t, int64_t> X,
- raft::device_matrix_view<const float, int64_t> centroids,
- raft::device_vector_view<int, int64_t> labels
Predict the closest cluster each sample in X belongs to.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle, params, X, centroids.view());
...
auto labels = raft::make_device_vector<int, int64_t>(handle, X.extent(0));

kmeans::predict(handle, params, X, centroids.view(), labels.view());
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] New data to predict. [dim = n_samples x n_features]
centroids – [in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
labels – [out] Index of the cluster each sample in X belongs to. [len = n_samples]
- void predict(
- const raft::resources &handle,
- cuvs::cluster::kmeans::balanced_params const &params,
- raft::device_matrix_view<const float, int64_t> X,
- raft::device_matrix_view<const float, int64_t> centroids,
- raft::device_vector_view<int, int64_t> labels
Predict the closest cluster each sample in X belongs to.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle, params, X, centroids.view());
...
auto labels = raft::make_device_vector<int, int64_t>(handle, X.extent(0));

kmeans::predict(handle, params, X, centroids.view(), labels.view());
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] New data to predict. [dim = n_samples x n_features]
centroids – [in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
labels – [out] Index of the cluster each sample in X belongs to. [len = n_samples]
- void predict(
- const raft::resources &handle,
- cuvs::cluster::kmeans::balanced_params const &params,
- raft::device_matrix_view<const float, int64_t> X,
- raft::device_matrix_view<const float, int64_t> centroids,
- raft::device_vector_view<uint32_t, int64_t> labels
Predict the closest cluster each sample in X belongs to.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle, params, X, centroids.view());
...
auto labels = raft::make_device_vector<uint32_t, int64_t>(handle, X.extent(0));

kmeans::predict(handle, params, X, centroids.view(), labels.view());
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] New data to predict. [dim = n_samples x n_features]
centroids – [in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
labels – [out] Index of the cluster each sample in X belongs to. [len = n_samples]
- void predict(
- const raft::resources &handle,
- cuvs::cluster::kmeans::balanced_params const &params,
- raft::device_matrix_view<const half, int64_t> X,
- raft::device_matrix_view<const float, int64_t> centroids,
- raft::device_vector_view<uint32_t, int64_t> labels
Predict the closest cluster each sample in X belongs to.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle, params, X, centroids.view());
...
auto labels = raft::make_device_vector<uint32_t, int64_t>(handle, X.extent(0));

kmeans::predict(handle, params, X, centroids.view(), labels.view());
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] New data to predict. [dim = n_samples x n_features]
centroids – [in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
labels – [out] Index of the cluster each sample in X belongs to. [len = n_samples]
- void predict(
- const raft::resources &handle,
- cuvs::cluster::kmeans::balanced_params const &params,
- raft::device_matrix_view<const uint8_t, int64_t> X,
- raft::device_matrix_view<const float, int64_t> centroids,
- raft::device_vector_view<uint32_t, int64_t> labels
Predict the closest cluster each sample in X belongs to.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle, params, X, centroids.view());
...
auto labels = raft::make_device_vector<uint32_t, int64_t>(handle, X.extent(0));

kmeans::predict(handle, params, X, centroids.view(), labels.view());
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] New data to predict. [dim = n_samples x n_features]
centroids – [in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
labels – [out] Index of the cluster each sample in X belongs to. [len = n_samples]
- void fit_predict(
- raft::resources const &handle,
- const kmeans::params &params,
- raft::device_matrix_view<const float, int> X,
- std::optional<raft::device_vector_view<const float, int>> sample_weight,
- std::optional<raft::device_matrix_view<float, int>> centroids,
- raft::device_vector_view<int, int> labels,
- raft::host_scalar_view<float> inertia,
- raft::host_scalar_view<int> n_iter
Compute k-means clustering and predict the cluster index for each sample in the input.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int n_features = 15;
float inertia;
int n_iter;
auto centroids = raft::make_device_matrix<float, int>(handle, params.n_clusters, n_features);
auto labels = raft::make_device_vector<int, int>(handle, X.extent(0));

kmeans::fit_predict(handle, params, X, std::nullopt, centroids.view(), labels.view(),
                    raft::make_host_scalar_view(&inertia),
                    raft::make_host_scalar_view(&n_iter));
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight – [in] Optional weights for each observation in X. [len = n_samples]
centroids – [inout] Optional. [in] When init is InitMethod::Array, ‘centroids’ is used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
labels – [out] Index of the cluster each sample in X belongs to. [len = n_samples]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
n_iter – [out] Number of iterations run.
- void fit_predict(
- raft::resources const &handle,
- const kmeans::params &params,
- raft::device_matrix_view<const float, int64_t> X,
- std::optional<raft::device_vector_view<const float, int64_t>> sample_weight,
- std::optional<raft::device_matrix_view<float, int64_t>> centroids,
- raft::device_vector_view<int64_t, int64_t> labels,
- raft::host_scalar_view<float> inertia,
- raft::host_scalar_view<int64_t> n_iter
Compute k-means clustering and predict the cluster index for each sample in the input.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int64_t n_features = 15;
float inertia;  // element type must match raft::host_scalar_view<float>
int64_t n_iter;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, params.n_clusters, n_features);
auto labels = raft::make_device_vector<int64_t, int64_t>(handle, X.extent(0));
kmeans::fit_predict(handle, params, X, std::nullopt, centroids.view(), labels.view(),
                    raft::make_host_scalar_view(&inertia), raft::make_host_scalar_view(&n_iter));
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight – [in] Optional weights for each observation in X. [len = n_samples]
centroids – [inout] Optional. [in] When init is InitMethod::Array, ‘centroids’ is used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
labels – [out] Index of the cluster each sample in X belongs to. [len = n_samples]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
n_iter – [out] Number of iterations run.
- void fit_predict(
- raft::resources const &handle,
- const kmeans::params &params,
- raft::device_matrix_view<const double, int> X,
- std::optional<raft::device_vector_view<const double, int>> sample_weight,
- std::optional<raft::device_matrix_view<double, int>> centroids,
- raft::device_vector_view<int, int> labels,
- raft::host_scalar_view<double> inertia,
- raft::host_scalar_view<int> n_iter
Compute k-means clustering and predict the cluster index for each sample in the input.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int n_features = 15;
double inertia;  // element type must match raft::host_scalar_view<double>
int n_iter;
auto centroids = raft::make_device_matrix<double, int>(handle, params.n_clusters, n_features);
auto labels = raft::make_device_vector<int, int>(handle, X.extent(0));
kmeans::fit_predict(handle, params, X, std::nullopt, centroids.view(), labels.view(),
                    raft::make_host_scalar_view(&inertia), raft::make_host_scalar_view(&n_iter));
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight – [in] Optional weights for each observation in X. [len = n_samples]
centroids – [inout] Optional. [in] When init is InitMethod::Array, ‘centroids’ is used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
labels – [out] Index of the cluster each sample in X belongs to. [len = n_samples]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
n_iter – [out] Number of iterations run.
- void fit_predict(
- raft::resources const &handle,
- const kmeans::params &params,
- raft::device_matrix_view<const double, int64_t> X,
- std::optional<raft::device_vector_view<const double, int64_t>> sample_weight,
- std::optional<raft::device_matrix_view<double, int64_t>> centroids,
- raft::device_vector_view<int64_t, int64_t> labels,
- raft::host_scalar_view<double> inertia,
- raft::host_scalar_view<int64_t> n_iter
Compute k-means clustering and predict the cluster index for each sample in the input.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int64_t n_features = 15;
double inertia;  // element type must match raft::host_scalar_view<double>
int64_t n_iter;
auto centroids = raft::make_device_matrix<double, int64_t>(handle, params.n_clusters, n_features);
auto labels = raft::make_device_vector<int64_t, int64_t>(handle, X.extent(0));
kmeans::fit_predict(handle, params, X, std::nullopt, centroids.view(), labels.view(),
                    raft::make_host_scalar_view(&inertia), raft::make_host_scalar_view(&n_iter));
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight – [in] Optional weights for each observation in X. [len = n_samples]
centroids – [inout] Optional. [in] When init is InitMethod::Array, ‘centroids’ is used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
labels – [out] Index of the cluster each sample in X belongs to. [len = n_samples]
inertia – [out] Sum of squared distances of samples to their closest cluster center.
n_iter – [out] Number of iterations run.
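As an illustration of how sample_weight can influence the resulting centroids, here is a minimal CPU sketch of a single Lloyd-style centroid update in which each centroid becomes the weighted mean of its assigned samples. This is a textbook formulation and an assumption about the general technique, not a description of cuVS internals; update_centroids is a hypothetical helper operating on host std::vector buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a cuVS API): one Lloyd-style update step.
// Each centroid becomes the weighted mean of the samples assigned to it.
// An empty sample_weight vector means every sample has weight 1.
// Clusters with zero total weight keep their previous centroid.
void update_centroids(const std::vector<float>& X,
                      const std::vector<int>& labels,
                      const std::vector<float>& sample_weight,
                      std::size_t n_features,
                      std::vector<float>& centroids)
{
  assert(n_features > 0 && X.size() % n_features == 0);
  assert(!centroids.empty() && centroids.size() % n_features == 0);
  const std::size_t n_samples  = X.size() / n_features;
  const std::size_t n_clusters = centroids.size() / n_features;
  assert(labels.size() == n_samples);
  assert(sample_weight.empty() || sample_weight.size() == n_samples);

  std::vector<double> sums(centroids.size(), 0.0);
  std::vector<double> weight_totals(n_clusters, 0.0);
  for (std::size_t i = 0; i < n_samples; ++i) {
    const std::size_t c = static_cast<std::size_t>(labels[i]);
    assert(c < n_clusters);
    const double w = sample_weight.empty() ? 1.0 : sample_weight[i];
    weight_totals[c] += w;
    for (std::size_t f = 0; f < n_features; ++f) {
      sums[c * n_features + f] += w * X[i * n_features + f];
    }
  }
  for (std::size_t c = 0; c < n_clusters; ++c) {
    if (weight_totals[c] > 0.0) {
      for (std::size_t f = 0; f < n_features; ++f) {
        centroids[c * n_features + f] =
          static_cast<float>(sums[c * n_features + f] / weight_totals[c]);
      }
    }
  }
}
```

With weights {1, 3} on the points (0, 0) and (2, 2), the updated centroid is (1.5, 1.5) rather than the unweighted midpoint (1, 1).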
- void fit_predict(
- const raft::resources &handle,
- cuvs::cluster::kmeans::balanced_params const &params,
- raft::device_matrix_view<const float, int64_t> X,
- raft::device_matrix_view<float, int64_t> centroids,
- raft::device_vector_view<uint32_t, int64_t> labels
Compute balanced k-means clustering and predict the cluster index for each sample in the input.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);
// labels must be uint32_t to match the signature above
auto labels = raft::make_device_vector<uint32_t, int64_t>(handle, X.extent(0));
kmeans::fit_predict(handle, params, X, centroids.view(), labels.view());
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
centroids – [inout] [in] When init is InitMethod::Array, ‘centroids’ is used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
labels – [out] Index of the cluster each sample in X belongs to. [len = n_samples]
- void fit_predict(
- const raft::resources &handle,
- cuvs::cluster::kmeans::balanced_params const &params,
- raft::device_matrix_view<const int8_t, int64_t> X,
- raft::device_matrix_view<float, int64_t> centroids,
- raft::device_vector_view<uint32_t, int64_t> labels
Compute balanced k-means clustering and predict the cluster index for each sample in the input.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);
// labels must be uint32_t to match the signature above
auto labels = raft::make_device_vector<uint32_t, int64_t>(handle, X.extent(0));
kmeans::fit_predict(handle, params, X, centroids.view(), labels.view());
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
centroids – [inout] [in] When init is InitMethod::Array, ‘centroids’ is used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
labels – [out] Index of the cluster each sample in X belongs to. [len = n_samples]
- void transform(
- raft::resources const &handle,
- const kmeans::params &params,
- raft::device_matrix_view<const float, int> X,
- raft::device_matrix_view<const float, int> centroids,
- raft::device_matrix_view<float, int> X_new
Transform X to a cluster-distance space.
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Data to transform. The data must be in row-major format. [dim = n_samples x n_features]
centroids – [in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
X_new – [out] X transformed into the cluster-distance space, holding one distance per sample and centroid. [dim = n_samples x n_clusters]
- void transform(
- raft::resources const &handle,
- const kmeans::params &params,
- raft::device_matrix_view<const double, int> X,
- raft::device_matrix_view<const double, int> centroids,
- raft::device_matrix_view<double, int> X_new
Transform X to a cluster-distance space.
- Parameters:
handle – [in] The raft handle.
params – [in] Parameters for KMeans model.
X – [in] Data to transform. The data must be in row-major format. [dim = n_samples x n_features]
centroids – [in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
X_new – [out] X transformed into the cluster-distance space, holding one distance per sample and centroid. [dim = n_samples x n_clusters]
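The cluster-distance space can be pictured as a matrix holding one distance per (sample, centroid) pair. The following CPU sketch illustrates that idea under stated assumptions: squared Euclidean distance is assumed (the distance actually used by the library may differ), the helper name transform_reference is hypothetical, and host std::vector buffers stand in for device views.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not a cuVS API): returns a row-major
// n_samples x n_clusters matrix whose entry (i, c) is the squared L2
// distance (assumed metric) between sample i and centroid c.
std::vector<float> transform_reference(const std::vector<float>& X,
                                       const std::vector<float>& centroids,
                                       std::size_t n_features)
{
  assert(n_features > 0 && X.size() % n_features == 0);
  assert(centroids.size() % n_features == 0);
  const std::size_t n_samples  = X.size() / n_features;
  const std::size_t n_clusters = centroids.size() / n_features;

  std::vector<float> X_new(n_samples * n_clusters, 0.0f);
  for (std::size_t i = 0; i < n_samples; ++i) {
    for (std::size_t c = 0; c < n_clusters; ++c) {
      float dist = 0.0f;
      for (std::size_t f = 0; f < n_features; ++f) {
        const float diff = X[i * n_features + f] - centroids[c * n_features + f];
        dist += diff * diff;
      }
      X_new[i * n_clusters + c] = dist;
    }
  }
  return X_new;
}
```

Taking the arg-min over each row of this matrix recovers the label that a nearest-centroid prediction would assign.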
- void cluster_cost(
- const raft::resources &handle,
- raft::device_matrix_view<const float, int> X,
- raft::device_matrix_view<const float, int> centroids,
- raft::host_scalar_view<float> cost,
- std::optional<raft::device_vector_view<const float, int>> sample_weight = std::nullopt
Compute (optionally weighted) cluster cost.
- Parameters:
handle – [in] The raft handle
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
centroids – [in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
cost – [out] Resulting cluster cost
sample_weight – [in] Optional per-sample weights. [len = n_samples]
- void cluster_cost(
- const raft::resources &handle,
- raft::device_matrix_view<const double, int> X,
- raft::device_matrix_view<const double, int> centroids,
- raft::host_scalar_view<double> cost,
- std::optional<raft::device_vector_view<const double, int>> sample_weight = std::nullopt
Compute (optionally weighted) cluster cost.
- Parameters:
handle – [in] The raft handle
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
centroids – [in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
cost – [out] Resulting cluster cost
sample_weight – [in] Optional per-sample weights. [len = n_samples]
- void cluster_cost(
- const raft::resources &handle,
- raft::device_matrix_view<const float, int64_t> X,
- raft::device_matrix_view<const float, int64_t> centroids,
- raft::host_scalar_view<float> cost,
- std::optional<raft::device_vector_view<const float, int64_t>> sample_weight = std::nullopt
Compute (optionally weighted) cluster cost.
- Parameters:
handle – [in] The raft handle
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
centroids – [in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
cost – [out] Resulting cluster cost
sample_weight – [in] Optional per-sample weights. [len = n_samples]
- void cluster_cost(
- const raft::resources &handle,
- raft::device_matrix_view<const double, int64_t> X,
- raft::device_matrix_view<const double, int64_t> centroids,
- raft::host_scalar_view<double> cost,
- std::optional<raft::device_vector_view<const double, int64_t>> sample_weight = std::nullopt
Compute (optionally weighted) cluster cost.
- Parameters:
handle – [in] The raft handle
X – [in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
centroids – [in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
cost – [out] Resulting cluster cost
sample_weight – [in] Optional per-sample weights. [len = n_samples]
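For intuition, the cluster cost can be sketched on the CPU as the sum, over all samples, of each sample's squared distance to its nearest centroid, scaled by the sample's weight when weights are given. This is a minimal sketch under assumptions: squared Euclidean distance and a simple multiplicative weighting are assumed, and cluster_cost_reference is a hypothetical name, not the cuVS implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference (not a cuVS API):
//   cost = sum_i w_i * min_c ||x_i - centroid_c||^2
// An empty sample_weight vector means every sample has weight 1.
double cluster_cost_reference(const std::vector<double>& X,
                              const std::vector<double>& centroids,
                              std::size_t n_features,
                              const std::vector<double>& sample_weight = {})
{
  assert(n_features > 0 && X.size() % n_features == 0);
  assert(!centroids.empty() && centroids.size() % n_features == 0);
  const std::size_t n_samples  = X.size() / n_features;
  const std::size_t n_clusters = centroids.size() / n_features;
  assert(sample_weight.empty() || sample_weight.size() == n_samples);

  double cost = 0.0;
  for (std::size_t i = 0; i < n_samples; ++i) {
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t c = 0; c < n_clusters; ++c) {
      double dist = 0.0;
      for (std::size_t f = 0; f < n_features; ++f) {
        const double diff = X[i * n_features + f] - centroids[c * n_features + f];
        dist += diff * diff;
      }
      best = std::min(best, dist);
    }
    const double w = sample_weight.empty() ? 1.0 : sample_weight[i];
    cost += w * best;
  }
  return cost;
}
```

Without weights, this quantity matches the definition of inertia given for fit_predict: the sum of squared distances of samples to their closest cluster center.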
K-means Helpers#
#include <cuvs/cluster/kmeans.hpp>
namespace cuvs::cluster::kmeans::helpers
- void find_k(
- raft::resources const &handle,
- raft::device_matrix_view<const float, int> X,
- raft::host_scalar_view<int> best_k,
- raft::host_scalar_view<float> inertia,
- raft::host_scalar_view<int> n_iter,
- int kmax,
- int kmin = 1,
- int maxiter = 100,
- float tol = 1e-3
Automatically find the optimal value of k using a binary search. This method maximizes the Calinski-Harabasz Index while minimizing the per-cluster inertia.
#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
#include <raft/random/make_blobs.cuh>
using namespace cuvs::cluster;
raft::resources handle;
int n_samples = 100, n_features = 15, n_clusters = 10;
auto X = raft::make_device_matrix<float, int>(handle, n_samples, n_features);
// make_blobs writes integer cluster labels
auto labels = raft::make_device_vector<int, int>(handle, n_samples);
raft::random::make_blobs(handle, X.view(), labels.view(), n_clusters);
auto best_k = raft::make_host_scalar<int>(0);
auto n_iter = raft::make_host_scalar<int>(0);
// inertia is a float scalar, matching raft::host_scalar_view<float>
auto inertia = raft::make_host_scalar<float>(0.0f);
kmeans::helpers::find_k(handle, raft::make_const_mdspan(X.view()), best_k.view(),
                        inertia.view(), n_iter.view(), n_clusters + 1);
- Parameters:
handle – [in] The raft handle.
X – [in] Input observations. [dim = n_samples x n_features]
best_k – [out] Best k found by the binary search.
inertia – [out] Inertia of the best k found.
n_iter – [out] Number of iterations used to find the best k.
kmax – [in] Maximum k to try in the search.
kmin – [in] Minimum k to try in the search (should be >= 1).
maxiter – [in] Maximum number of iterations to run.
tol – [in] Tolerance for early-stopping convergence.
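Since find_k is described as maximizing the Calinski-Harabasz Index, a small CPU sketch of that index may help in interpreting the result. It uses the standard definition, CH = [B / (k - 1)] / [W / (n - k)], where B is the between-cluster dispersion and W the within-cluster dispersion. The function calinski_harabasz is a hypothetical helper; how cuVS combines this index with inertia inside the binary search is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not a cuVS API) for the Calinski-Harabasz index.
// X is row-major, n_samples x n_features; labels[i] is in [0, n_clusters).
// Requires n_clusters >= 2 and n_samples > n_clusters. If every sample
// coincides with its cluster mean (W == 0), the result is not finite.
double calinski_harabasz(const std::vector<double>& X,
                         const std::vector<int>& labels,
                         std::size_t n_features,
                         std::size_t n_clusters)
{
  assert(n_features > 0 && X.size() % n_features == 0);
  const std::size_t n_samples = X.size() / n_features;
  assert(labels.size() == n_samples);
  assert(n_clusters >= 2 && n_samples > n_clusters);

  std::vector<double> global_mean(n_features, 0.0);
  std::vector<double> means(n_clusters * n_features, 0.0);
  std::vector<std::size_t> counts(n_clusters, 0);
  for (std::size_t i = 0; i < n_samples; ++i) {
    const std::size_t c = static_cast<std::size_t>(labels[i]);
    assert(c < n_clusters);
    ++counts[c];
    for (std::size_t f = 0; f < n_features; ++f) {
      means[c * n_features + f] += X[i * n_features + f];
      global_mean[f] += X[i * n_features + f];
    }
  }
  for (std::size_t f = 0; f < n_features; ++f) {
    global_mean[f] /= static_cast<double>(n_samples);
  }

  double between = 0.0;  // B: cluster means vs. global mean, weighted by size
  for (std::size_t c = 0; c < n_clusters; ++c) {
    if (counts[c] == 0) { continue; }  // empty clusters contribute nothing
    for (std::size_t f = 0; f < n_features; ++f) {
      means[c * n_features + f] /= static_cast<double>(counts[c]);
      const double d = means[c * n_features + f] - global_mean[f];
      between += static_cast<double>(counts[c]) * d * d;
    }
  }

  double within = 0.0;  // W: samples vs. their own cluster mean
  for (std::size_t i = 0; i < n_samples; ++i) {
    const std::size_t c = static_cast<std::size_t>(labels[i]);
    for (std::size_t f = 0; f < n_features; ++f) {
      const double d = X[i * n_features + f] - means[c * n_features + f];
      within += d * d;
    }
  }

  return (between / static_cast<double>(n_clusters - 1)) /
         (within / static_cast<double>(n_samples - n_clusters));
}
```

For the one-dimensional points {0, 2, 10, 12} split into clusters {0, 2} and {10, 12}, B = 100 and W = 4, so the index is (100 / 1) / (4 / 2) = 50. Higher values indicate tighter, better-separated clusters.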