K-Means#

Parameters#

#include <cuvs/cluster/kmeans.hpp>

namespace cuvs::cluster::kmeans

enum class kmeans_type#

Type of k-means algorithm.

Values:

enumerator KMeans#
enumerator KMeansBalanced#
struct params : public cuvs::cluster::kmeans::base_params#
#include <kmeans.hpp>

Simple object to specify hyper-parameters to the kmeans algorithm.

Public Members

int n_clusters = 8#

The number of clusters to form, which is also the number of centroids to generate (default: 8).

InitMethod init = KMeansPlusPlus#

Method for initialization, defaults to k-means++:

  • InitMethod::KMeansPlusPlus (k-means++): Use scalable k-means++ algorithm to select the initial cluster centers.

  • InitMethod::Random (random): Choose ‘n_clusters’ observations (rows) at random from the input data for the initial centroids.

  • InitMethod::Array (ndarray): Use ‘centroids’ as initial cluster centers.
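
As a rough illustration of the InitMethod::Random option, the sketch below picks n_clusters distinct rows of a row-major host matrix as initial centroids. It is a CPU-only sketch under stated assumptions: random_init and its seed argument are hypothetical names, and it is not how cuVS implements initialization on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not part of cuVS): choose n_clusters distinct rows of a
// row-major [n_samples x n_features] matrix and copy them out as centroids.
std::vector<float> random_init(const std::vector<float>& X,
                               std::size_t n_samples,
                               std::size_t n_features,
                               std::size_t n_clusters,
                               unsigned seed)
{
  assert(X.size() == n_samples * n_features);
  assert(n_clusters <= n_samples);

  // Shuffle the row indices and keep the first n_clusters of them.
  std::vector<std::size_t> rows(n_samples);
  std::iota(rows.begin(), rows.end(), std::size_t{0});
  std::mt19937 rng(seed);
  std::shuffle(rows.begin(), rows.end(), rng);

  std::vector<float> centroids(n_clusters * n_features);
  for (std::size_t c = 0; c < n_clusters; ++c) {
    for (std::size_t f = 0; f < n_features; ++f) {
      centroids[c * n_features + f] = X[rows[c] * n_features + f];
    }
  }
  return centroids;
}
```

A buffer produced this way could, in principle, be copied to the device and passed with InitMethod::Array, since that option takes the initial centers from ‘centroids’.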

int max_iter = 300#

Maximum number of iterations of the k-means algorithm for a single run.

double tol = 1e-4#

Relative tolerance with regard to inertia to declare convergence.
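
One plausible reading of a relative inertia tolerance is sketched below: iteration stops once the relative change in inertia between two consecutive iterations is at most tol. The function has_converged is a hypothetical name, and the exact criterion used inside cuVS may differ.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical check (an assumption, not the cuVS source): converged when the
// relative change in inertia between two iterations is at most tol.
bool has_converged(double prev_inertia, double cur_inertia, double tol)
{
  assert(tol >= 0.0);
  // Guard against division by zero when the previous inertia is exactly 0.
  if (prev_inertia == 0.0) { return cur_inertia == 0.0; }
  return std::abs(prev_inertia - cur_inertia) / std::abs(prev_inertia) <= tol;
}
```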

rapids_logger::level_enum verbosity = rapids_logger::level_enum::info#

Verbosity level.

raft::random::RngState rng_state = {0}#

Seed to the random number generator.

int n_init = 1#

Number of times the k-means algorithm is run with different seeds.

double oversampling_factor = 2.0#

Oversampling factor for use in the k-means|| algorithm.

int batch_samples = 1 << 15#

batch_samples and batch_centroids tile the 1-nearest-neighbor (1NN) computation, which helps control the memory footprint. The default tile is [batch_samples x n_clusters]: when batch_centroids is 0, the centroids are not tiled.

NB: These parameters are unrelated to streaming_batch_size, which controls how many samples to transfer from host to device per batch when processing out-of-core data.
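
The tiling idea can be sketched on the CPU as follows. Only one [batch_samples x batch_centroids] block of distances is considered at a time, and a running best is kept per sample, so the result matches the untiled search. The function tiled_nearest is a hypothetical helper for illustration, not the cuVS kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// CPU sketch of tiled nearest-centroid search (hypothetical, not cuVS code).
// X is row-major [n_samples x n_features]; C is row-major [n_clusters x n_features].
std::vector<int> tiled_nearest(const std::vector<float>& X,
                               const std::vector<float>& C,
                               std::size_t n_samples,
                               std::size_t n_clusters,
                               std::size_t n_features,
                               std::size_t batch_samples,
                               std::size_t batch_centroids)
{
  assert(batch_samples > 0);
  // 0 means "do not tile the centroids", as described for batch_centroids.
  if (batch_centroids == 0) { batch_centroids = n_clusters; }

  std::vector<int> labels(n_samples, -1);
  std::vector<float> best(n_samples, std::numeric_limits<float>::max());

  for (std::size_t s0 = 0; s0 < n_samples; s0 += batch_samples) {
    const std::size_t s1 = std::min(s0 + batch_samples, n_samples);
    for (std::size_t c0 = 0; c0 < n_clusters; c0 += batch_centroids) {
      const std::size_t c1 = std::min(c0 + batch_centroids, n_clusters);
      // Process the [s0, s1) x [c0, c1) tile and update the running minimum.
      for (std::size_t i = s0; i < s1; ++i) {
        for (std::size_t c = c0; c < c1; ++c) {
          float d = 0.0f;
          for (std::size_t f = 0; f < n_features; ++f) {
            const float diff = X[i * n_features + f] - C[c * n_features + f];
            d += diff * diff;
          }
          if (d < best[i]) {
            best[i]   = d;
            labels[i] = static_cast<int>(c);
          }
        }
      }
    }
  }
  return labels;
}
```

Smaller tiles lower the peak size of the intermediate distance block at the cost of more passes; the labels do not change.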

int batch_centroids = 0#

If 0, batch_centroids is set to n_clusters.

bool inertia_check = false#

If true, check inertia during iterations for early convergence.

int64_t streaming_batch_size = 0#

Number of samples to process per GPU batch when fitting with host data. When set to 0 (the default), the batch size is n_samples, so all data is processed at once. Only used by the batched (host-data) code path; device-data overloads ignore it.
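
To make the batching rule concrete, the sketch below splits n_samples rows into [begin, end) ranges of at most streaming_batch_size rows, treating 0 as a single batch covering all rows. The helper batch_ranges is a hypothetical name; how cuVS schedules the actual host-to-device transfers is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: row ranges [begin, end) for each streamed batch.
// A streaming_batch_size of 0 yields one batch with all rows.
std::vector<std::pair<std::int64_t, std::int64_t>> batch_ranges(
  std::int64_t n_samples, std::int64_t streaming_batch_size)
{
  assert(n_samples >= 0 && streaming_batch_size >= 0);
  const std::int64_t step = (streaming_batch_size == 0) ? n_samples : streaming_batch_size;

  std::vector<std::pair<std::int64_t, std::int64_t>> ranges;
  for (std::int64_t begin = 0; begin < n_samples; begin += step) {
    // The last batch may be shorter than step.
    ranges.emplace_back(begin, std::min(begin + step, n_samples));
  }
  return ranges;
}
```

For example, 250000 samples with a batch size of 100000 give three ranges, the last holding the remaining 50000 rows.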

struct balanced_params : public cuvs::cluster::kmeans::base_params#
#include <kmeans.hpp>

Simple object to specify hyper-parameters to the balanced k-means algorithm.

The following metrics are currently supported in k-means balanced:

  • CosineExpanded

  • InnerProduct

  • L2Expanded

  • L2SqrtExpanded
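
For orientation, the sketch below shows what these four metric names are commonly taken to compute between two vectors. The definitions are an assumption based on the usual meaning of the names (the "Expanded" L2 forms use ||a||^2 + ||b||^2 - 2 a·b); the Metric enum and distance function are illustrative and are not the cuVS types.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative enum mirroring the metric names listed above (not the cuVS type).
enum class Metric { CosineExpanded, InnerProduct, L2Expanded, L2SqrtExpanded };

float dot(const std::vector<float>& a, const std::vector<float>& b)
{
  assert(a.size() == b.size());
  float s = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) { s += a[i] * b[i]; }
  return s;
}

// Assumed definitions; note that InnerProduct is a similarity (larger is closer),
// while the other three are distances (smaller is closer).
float distance(Metric m, const std::vector<float>& a, const std::vector<float>& b)
{
  const float ab = dot(a, b);
  const float aa = dot(a, a);
  const float bb = dot(b, b);
  switch (m) {
    case Metric::InnerProduct: return ab;
    case Metric::CosineExpanded: return 1.0f - ab / (std::sqrt(aa) * std::sqrt(bb));
    case Metric::L2Expanded: return aa + bb - 2.0f * ab;
    case Metric::L2SqrtExpanded: return std::sqrt(std::max(0.0f, aa + bb - 2.0f * ab));
  }
  return 0.0f;
}
```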

Public Members

uint32_t n_iters = 20#

Number of training iterations.

K-means#

#include <cuvs/cluster/kmeans.hpp>

namespace cuvs::cluster::kmeans

void fit(
raft::resources const &handle,
const cuvs::cluster::kmeans::params &params,
raft::host_matrix_view<const float, int64_t> X,
std::optional<raft::host_vector_view<const float, int64_t>> sample_weight,
raft::device_matrix_view<float, int64_t> centroids,
raft::host_scalar_view<float> inertia,
raft::host_scalar_view<int64_t> n_iter
)#

Find clusters with k-means algorithm using batched processing of host data.

TODO: Evaluate replacing the extent type with int64_t. Reference issue: https://github.com/rapidsai/cuvs/issues/1961

This overload supports out-of-core computation where the dataset resides on the host. Data is processed in GPU-sized batches, streaming from host to device. The batch size is controlled by params.streaming_batch_size.

  #include <raft/core/resources.hpp>
  #include <cuvs/cluster/kmeans.hpp>
  using namespace cuvs::cluster;
  ...
  raft::resources handle;
  cuvs::cluster::kmeans::params params;
  params.n_clusters = 100;
  params.streaming_batch_size = 100000;
  float inertia;
  int64_t n_iter;

  // Data on host
  std::vector<float> h_X(n_samples * n_features);
  auto X = raft::make_host_matrix_view<const float, int64_t>(h_X.data(), n_samples, n_features);

  // Centroids on device
  auto centroids = raft::make_device_matrix<float, int64_t>(handle, params.n_clusters, n_features);

  kmeans::fit(handle,
              params,
              X,
              std::nullopt,
              centroids.view(),
              raft::make_host_scalar_view(&inertia),
              raft::make_host_scalar_view(&n_iter));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model. Batch size is read from params.streaming_batch_size.

  • X[in] Training instances on HOST memory. The data must be in row-major format. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X (on host). [len = n_samples]

  • centroids[inout] [in] When init is InitMethod::Array, centroids supplies the initial cluster centers. [out] The centroids produced by the k-means algorithm are written to ‘centroids’. [dim = n_clusters x n_features]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

  • n_iter[out] Number of iterations run.
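
The inertia output described above (sum of squared distances of samples to their closest cluster center) can be cross-checked on small inputs with a CPU reference like the one below. The function inertia_cpu is a hypothetical helper and ignores sample weights; it is meant for validating results, not as a replacement for the GPU path.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// CPU reference for the inertia output (hypothetical helper, not a cuVS function).
// X is row-major [n_samples x n_features]; C is row-major [n_clusters x n_features].
double inertia_cpu(const std::vector<float>& X,
                   const std::vector<float>& C,
                   std::size_t n_samples,
                   std::size_t n_clusters,
                   std::size_t n_features)
{
  assert(X.size() == n_samples * n_features);
  assert(C.size() == n_clusters * n_features);
  assert(n_clusters > 0);

  double total = 0.0;
  for (std::size_t i = 0; i < n_samples; ++i) {
    // Squared distance from sample i to its closest centroid.
    double best = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < n_clusters; ++c) {
      double d = 0.0;
      for (std::size_t f = 0; f < n_features; ++f) {
        const double diff = static_cast<double>(X[i * n_features + f]) -
                            static_cast<double>(C[c * n_features + f]);
        d += diff * diff;
      }
      if (d < best) { best = d; }
    }
    total += best;
  }
  return total;
}
```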

void fit(
raft::resources const &handle,
const cuvs::cluster::kmeans::params &params,
raft::host_matrix_view<const double, int64_t> X,
std::optional<raft::host_vector_view<const double, int64_t>> sample_weight,
raft::device_matrix_view<double, int64_t> centroids,
raft::host_scalar_view<double> inertia,
raft::host_scalar_view<int64_t> n_iter
)#

Find clusters with k-means algorithm using batched processing of host data.

void fit(
raft::resources const &handle,
const cuvs::cluster::kmeans::params &params,
raft::device_matrix_view<const float, int> X,
std::optional<raft::device_vector_view<const float, int>> sample_weight,
raft::device_matrix_view<float, int> centroids,
raft::host_scalar_view<float> inertia,
raft::host_scalar_view<int> n_iter
)#

Find clusters with k-means algorithm. Initial centroids are chosen with k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with k-means++ algorithm.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
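int n_features = 15, n_iter;
float inertia;  // must match raft::host_scalar_view<float>
// X is a raft::device_matrix_view<const float, int> holding the training data.
auto centroids = raft::make_device_matrix<float, int>(handle, params.n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            std::nullopt,
            centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));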
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[inout] [in] When init is InitMethod::Array, centroids supplies the initial cluster centers. [out] The centroids produced by the k-means algorithm are written to ‘centroids’. [dim = n_clusters x n_features]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

  • n_iter[out] Number of iterations run.
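
Because X must be row-major with shape [n_samples x n_features], element (row, col) lives at offset row * n_features + col. The sketch below shows that offset rule and a host-side repack from column-major storage. Both function names are hypothetical helpers for illustration and are not part of cuVS or RAFT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element (row, col) in a row-major [n_samples x n_features] buffer.
inline std::size_t row_major_offset(std::size_t row, std::size_t col, std::size_t n_features)
{
  return row * n_features + col;
}

// Hypothetical helper: repack a column-major host buffer into row-major order
// before handing it to an API that expects row-major input.
std::vector<float> to_row_major(const std::vector<float>& col_major,
                                std::size_t n_samples,
                                std::size_t n_features)
{
  assert(col_major.size() == n_samples * n_features);
  std::vector<float> out(col_major.size());
  for (std::size_t r = 0; r < n_samples; ++r) {
    for (std::size_t c = 0; c < n_features; ++c) {
      // In column-major storage, element (r, c) sits at c * n_samples + r.
      out[row_major_offset(r, c, n_features)] = col_major[c * n_samples + r];
    }
  }
  return out;
}
```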

void fit(
raft::resources const &handle,
const cuvs::cluster::kmeans::params &params,
raft::device_matrix_view<const float, int64_t> X,
std::optional<raft::device_vector_view<const float, int64_t>> sample_weight,
raft::device_matrix_view<float, int64_t> centroids,
raft::host_scalar_view<float> inertia,
raft::host_scalar_view<int64_t> n_iter
)#

Find clusters with k-means algorithm. Initial centroids are chosen with k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with k-means++ algorithm.

  #include <raft/core/resources.hpp>
  #include <cuvs/cluster/kmeans.hpp>
  using namespace  cuvs::cluster;
  ...
  raft::resources handle;
  cuvs::cluster::kmeans::params params;
  int64_t n_features = 15, n_iter;
  float inertia;  // must match raft::host_scalar_view<float>
  auto centroids = raft::make_device_matrix<float, int64_t>(handle, params.n_clusters, n_features);

  kmeans::fit(handle,
              params,
              X,
              std::nullopt,
              centroids.view(),
              raft::make_host_scalar_view(&inertia),
              raft::make_host_scalar_view(&n_iter));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[inout] [in] When init is InitMethod::Array, centroids supplies the initial cluster centers. [out] The centroids produced by the k-means algorithm are written to ‘centroids’. [dim = n_clusters x n_features]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

  • n_iter[out] Number of iterations run.

void fit(
raft::resources const &handle,
const cuvs::cluster::kmeans::params &params,
raft::device_matrix_view<const double, int> X,
std::optional<raft::device_vector_view<const double, int>> sample_weight,
raft::device_matrix_view<double, int> centroids,
raft::host_scalar_view<double> inertia,
raft::host_scalar_view<int> n_iter
)#

Find clusters with k-means algorithm. Initial centroids are chosen with k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with k-means++ algorithm.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int n_features = 15, n_iter;
double inertia;  // must match raft::host_scalar_view<double>
auto centroids = raft::make_device_matrix<double, int>(handle, params.n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            std::nullopt,
            centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[inout] [in] When init is InitMethod::Array, centroids supplies the initial cluster centers. [out] The centroids produced by the k-means algorithm are written to ‘centroids’. [dim = n_clusters x n_features]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

  • n_iter[out] Number of iterations run.

void fit(
raft::resources const &handle,
const cuvs::cluster::kmeans::params &params,
raft::device_matrix_view<const double, int64_t> X,
std::optional<raft::device_vector_view<const double, int64_t>> sample_weight,
raft::device_matrix_view<double, int64_t> centroids,
raft::host_scalar_view<double> inertia,
raft::host_scalar_view<int64_t> n_iter
)#

Find clusters with k-means algorithm. Initial centroids are chosen with k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with k-means++ algorithm.

  #include <raft/core/resources.hpp>
  #include <cuvs/cluster/kmeans.hpp>
  using namespace  cuvs::cluster;
  ...
  raft::resources handle;
  cuvs::cluster::kmeans::params params;
  int64_t n_features = 15, n_iter;
  double inertia;  // must match raft::host_scalar_view<double>
  auto centroids = raft::make_device_matrix<double, int64_t>(handle, params.n_clusters, n_features);

  kmeans::fit(handle,
              params,
              X,
              std::nullopt,
              centroids.view(),
              raft::make_host_scalar_view(&inertia),
              raft::make_host_scalar_view(&n_iter));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[inout] [in] When init is InitMethod::Array, centroids supplies the initial cluster centers. [out] The centroids produced by the k-means algorithm are written to ‘centroids’. [dim = n_clusters x n_features]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

  • n_iter[out] Number of iterations run.

void fit(
raft::resources const &handle,
const cuvs::cluster::kmeans::params &params,
raft::device_matrix_view<const int8_t, int> X,
std::optional<raft::device_vector_view<const int8_t, int>> sample_weight,
raft::device_matrix_view<int8_t, int> centroids,
raft::host_scalar_view<int8_t> inertia,
raft::host_scalar_view<int> n_iter
)#

Find clusters with k-means algorithm. Initial centroids are chosen with k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with k-means++ algorithm.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int n_features = 15, n_iter;
int8_t inertia;  // element types follow the int8_t signature above
auto centroids = raft::make_device_matrix<int8_t, int>(handle, params.n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            std::nullopt,
            centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[inout] [in] When init is InitMethod::Array, centroids supplies the initial cluster centers. [out] The centroids produced by the k-means algorithm are written to ‘centroids’. [dim = n_clusters x n_features]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

  • n_iter[out] Number of iterations run.

void fit(
const raft::resources &handle,
cuvs::cluster::kmeans::balanced_params const &params,
raft::device_matrix_view<const float, int64_t> X,
raft::device_matrix_view<float, int64_t> centroids,
std::optional<raft::host_scalar_view<float>> inertia = std::nullopt
)#

Find balanced clusters with k-means algorithm.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15;
int64_t n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            centroids.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • centroids[out] The centroids produced by the k-means algorithm are written to ‘centroids’. [dim = n_clusters x n_features]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

void fit(
const raft::resources &handle,
cuvs::cluster::kmeans::balanced_params const &params,
raft::device_matrix_view<const int8_t, int64_t> X,
raft::device_matrix_view<float, int64_t> centroids,
std::optional<raft::host_scalar_view<float>> inertia = std::nullopt
)#

Find balanced clusters with k-means algorithm.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            centroids.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • centroids[out] The centroids produced by the k-means algorithm are written to ‘centroids’. [dim = n_clusters x n_features]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

void fit(
const raft::resources &handle,
cuvs::cluster::kmeans::balanced_params const &params,
raft::device_matrix_view<const half, int64_t> X,
raft::device_matrix_view<float, int64_t> centroids,
std::optional<raft::host_scalar_view<float>> inertia = std::nullopt
)#

Find balanced clusters with k-means algorithm.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            centroids.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • centroids[out] The centroids produced by the k-means algorithm are written to ‘centroids’. [dim = n_clusters x n_features]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

void fit(
const raft::resources &handle,
cuvs::cluster::kmeans::balanced_params const &params,
raft::device_matrix_view<const uint8_t, int64_t> X,
raft::device_matrix_view<float, int64_t> centroids,
std::optional<raft::host_scalar_view<float>> inertia = std::nullopt
)#

Find balanced clusters with k-means algorithm.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            centroids.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • centroids[out] The centroids produced by the k-means algorithm are written to ‘centroids’. [dim = n_clusters x n_features]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

void predict(
raft::resources const &handle,
const kmeans::params &params,
raft::device_matrix_view<const float, int> X,
std::optional<raft::device_vector_view<const float, int>> sample_weight,
raft::device_matrix_view<const float, int> centroids,
raft::device_vector_view<int, int> labels,
bool normalize_weight,
raft::host_scalar_view<float> inertia
)#

Predict the closest cluster each sample in X belongs to.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int n_features = 15, n_iter;
float inertia;  // must match raft::host_scalar_view<float>
auto centroids = raft::make_device_matrix<float, int>(handle, params.n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            std::nullopt,
            centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));
...
auto labels = raft::make_device_vector<int, int>(handle, X.extent(0));

// Argument order follows the signature: labels, then normalize_weight.
kmeans::predict(handle,
                params,
                X,
                std::nullopt,
                centroids.view(),
                labels.view(),
                false,
                raft::make_host_scalar_view(&inertia));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] New data to predict. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

  • normalize_weight[in] True if the weights should be normalized.

  • inertia[out] Sum of squared distances of samples to their closest cluster center.
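
The sketch below mirrors the predict outputs on the CPU: a label per sample and a weighted inertia. It assumes that normalize_weight rescales the weights so they sum to n_samples; that interpretation, the Prediction struct, and predict_cpu are assumptions made for illustration and are not taken from the cuVS source.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

struct Prediction {
  std::vector<int> labels;  // closest centroid index per sample
  double inertia;           // weighted sum of squared distances
};

// Hypothetical CPU reference. X is row-major [n_samples x n_features],
// C is row-major [n_clusters x n_features]; empty weights mean "all ones".
Prediction predict_cpu(const std::vector<float>& X,
                       const std::vector<float>& C,
                       std::vector<float> weights,
                       std::size_t n_samples,
                       std::size_t n_clusters,
                       std::size_t n_features,
                       bool normalize_weight)
{
  assert(n_clusters > 0);
  assert(weights.empty() || weights.size() == n_samples);
  if (weights.empty()) { weights.assign(n_samples, 1.0f); }

  if (normalize_weight) {
    // Assumption: rescale so that the weights sum to n_samples.
    const double sum = std::accumulate(weights.begin(), weights.end(), 0.0);
    assert(sum > 0.0);
    for (float& w : weights) {
      w = static_cast<float>(static_cast<double>(w) * static_cast<double>(n_samples) / sum);
    }
  }

  Prediction out{std::vector<int>(n_samples, -1), 0.0};
  for (std::size_t i = 0; i < n_samples; ++i) {
    double best = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < n_clusters; ++c) {
      double d = 0.0;
      for (std::size_t f = 0; f < n_features; ++f) {
        const double diff = static_cast<double>(X[i * n_features + f]) -
                            static_cast<double>(C[c * n_features + f]);
        d += diff * diff;
      }
      if (d < best) {
        best          = d;
        out.labels[i] = static_cast<int>(c);
      }
    }
    out.inertia += static_cast<double>(weights[i]) * best;
  }
  return out;
}
```

Under this assumption, normalization changes the scale of the reported inertia but never the labels.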

void predict(
raft::resources const &handle,
const kmeans::params &params,
raft::device_matrix_view<const float, int64_t> X,
std::optional<raft::device_vector_view<const float, int64_t>> sample_weight,
raft::device_matrix_view<const float, int64_t> centroids,
raft::device_vector_view<int64_t, int64_t> labels,
bool normalize_weight,
raft::host_scalar_view<float> inertia
)#

Predict the closest cluster each sample in X belongs to.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int64_t n_features = 15, n_iter;
float inertia;  // must match raft::host_scalar_view<float>
auto centroids = raft::make_device_matrix<float, int64_t>(handle, params.n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            std::nullopt,
            centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));
...
auto labels = raft::make_device_vector<int64_t, int64_t>(handle, X.extent(0));

// Argument order follows the signature: labels, then normalize_weight.
kmeans::predict(handle,
                params,
                X,
                std::nullopt,
                centroids.view(),
                labels.view(),
                false,
                raft::make_host_scalar_view(&inertia));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] New data to predict. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

  • normalize_weight[in] True if the weights should be normalized.

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

void predict(
raft::resources const &handle,
const kmeans::params &params,
raft::device_matrix_view<const double, int> X,
std::optional<raft::device_vector_view<const double, int>> sample_weight,
raft::device_matrix_view<const double, int> centroids,
raft::device_vector_view<int, int> labels,
bool normalize_weight,
raft::host_scalar_view<double> inertia
)#

Predict the closest cluster each sample in X belongs to.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int n_features = 15, n_iter;
double inertia;  // must match raft::host_scalar_view<double>
auto centroids = raft::make_device_matrix<double, int>(handle, params.n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            std::nullopt,
            centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));
...
auto labels = raft::make_device_vector<int, int>(handle, X.extent(0));

// Argument order follows the signature: labels, then normalize_weight.
kmeans::predict(handle,
                params,
                X,
                std::nullopt,
                centroids.view(),
                labels.view(),
                false,
                raft::make_host_scalar_view(&inertia));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] New data to predict. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

  • normalize_weight[in] True if the weights should be normalized.

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

void predict(
raft::resources const &handle,
const kmeans::params &params,
raft::device_matrix_view<const double, int64_t> X,
std::optional<raft::device_vector_view<const double, int64_t>> sample_weight,
raft::device_matrix_view<const double, int64_t> centroids,
raft::device_vector_view<int64_t, int64_t> labels,
bool normalize_weight,
raft::host_scalar_view<double> inertia
)#

Predict the closest cluster each sample in X belongs to.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int64_t n_features = 15, n_iter;
double inertia;  // must match raft::host_scalar_view<double>
auto centroids = raft::make_device_matrix<double, int64_t>(handle, params.n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            std::nullopt,
            centroids.view(),
            raft::make_host_scalar_view(&inertia),
            raft::make_host_scalar_view(&n_iter));
...
auto labels = raft::make_device_vector<int64_t, int64_t>(handle, X.extent(0));

// Argument order follows the signature: labels, then normalize_weight.
kmeans::predict(handle,
                params,
                X,
                std::nullopt,
                centroids.view(),
                labels.view(),
                false,
                raft::make_host_scalar_view(&inertia));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] New data to predict. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

  • normalize_weight[in] True if the weights should be normalized.

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

void predict(
const raft::resources &handle,
cuvs::cluster::kmeans::balanced_params const &params,
raft::device_matrix_view<const int8_t, int64_t> X,
raft::device_matrix_view<const float, int64_t> centroids,
raft::device_vector_view<uint32_t, int64_t> labels
)#

Predict the closest cluster each sample in X belongs to.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            centroids.view());
...
auto labels = raft::make_device_vector<uint32_t, int64_t>(handle, X.extent(0));

kmeans::predict(handle,
                params,
                X,
                centroids.view(),
                labels.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] New data to predict. [dim = n_samples x n_features]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

void predict(
const raft::resources &handle,
cuvs::cluster::kmeans::balanced_params const &params,
raft::device_matrix_view<const int8_t, int64_t> X,
raft::device_matrix_view<const float, int64_t> centroids,
raft::device_vector_view<int, int64_t> labels
)#

Predict the closest cluster each sample in X belongs to.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            centroids.view());
...
auto labels = raft::make_device_vector<int, int64_t>(handle, X.extent(0));

kmeans::predict(handle,
                params,
                X,
                centroids.view(),
                labels.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] New data to predict. [dim = n_samples x n_features]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

void predict(
const raft::resources &handle,
cuvs::cluster::kmeans::balanced_params const &params,
raft::device_matrix_view<const float, int64_t> X,
raft::device_matrix_view<const float, int64_t> centroids,
raft::device_vector_view<int, int64_t> labels
)#

Predict the closest cluster each sample in X belongs to.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            centroids.view());
...
auto labels = raft::make_device_vector<int, int64_t>(handle, X.extent(0));

kmeans::predict(handle,
                params,
                X,
                centroids.view(),
                labels.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] New data to predict. [dim = n_samples x n_features]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

void predict(
const raft::resources &handle,
cuvs::cluster::kmeans::balanced_params const &params,
raft::device_matrix_view<const float, int64_t> X,
raft::device_matrix_view<const float, int64_t> centroids,
raft::device_vector_view<uint32_t, int64_t> labels
)#

Predict the closest cluster each sample in X belongs to.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            centroids.view());
...
auto labels = raft::make_device_vector<uint32_t, int64_t>(handle, X.extent(0));

kmeans::predict(handle,
                params,
                X,
                centroids.view(),
                labels.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] New data to predict. [dim = n_samples x n_features]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

void predict(
const raft::resources &handle,
cuvs::cluster::kmeans::balanced_params const &params,
raft::device_matrix_view<const half, int64_t> X,
raft::device_matrix_view<const float, int64_t> centroids,
raft::device_vector_view<uint32_t, int64_t> labels
)#

Predict the closest cluster each sample in X belongs to.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            centroids.view());
...
auto labels = raft::make_device_vector<uint32_t, int64_t>(handle, X.extent(0));

kmeans::predict(handle,
                params,
                X,
                centroids.view(),
                labels.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] New data to predict. [dim = n_samples x n_features]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

void predict(
const raft::resources &handle,
cuvs::cluster::kmeans::balanced_params const &params,
raft::device_matrix_view<const uint8_t, int64_t> X,
raft::device_matrix_view<const float, int64_t> centroids,
raft::device_vector_view<uint32_t, int64_t> labels
)#

Predict the closest cluster each sample in X belongs to.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            centroids.view());
...
auto labels = raft::make_device_vector<uint32_t, int64_t>(handle, X.extent(0));

kmeans::predict(handle,
                params,
                X,
                centroids.view(),
                labels.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] New data to predict. [dim = n_samples x n_features]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]
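
To make the predict outputs concrete, here is a minimal CPU sketch of nearest-centroid assignment. It is an illustration only, not the cuVS kernel: the name predict_reference is hypothetical, squared Euclidean distance is assumed, and X is taken as float for simplicity even though the overloads above also accept half and uint8_t input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// CPU reference for nearest-centroid assignment (hypothetical helper, not part of cuVS).
// X is row-major [n_samples x n_features]; centroids is row-major [n_clusters x n_features].
std::vector<std::uint32_t> predict_reference(const std::vector<float>& X,
                                             const std::vector<float>& centroids,
                                             std::size_t n_features)
{
  assert(n_features > 0 && X.size() % n_features == 0);
  assert(!centroids.empty() && centroids.size() % n_features == 0);
  const std::size_t n_samples  = X.size() / n_features;
  const std::size_t n_clusters = centroids.size() / n_features;

  std::vector<std::uint32_t> labels(n_samples);
  for (std::size_t i = 0; i < n_samples; ++i) {
    float best_dist    = std::numeric_limits<float>::max();
    std::uint32_t best = 0;
    for (std::size_t c = 0; c < n_clusters; ++c) {
      // Squared Euclidean distance between sample i and centroid c (assumed metric).
      float dist = 0.0f;
      for (std::size_t j = 0; j < n_features; ++j) {
        const float diff = X[i * n_features + j] - centroids[c * n_features + j];
        dist += diff * diff;
      }
      if (dist < best_dist) {
        best_dist = dist;
        best      = static_cast<std::uint32_t>(c);
      }
    }
    labels[i] = best;  // one label per sample, len = n_samples
  }
  return labels;
}
```

Ties go to the lowest centroid index in this sketch; the library may break ties differently.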

void fit_predict(
raft::resources const &handle,
const kmeans::params &params,
raft::device_matrix_view<const float, int> X,
std::optional<raft::device_vector_view<const float, int>> sample_weight,
std::optional<raft::device_matrix_view<float, int>> centroids,
raft::device_vector_view<int, int> labels,
raft::host_scalar_view<float> inertia,
raft::host_scalar_view<int> n_iter
)#

Compute k-means clustering and predict the cluster index for each sample in the input.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int n_features = 15, n_iter;
float inertia;  // must match raft::host_scalar_view<float>
auto centroids = raft::make_device_matrix<float, int>(handle, params.n_clusters, n_features);
auto labels = raft::make_device_vector<int, int>(handle, X.extent(0));

kmeans::fit_predict(handle,
                    params,
                    X,
                    std::nullopt,
                    centroids.view(),
                    labels.view(),
                    raft::make_host_scalar_view(&inertia),
                    raft::make_host_scalar_view(&n_iter));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[inout] Optional. [in] When init is InitMethod::Array, centroids supplies the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

  • n_iter[out] Number of iterations run.
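
The relationship between the centroids, labels, inertia and n_iter outputs can be sketched with plain Lloyd iterations on the CPU. This is a sketch under stated assumptions, not the cuVS algorithm: the names FitPredictResult and lloyd_fit_predict are hypothetical, the initial centers are supplied by the caller (similar in spirit to InitMethod::Array), and the loop stops when labels no longer change rather than using the tol parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical result type for this sketch; it mirrors the labels, inertia and
// n_iter outputs listed above.
struct FitPredictResult {
  std::vector<int> labels;  // [n_samples]
  float inertia = 0.0f;     // sum of squared distances to the assigned centers
  int n_iter    = 0;        // number of assignment passes that ran
};

// Plain Lloyd iterations on the CPU (illustration only).
// X is row-major [n_samples x n_features]. centroids is row-major
// [n_clusters x n_features]: initial centers on entry, fitted centers on exit.
FitPredictResult lloyd_fit_predict(const std::vector<float>& X,
                                   std::vector<float>& centroids,
                                   std::size_t n_features,
                                   int max_iter)
{
  assert(max_iter > 0 && n_features > 0 && X.size() % n_features == 0);
  assert(!centroids.empty() && centroids.size() % n_features == 0);
  const std::size_t n_samples  = X.size() / n_features;
  const std::size_t n_clusters = centroids.size() / n_features;

  FitPredictResult res;
  res.labels.assign(n_samples, -1);

  for (int it = 0; it < max_iter; ++it) {
    res.n_iter   = it + 1;
    res.inertia  = 0.0f;
    bool changed = false;

    // Assignment step: attach every sample to its nearest center.
    for (std::size_t i = 0; i < n_samples; ++i) {
      float best_dist = std::numeric_limits<float>::max();
      int best        = 0;
      for (std::size_t c = 0; c < n_clusters; ++c) {
        float dist = 0.0f;
        for (std::size_t j = 0; j < n_features; ++j) {
          const float diff = X[i * n_features + j] - centroids[c * n_features + j];
          dist += diff * diff;
        }
        if (dist < best_dist) {
          best_dist = dist;
          best      = static_cast<int>(c);
        }
      }
      if (res.labels[i] != best) { changed = true; }
      res.labels[i] = best;
      res.inertia += best_dist;
    }
    if (!changed) { break; }  // assumed stopping rule: labels are stable

    // Update step: move each center to the mean of its samples.
    std::vector<float> sums(centroids.size(), 0.0f);
    std::vector<std::size_t> counts(n_clusters, 0);
    for (std::size_t i = 0; i < n_samples; ++i) {
      const std::size_t c = static_cast<std::size_t>(res.labels[i]);
      ++counts[c];
      for (std::size_t j = 0; j < n_features; ++j) {
        sums[c * n_features + j] += X[i * n_features + j];
      }
    }
    for (std::size_t c = 0; c < n_clusters; ++c) {
      if (counts[c] == 0) { continue; }  // an empty cluster keeps its old center
      for (std::size_t j = 0; j < n_features; ++j) {
        centroids[c * n_features + j] = sums[c * n_features + j] / static_cast<float>(counts[c]);
      }
    }
  }
  return res;
}
```

The reported inertia is measured against the centers used in the last assignment pass, so it matches the returned centroids whenever the loop stops because the labels are stable.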

void fit_predict(
raft::resources const &handle,
const kmeans::params &params,
raft::device_matrix_view<const float, int64_t> X,
std::optional<raft::device_vector_view<const float, int64_t>> sample_weight,
std::optional<raft::device_matrix_view<float, int64_t>> centroids,
raft::device_vector_view<int64_t, int64_t> labels,
raft::host_scalar_view<float> inertia,
raft::host_scalar_view<int64_t> n_iter
)#

Compute k-means clustering and predict the cluster index for each sample in the input.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int64_t n_features = 15, n_iter;
float inertia;  // must match raft::host_scalar_view<float>
auto centroids = raft::make_device_matrix<float, int64_t>(handle, params.n_clusters, n_features);
auto labels = raft::make_device_vector<int64_t, int64_t>(handle, X.extent(0));

kmeans::fit_predict(handle,
                    params,
                    X,
                    std::nullopt,
                    centroids.view(),
                    labels.view(),
                    raft::make_host_scalar_view(&inertia),
                    raft::make_host_scalar_view(&n_iter));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[inout] Optional. [in] When init is InitMethod::Array, centroids supplies the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

  • n_iter[out] Number of iterations run.

void fit_predict(
raft::resources const &handle,
const kmeans::params &params,
raft::device_matrix_view<const double, int> X,
std::optional<raft::device_vector_view<const double, int>> sample_weight,
std::optional<raft::device_matrix_view<double, int>> centroids,
raft::device_vector_view<int, int> labels,
raft::host_scalar_view<double> inertia,
raft::host_scalar_view<int> n_iter
)#

Compute k-means clustering and predict the cluster index for each sample in the input.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int n_features = 15, n_iter;
double inertia;  // must match raft::host_scalar_view<double>
auto centroids = raft::make_device_matrix<double, int>(handle, params.n_clusters, n_features);
auto labels = raft::make_device_vector<int, int>(handle, X.extent(0));

kmeans::fit_predict(handle,
                    params,
                    X,
                    std::nullopt,
                    centroids.view(),
                    labels.view(),
                    raft::make_host_scalar_view(&inertia),
                    raft::make_host_scalar_view(&n_iter));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[inout] Optional. [in] When init is InitMethod::Array, centroids supplies the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

  • n_iter[out] Number of iterations run.

void fit_predict(
raft::resources const &handle,
const kmeans::params &params,
raft::device_matrix_view<const double, int64_t> X,
std::optional<raft::device_vector_view<const double, int64_t>> sample_weight,
std::optional<raft::device_matrix_view<double, int64_t>> centroids,
raft::device_vector_view<int64_t, int64_t> labels,
raft::host_scalar_view<double> inertia,
raft::host_scalar_view<int64_t> n_iter
)#

Compute k-means clustering and predict the cluster index for each sample in the input.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int64_t n_features = 15, n_iter;
double inertia;  // must match raft::host_scalar_view<double>
auto centroids = raft::make_device_matrix<double, int64_t>(handle, params.n_clusters, n_features);
auto labels = raft::make_device_vector<int64_t, int64_t>(handle, X.extent(0));

kmeans::fit_predict(handle,
                    params,
                    X,
                    std::nullopt,
                    centroids.view(),
                    labels.view(),
                    raft::make_host_scalar_view(&inertia),
                    raft::make_host_scalar_view(&n_iter));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[inout] Optional. [in] When init is InitMethod::Array, centroids supplies the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

  • n_iter[out] Number of iterations run.

void fit_predict(
const raft::resources &handle,
cuvs::cluster::kmeans::balanced_params const &params,
raft::device_matrix_view<const float, int64_t> X,
raft::device_matrix_view<float, int64_t> centroids,
raft::device_vector_view<uint32_t, int64_t> labels
)#

Compute balanced k-means clustering and predict the cluster index for each sample in the input.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);
auto labels = raft::make_device_vector<uint32_t, int64_t>(handle, X.extent(0));

kmeans::fit_predict(handle,
                    params,
                    X,
                    centroids.view(),
                    labels.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • centroids[inout] Optional. [in] When init is InitMethod::Array, centroids supplies the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

void fit_predict(
const raft::resources &handle,
cuvs::cluster::kmeans::balanced_params const &params,
raft::device_matrix_view<const int8_t, int64_t> X,
raft::device_matrix_view<float, int64_t> centroids,
raft::device_vector_view<uint32_t, int64_t> labels
)#

Compute balanced k-means clustering and predict the cluster index for each sample in the input.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace  cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
int64_t n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int64_t>(handle, n_clusters, n_features);
auto labels = raft::make_device_vector<uint32_t, int64_t>(handle, X.extent(0));

kmeans::fit_predict(handle,
                    params,
                    X,
                    centroids.view(),
                    labels.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • centroids[inout] Optional. [in] When init is InitMethod::Array, centroids supplies the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]
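
One way to inspect how evenly a balanced run distributed the samples is to count cluster sizes from the labels once they have been copied to the host. The sketch below assumes that copy has already happened; cluster_sizes and imbalance_ratio are hypothetical helper names, not cuVS functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count how many samples landed in each cluster (hypothetical helper).
// labels holds one cluster index per sample, already copied to host memory.
std::vector<std::size_t> cluster_sizes(const std::vector<std::uint32_t>& labels,
                                       std::size_t n_clusters)
{
  std::vector<std::size_t> sizes(n_clusters, 0);
  for (std::uint32_t label : labels) {
    assert(label < n_clusters);
    ++sizes[label];
  }
  return sizes;
}

// Ratio of the largest cluster size to the mean cluster size.
// A value of 1.0 means all clusters have exactly the same size.
double imbalance_ratio(const std::vector<std::size_t>& sizes, std::size_t n_samples)
{
  assert(!sizes.empty() && n_samples > 0);
  const double mean         = static_cast<double>(n_samples) / static_cast<double>(sizes.size());
  const std::size_t largest = *std::max_element(sizes.begin(), sizes.end());
  return static_cast<double>(largest) / mean;
}
```

The ratio is only a diagnostic chosen for this sketch; it says nothing about how the library enforces balance internally.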

void transform(
raft::resources const &handle,
const kmeans::params &params,
raft::device_matrix_view<const float, int> X,
raft::device_matrix_view<const float, int> centroids,
raft::device_matrix_view<float, int> X_new
)#

Transform X to a cluster-distance space.

Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • X_new[out] X transformed into the cluster-distance space: one distance per centroid for each sample. [dim = n_samples x n_clusters]

void transform(
raft::resources const &handle,
const kmeans::params &params,
raft::device_matrix_view<const double, int> X,
raft::device_matrix_view<const double, int> centroids,
raft::device_matrix_view<double, int> X_new
)#

Transform X to a cluster-distance space.

Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • X_new[out] X transformed into the cluster-distance space: one distance per centroid for each sample. [dim = n_samples x n_clusters]
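
As a rough picture of what a cluster-distance space means, the CPU sketch below computes the distance from every sample to every centroid. It assumes plain Euclidean distance and an output with one column per centroid; the distance actually used by the library depends on the metric carried in params, and transform_reference is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for a cluster-distance transform (hypothetical helper, not part of cuVS).
// X is row-major [n_samples x n_features]; centroids is row-major [n_clusters x n_features].
// Returns a row-major [n_samples x n_clusters] matrix of Euclidean distances.
std::vector<float> transform_reference(const std::vector<float>& X,
                                       const std::vector<float>& centroids,
                                       std::size_t n_features)
{
  assert(n_features > 0 && X.size() % n_features == 0);
  assert(!centroids.empty() && centroids.size() % n_features == 0);
  const std::size_t n_samples  = X.size() / n_features;
  const std::size_t n_clusters = centroids.size() / n_features;

  std::vector<float> X_new(n_samples * n_clusters);
  for (std::size_t i = 0; i < n_samples; ++i) {
    for (std::size_t c = 0; c < n_clusters; ++c) {
      float sq = 0.0f;
      for (std::size_t j = 0; j < n_features; ++j) {
        const float diff = X[i * n_features + j] - centroids[c * n_features + j];
        sq += diff * diff;
      }
      // Assumed metric: unsquared Euclidean distance.
      X_new[i * n_clusters + c] = std::sqrt(sq);
    }
  }
  return X_new;
}
```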

void cluster_cost(
const raft::resources &handle,
raft::device_matrix_view<const float, int> X,
raft::device_matrix_view<const float, int> centroids,
raft::host_scalar_view<float> cost,
std::optional<raft::device_vector_view<const float, int>> sample_weight = std::nullopt
)#

Compute (optionally weighted) cluster cost.

Parameters:
  • handle[in] The raft handle

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • cost[out] Resulting cluster cost

  • sample_weight[in] Optional per-sample weights. [len = n_samples]

void cluster_cost(
const raft::resources &handle,
raft::device_matrix_view<const double, int> X,
raft::device_matrix_view<const double, int> centroids,
raft::host_scalar_view<double> cost,
std::optional<raft::device_vector_view<const double, int>> sample_weight = std::nullopt
)#

Compute (optionally weighted) cluster cost.

Parameters:
  • handle[in] The raft handle

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • cost[out] Resulting cluster cost

  • sample_weight[in] Optional per-sample weights. [len = n_samples]

void cluster_cost(
const raft::resources &handle,
raft::device_matrix_view<const float, int64_t> X,
raft::device_matrix_view<const float, int64_t> centroids,
raft::host_scalar_view<float> cost,
std::optional<raft::device_vector_view<const float, int64_t>> sample_weight = std::nullopt
)#

Compute (optionally weighted) cluster cost.

Parameters:
  • handle[in] The raft handle

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • cost[out] Resulting cluster cost

  • sample_weight[in] Optional per-sample weights. [len = n_samples]

void cluster_cost(
const raft::resources &handle,
raft::device_matrix_view<const double, int64_t> X,
raft::device_matrix_view<const double, int64_t> centroids,
raft::host_scalar_view<double> cost,
std::optional<raft::device_vector_view<const double, int64_t>> sample_weight = std::nullopt
)#

Compute (optionally weighted) cluster cost.

Parameters:
  • handle[in] The raft handle

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • cost[out] Resulting cluster cost

  • sample_weight[in] Optional per-sample weights. [len = n_samples]
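
The cluster cost can be read as the inertia of X with respect to a fixed set of centroids. The CPU sketch below assumes squared Euclidean distance and assumes that each sample's distance to its nearest centroid is scaled by its weight when weights are given; cluster_cost_reference is a hypothetical name, not a cuVS function.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <optional>
#include <vector>

// CPU reference for an (optionally weighted) cluster cost (hypothetical helper).
// X is row-major [n_samples x n_features]; centroids is row-major [n_clusters x n_features].
float cluster_cost_reference(const std::vector<float>& X,
                             const std::vector<float>& centroids,
                             std::size_t n_features,
                             const std::optional<std::vector<float>>& sample_weight = std::nullopt)
{
  assert(n_features > 0 && X.size() % n_features == 0);
  assert(!centroids.empty() && centroids.size() % n_features == 0);
  const std::size_t n_samples  = X.size() / n_features;
  const std::size_t n_clusters = centroids.size() / n_features;
  if (sample_weight) { assert(sample_weight->size() == n_samples); }

  float cost = 0.0f;
  for (std::size_t i = 0; i < n_samples; ++i) {
    // Squared distance from sample i to its nearest centroid.
    float best_dist = std::numeric_limits<float>::max();
    for (std::size_t c = 0; c < n_clusters; ++c) {
      float dist = 0.0f;
      for (std::size_t j = 0; j < n_features; ++j) {
        const float diff = X[i * n_features + j] - centroids[c * n_features + j];
        dist += diff * diff;
      }
      if (dist < best_dist) { best_dist = dist; }
    }
    // Unweighted samples count with weight 1 (assumption of this sketch).
    const float w = sample_weight ? (*sample_weight)[i] : 1.0f;
    cost += w * best_dist;
  }
  return cost;
}
```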

K-means Helpers#

#include <cuvs/cluster/kmeans.hpp>

namespace cuvs::cluster::kmeans::helpers

void find_k(
raft::resources const &handle,
raft::device_matrix_view<const float, int> X,
raft::host_scalar_view<int> best_k,
raft::host_scalar_view<float> inertia,
raft::host_scalar_view<int> n_iter,
int kmax,
int kmin = 1,
int maxiter = 100,
float tol = 1e-3
)#

Automatically find the optimal value of k using a binary search. This method maximizes the Calinski-Harabasz Index while minimizing the per-cluster inertia.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>

#include <raft/random/make_blobs.cuh>

using namespace cuvs::cluster;

raft::resources handle;
int n_samples = 100, n_features = 15, n_clusters = 10;
auto X = raft::make_device_matrix<float, int>(handle, n_samples, n_features);
auto labels = raft::make_device_vector<int, int>(handle, n_samples);

// Generate synthetic clustered data; make_blobs writes integer labels.
raft::random::make_blobs(handle, X.view(), labels.view(), n_clusters);

auto best_k = raft::make_host_scalar<int>(0);
auto n_iter = raft::make_host_scalar<int>(0);
auto inertia = raft::make_host_scalar<float>(0.0f);

kmeans::helpers::find_k(handle, raft::make_const_mdspan(X.view()), best_k.view(), inertia.view(), n_iter.view(), n_clusters+1);
Parameters:
  • handle – raft handle

  • X – input observations (shape n_samples, n_dims)

  • best_k – best k found from binary search

  • inertia – inertia of best k found

  • n_iter – number of iterations used to find best k

  • kmax – maximum k to try in search

  • kmin – minimum k to try in search (should be >= 1)

  • maxiter – maximum number of iterations to run

  • tol – tolerance for early stopping convergence
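
For readers unfamiliar with the Calinski-Harabasz index mentioned above, the following CPU sketch computes its textbook form, (B / (k - 1)) / (W / (n - k)), where B is the between-cluster dispersion and W is the within-cluster sum of squares. The name calinski_harabasz is hypothetical, and how find_k evaluates or combines the index internally is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Textbook Calinski-Harabasz index on the CPU (hypothetical helper, not part of cuVS).
// X is row-major [n_samples x n_features]; labels holds one cluster index per sample.
double calinski_harabasz(const std::vector<double>& X,
                         const std::vector<int>& labels,
                         std::size_t n_features,
                         std::size_t n_clusters)
{
  const std::size_t n_samples = labels.size();
  assert(n_features > 0 && X.size() == n_samples * n_features);
  assert(n_clusters > 1 && n_samples > n_clusters);

  // Per-cluster means and the global mean.
  std::vector<double> centers(n_clusters * n_features, 0.0);
  std::vector<double> mean(n_features, 0.0);
  std::vector<std::size_t> counts(n_clusters, 0);
  for (std::size_t i = 0; i < n_samples; ++i) {
    assert(labels[i] >= 0 && static_cast<std::size_t>(labels[i]) < n_clusters);
    const std::size_t c = static_cast<std::size_t>(labels[i]);
    ++counts[c];
    for (std::size_t j = 0; j < n_features; ++j) {
      centers[c * n_features + j] += X[i * n_features + j];
      mean[j] += X[i * n_features + j];
    }
  }
  for (std::size_t j = 0; j < n_features; ++j) { mean[j] /= static_cast<double>(n_samples); }
  for (std::size_t c = 0; c < n_clusters; ++c) {
    if (counts[c] == 0) { continue; }
    for (std::size_t j = 0; j < n_features; ++j) {
      centers[c * n_features + j] /= static_cast<double>(counts[c]);
    }
  }

  // B: size-weighted squared distance of each cluster mean from the global mean.
  double between = 0.0;
  for (std::size_t c = 0; c < n_clusters; ++c) {
    for (std::size_t j = 0; j < n_features; ++j) {
      const double diff = centers[c * n_features + j] - mean[j];
      between += static_cast<double>(counts[c]) * diff * diff;
    }
  }
  // W: squared distance of each sample from its own cluster mean.
  double within = 0.0;
  for (std::size_t i = 0; i < n_samples; ++i) {
    const std::size_t c = static_cast<std::size_t>(labels[i]);
    for (std::size_t j = 0; j < n_features; ++j) {
      const double diff = X[i * n_features + j] - centers[c * n_features + j];
      within += diff * diff;
    }
  }

  if (within == 0.0) { return std::numeric_limits<double>::infinity(); }
  return (between / static_cast<double>(n_clusters - 1)) /
         (within / static_cast<double>(n_samples - n_clusters));
}
```

A higher value indicates tighter, better separated clusters, which is why a search over k can use it as a score to maximize.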