Cluster#

Params#

#include <cuvs/cluster/kmeans.hpp>

namespace cuvs::cluster::kmeans

struct params : public cuvs::cluster::kmeans::base_params#
#include <kmeans.hpp>

Simple object to specify hyper-parameters to the kmeans algorithm.

Public Members

int n_clusters = 8#

The number of clusters to form as well as the number of centroids to generate (default:8).

InitMethod init = KMeansPlusPlus#

Method for initialization, defaults to k-means++:

  • InitMethod::KMeansPlusPlus (k-means++): Use scalable k-means++ algorithm to select the initial cluster centers.

  • InitMethod::Random (random): Choose ‘n_clusters’ observations (rows) at random from the input data for the initial centroids.

  • InitMethod::Array (ndarray): Use ‘centroids’ as initial cluster centers.
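To make the idea behind k-means++ style seeding concrete, here is a minimal CPU sketch of the classic sequential variant (D² sampling). It is not the scalable k-means|| procedure the library uses and not cuVS code; the function name `kmeans_pp_seed` and its parameters are hypothetical, and it assumes `n_clusters` does not exceed the number of distinct rows in `X`.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical helper (not part of cuVS): returns the row indices of the chosen seeds.
// X is row-major [n_samples x n_features].
inline std::vector<std::size_t> kmeans_pp_seed(const std::vector<float>& X,
                                               std::size_t n_features,
                                               std::size_t n_clusters,
                                               unsigned seed)
{
  const std::size_t n_samples = X.size() / n_features;
  std::mt19937 rng(seed);

  // First center: one row picked uniformly at random.
  std::vector<std::size_t> chosen;
  std::uniform_int_distribution<std::size_t> uniform(0, n_samples - 1);
  chosen.push_back(uniform(rng));

  // min_d2[i] = squared distance from row i to its nearest chosen center so far.
  std::vector<double> min_d2(n_samples, std::numeric_limits<double>::max());
  while (chosen.size() < n_clusters) {
    const std::size_t c = chosen.back();
    for (std::size_t i = 0; i < n_samples; ++i) {
      double d2 = 0.0;
      for (std::size_t j = 0; j < n_features; ++j) {
        const double diff = X[i * n_features + j] - X[c * n_features + j];
        d2 += diff * diff;
      }
      min_d2[i] = std::min(min_d2[i], d2);
    }
    // Next center: sample a row with probability proportional to min_d2.
    std::discrete_distribution<std::size_t> pick(min_d2.begin(), min_d2.end());
    chosen.push_back(pick(rng));
  }
  return chosen;
}
```

Rows far from every existing center get a larger sampling weight, which is what spreads the initial centroids out compared with InitMethod::Random.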

int max_iter = 300#

Maximum number of iterations of the k-means algorithm for a single run.

double tol = 1e-4#

Relative tolerance with regard to inertia used to declare convergence.

int verbosity = RAFT_LEVEL_INFO#

Verbosity level.

raft::random::RngState rng_state = {0}#

Seed for the random number generator.

int n_init = 1#

Number of times the k-means algorithm will be run, each with a different seed.

double oversampling_factor = 2.0#

Oversampling factor for use in the k-means|| algorithm.

int batch_centroids = 0#

If 0, batch_centroids is set to n_clusters.

struct balanced_params : public cuvs::cluster::kmeans::base_params#
#include <kmeans.hpp>

Simple object to specify hyper-parameters to the balanced k-means algorithm.

The following metrics are currently supported in k-means balanced:

  • InnerProduct

  • L2Expanded

  • L2SqrtExpanded
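As a rough illustration of what these three metric names usually denote, the sketch below evaluates them on the CPU for a single pair of vectors. It assumes the conventional definitions: the inner product, the squared L2 distance evaluated in expanded form ||x||^2 + ||c||^2 - 2 x.c, and the square root of that value. The function names are hypothetical and not part of cuVS.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU helpers (not part of cuVS). Both vectors must have the same length.
inline float inner_product_metric(const std::vector<float>& x, const std::vector<float>& c)
{
  float xc = 0.0f;
  for (std::size_t i = 0; i < x.size(); ++i) { xc += x[i] * c[i]; }
  return xc;
}

// Squared L2 distance in expanded form: ||x||^2 + ||c||^2 - 2 * <x, c>.
inline float l2_expanded_metric(const std::vector<float>& x, const std::vector<float>& c)
{
  float xx = 0.0f, cc = 0.0f;
  for (std::size_t i = 0; i < x.size(); ++i) {
    xx += x[i] * x[i];
    cc += c[i] * c[i];
  }
  return xx + cc - 2.0f * inner_product_metric(x, c);
}

// Same as above followed by a square root; clamp at zero to absorb rounding error.
inline float l2_sqrt_expanded_metric(const std::vector<float>& x, const std::vector<float>& c)
{
  return std::sqrt(std::max(0.0f, l2_expanded_metric(x, c)));
}
```

The expanded form is popular on GPUs because the cross term for all sample/centroid pairs can be computed as one matrix multiplication, with the norms added afterwards.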

Public Members

uint32_t n_iters = 20#

Number of training iterations.

K-means#

#include <cuvs/cluster/kmeans.hpp>

namespace cuvs::cluster::kmeans

void fit(raft::resources const &handle, const cuvs::cluster::kmeans::params &params, raft::device_matrix_view<const float, int> X, std::optional<raft::device_vector_view<const float, int>> sample_weight, raft::device_matrix_view<float, int> centroids, raft::host_scalar_view<float, int> inertia, raft::host_scalar_view<int, int> n_iter)#

Find clusters with the k-means algorithm. By default, initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
// X is a device matrix view of the training data, [n_samples x n_features], defined elsewhere.
int n_features = 15, n_iter;
float inertia;  // matches the float inertia view in the signature
auto centroids = raft::make_device_matrix<float, int>(handle, params.n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            std::nullopt,
            centroids.view(),
            raft::make_scalar_view(&inertia),
            raft::make_scalar_view(&n_iter));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[inout] [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

  • n_iter[out] Number of iterations run.
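To make the `inertia` output concrete, the following CPU sketch computes the quantity described above: the sum, over all samples, of the squared distance to the closest centroid. It is a reference illustration only, assuming squared Euclidean distance and row-major `std::vector<float>` storage; `reference_inertia` is a hypothetical name, not a cuVS function.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference (not part of cuVS).
// X is row-major [n_samples x n_features]; centroids is row-major [n_clusters x n_features].
inline float reference_inertia(const std::vector<float>& X,
                               const std::vector<float>& centroids,
                               std::size_t n_features)
{
  const std::size_t n_samples  = X.size() / n_features;
  const std::size_t n_clusters = centroids.size() / n_features;
  float inertia = 0.0f;
  for (std::size_t i = 0; i < n_samples; ++i) {
    // Squared distance from sample i to its closest centroid.
    float best = std::numeric_limits<float>::max();
    for (std::size_t k = 0; k < n_clusters; ++k) {
      float d2 = 0.0f;
      for (std::size_t j = 0; j < n_features; ++j) {
        const float diff = X[i * n_features + j] - centroids[k * n_features + j];
        d2 += diff * diff;
      }
      if (d2 < best) { best = d2; }
    }
    inertia += best;
  }
  return inertia;
}
```

A value like this can serve as a sanity check against the `inertia` returned by fit on a small dataset copied back to the host, bearing in mind that sample weights and the configured metric may change the library's result.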

void fit(raft::resources const &handle, const cuvs::cluster::kmeans::params &params, raft::device_matrix_view<const int8_t, int> X, std::optional<raft::device_vector_view<const int8_t, int>> sample_weight, raft::device_matrix_view<int8_t, int> centroids, raft::host_scalar_view<int8_t, int> inertia, raft::host_scalar_view<int, int> n_iter)#

Find clusters with the k-means algorithm. By default, initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
// X is a device matrix view of the training data, [n_samples x n_features], defined elsewhere.
int n_features = 15, n_iter;
float inertia;  // this example allocates float centroids, so inertia is float as well
auto centroids = raft::make_device_matrix<float, int>(handle, params.n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            std::nullopt,
            centroids.view(),
            raft::make_scalar_view(&inertia),
            raft::make_scalar_view(&n_iter));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[inout] [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

  • n_iter[out] Number of iterations run.

void fit(const raft::resources &handle, cuvs::cluster::kmeans::balanced_params const &params, raft::device_matrix_view<const float, int> X, raft::device_matrix_view<float, int> centroids)#

Find balanced clusters with the k-means algorithm.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
// balanced_params documents no n_clusters member, so the cluster count is a local variable here.
int n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int>(handle, n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            centroids.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • centroids[out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]

void fit(const raft::resources &handle, cuvs::cluster::kmeans::balanced_params const &params, raft::device_matrix_view<const int8_t, int> X, raft::device_matrix_view<int8_t, int> centroids)#

Find balanced clusters with the k-means algorithm.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
// balanced_params documents no n_clusters member, so the cluster count is a local variable here.
int n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int>(handle, n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            centroids.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • centroids[out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]

void predict(raft::resources const &handle, const kmeans::params &params, raft::device_matrix_view<const float, int> X, std::optional<raft::device_vector_view<const float, int>> sample_weight, raft::device_matrix_view<const float, int> centroids, raft::device_vector_view<uint32_t, int> labels, bool normalize_weight, raft::host_scalar_view<float> inertia)#

Predict the closest cluster each sample in X belongs to.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int n_features = 15, n_iter;
float inertia;  // matches the float inertia view in the signature
auto centroids = raft::make_device_matrix<float, int>(handle, params.n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            std::nullopt,
            centroids.view(),
            raft::make_scalar_view(&inertia),
            raft::make_scalar_view(&n_iter));
...
// Labels use uint32_t to match the predict signature above.
auto labels = raft::make_device_vector<uint32_t, int>(handle, X.extent(0));

// Argument order follows the signature: labels, then normalize_weight, then inertia.
kmeans::predict(handle,
                params,
                X,
                std::nullopt,
                centroids.view(),
                labels.view(),
                false,
                raft::make_scalar_view(&inertia));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] New data to predict. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • normalize_weight[in] True if the weights should be normalized

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.
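The label assignment that predict performs can be pictured with a small CPU reference: each sample receives the index of its nearest centroid. This is a sketch under the assumption of squared Euclidean distance and unweighted samples, with ties resolved toward the lower index; `reference_predict` is a hypothetical name, not a cuVS function.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical CPU reference (not part of cuVS).
// X is row-major [n_samples x n_features]; centroids is row-major [n_clusters x n_features].
inline std::vector<std::uint32_t> reference_predict(const std::vector<float>& X,
                                                    const std::vector<float>& centroids,
                                                    std::size_t n_features)
{
  const std::size_t n_samples  = X.size() / n_features;
  const std::size_t n_clusters = centroids.size() / n_features;
  std::vector<std::uint32_t> labels(n_samples, 0);
  for (std::size_t i = 0; i < n_samples; ++i) {
    float best = std::numeric_limits<float>::max();
    for (std::size_t k = 0; k < n_clusters; ++k) {
      float d2 = 0.0f;
      for (std::size_t j = 0; j < n_features; ++j) {
        const float diff = X[i * n_features + j] - centroids[k * n_features + j];
        d2 += diff * diff;
      }
      // Strict comparison keeps the lowest index when two centroids are equally close.
      if (d2 < best) {
        best      = d2;
        labels[i] = static_cast<std::uint32_t>(k);
      }
    }
  }
  return labels;
}
```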

void predict(const raft::resources &handle, cuvs::cluster::kmeans::balanced_params const &params, raft::device_matrix_view<const int8_t, int> X, raft::device_matrix_view<const float, int> centroids, raft::device_vector_view<uint32_t, int> labels)#

Predict the closest cluster each sample in X belongs to.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
// balanced_params documents no n_clusters member, so the cluster count is a local variable here.
int n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int>(handle, n_clusters, n_features);
...
// centroids are assumed to have been trained already, e.g. by a balanced kmeans::fit.
auto labels = raft::make_device_vector<uint32_t, int>(handle, X.extent(0));

kmeans::predict(handle,
                params,
                X,
                centroids.view(),
                labels.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] New data to predict. [dim = n_samples x n_features]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

void predict(const raft::resources &handle, cuvs::cluster::kmeans::balanced_params const &params, raft::device_matrix_view<const float, int> X, raft::device_matrix_view<const float, int> centroids, raft::device_vector_view<uint32_t, int> labels)#

Predict the closest cluster each sample in X belongs to.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
// balanced_params documents no n_clusters member, so the cluster count is a local variable here.
int n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int>(handle, n_clusters, n_features);

kmeans::fit(handle,
            params,
            X,
            centroids.view());
...
// Labels use uint32_t to match the predict signature above.
auto labels = raft::make_device_vector<uint32_t, int>(handle, X.extent(0));

kmeans::predict(handle,
                params,
                X,
                centroids.view(),
                labels.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] New data to predict. [dim = n_samples x n_features]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

void fit_predict(raft::resources const &handle, const kmeans::params &params, raft::device_matrix_view<const float, int> X, std::optional<raft::device_vector_view<const float, int>> sample_weight, std::optional<raft::device_matrix_view<float, int>> centroids, raft::device_vector_view<int, int> labels, raft::host_scalar_view<float> inertia, raft::host_scalar_view<int> n_iter)#

Compute k-means clustering and predict the cluster index for each sample in the input.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::params params;
int n_features = 15, n_iter;
float inertia;  // matches the float inertia view in the signature
auto centroids = raft::make_device_matrix<float, int>(handle, params.n_clusters, n_features);
auto labels = raft::make_device_vector<int, int>(handle, X.extent(0));

kmeans::fit_predict(handle,
                    params,
                    X,
                    std::nullopt,
                    centroids.view(),
                    labels.view(),
                    raft::make_scalar_view(&inertia),
                    raft::make_scalar_view(&n_iter));
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • sample_weight[in] Optional weights for each observation in X. [len = n_samples]

  • centroids[inout] Optional. [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

  • inertia[out] Sum of squared distances of samples to their closest cluster center.

  • n_iter[out] Number of iterations run.

void fit_predict(const raft::resources &handle, cuvs::cluster::kmeans::balanced_params const &params, raft::device_matrix_view<const float, int> X, raft::device_matrix_view<float, int> centroids, raft::device_vector_view<uint32_t, int> labels)#

Compute balanced k-means clustering and predict the cluster index for each sample in the input.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
// balanced_params documents no n_clusters member, so the cluster count is a local variable here.
int n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int>(handle, n_clusters, n_features);
// Labels use uint32_t to match the fit_predict signature above.
auto labels = raft::make_device_vector<uint32_t, int>(handle, X.extent(0));

kmeans::fit_predict(handle,
                    params,
                    X,
                    centroids.view(),
                    labels.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • centroids[out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

void fit_predict(const raft::resources &handle, cuvs::cluster::kmeans::balanced_params const &params, raft::device_matrix_view<const int8_t, int> X, raft::device_matrix_view<float, int> centroids, raft::device_vector_view<uint32_t, int> labels)#

Compute balanced k-means clustering and predict the cluster index for each sample in the input.

#include <raft/core/resources.hpp>
#include <cuvs/cluster/kmeans.hpp>
using namespace cuvs::cluster;
...
raft::resources handle;
cuvs::cluster::kmeans::balanced_params params;
// balanced_params documents no n_clusters member, so the cluster count is a local variable here.
int n_features = 15, n_clusters = 8;
auto centroids = raft::make_device_matrix<float, int>(handle, n_clusters, n_features);
// Labels use uint32_t to match the fit_predict signature above.
auto labels = raft::make_device_vector<uint32_t, int>(handle, X.extent(0));

kmeans::fit_predict(handle,
                    params,
                    X,
                    centroids.view(),
                    labels.view());
Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]

  • centroids[out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]

  • labels[out] Index of the cluster each sample in X belongs to. [len = n_samples]

void transform(raft::resources const &handle, const kmeans::params &params, raft::device_matrix_view<const float, int> X, raft::device_matrix_view<const float, int> centroids, raft::device_matrix_view<float, int> X_new)#

Transform X to a cluster-distance space.

Parameters:
  • handle[in] The raft handle.

  • params[in] Parameters for KMeans model.

  • X[in] Training instances to cluster. The data must be in row-major format [dim = n_samples x n_features]

  • centroids[in] Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]

  • X_new[out] X transformed into the cluster-distance space: each row holds the distance from one sample to every centroid. [dim = n_samples x n_clusters]
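A cluster-distance space has one coordinate per centroid, so in the CPU sketch below every output row has n_clusters entries. The sketch assumes a plain Euclidean distance; the metric the library actually applies comes from `params` and may differ (for example, a squared L2 distance). `reference_transform` is a hypothetical name, not a cuVS function.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not part of cuVS).
// X is row-major [n_samples x n_features]; centroids is row-major [n_clusters x n_features].
// Returns a row-major [n_samples x n_clusters] matrix of sample-to-centroid distances.
inline std::vector<float> reference_transform(const std::vector<float>& X,
                                              const std::vector<float>& centroids,
                                              std::size_t n_features)
{
  const std::size_t n_samples  = X.size() / n_features;
  const std::size_t n_clusters = centroids.size() / n_features;
  std::vector<float> X_new(n_samples * n_clusters, 0.0f);
  for (std::size_t i = 0; i < n_samples; ++i) {
    for (std::size_t k = 0; k < n_clusters; ++k) {
      float d2 = 0.0f;
      for (std::size_t j = 0; j < n_features; ++j) {
        const float diff = X[i * n_features + j] - centroids[k * n_features + j];
        d2 += diff * diff;
      }
      // Entry (i, k) is the Euclidean distance from sample i to centroid k.
      X_new[i * n_clusters + k] = std::sqrt(d2);
    }
  }
  return X_new;
}
```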

K-means Helpers#

#include <cuvs/cluster/kmeans.hpp>

namespace cuvs::cluster::kmeans::helpers