Stats#

This page provides C++ API references for the publicly exposed elements of the cuvs/stats package.

Silhouette Score#

#include <cuvs/stats/silhouette_score.hpp>

namespace cuvs::stats

float silhouette_score(raft::resources const &handle, raft::device_matrix_view<const float, int64_t, raft::row_major> X_in, raft::device_vector_view<const int, int64_t> labels, std::optional<raft::device_vector_view<float, int64_t>> silhouette_score_per_sample, int64_t n_unique_labels, cuvs::distance::DistanceType metric = cuvs::distance::DistanceType::L2Unexpanded)#

Returns the average silhouette score for a given dataset and its cluster labels.

Parameters:
  • handle[in] raft handle for managing expensive resources

  • X_in[in] input data matrix in row-major format (nRows x nCols)

  • labels[in] device vector view containing the cluster label of every data sample (length: nRows)

  • silhouette_score_per_sample[out] optional device vector view populated with the silhouette score of every sample (length: nRows)

  • n_unique_labels[in] number of unique labels in the labels array

  • metric[in] distance metric to use. Defaults to cuvs::distance::DistanceType::L2Unexpanded

Returns:

The average silhouette score over all samples.
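For orientation, the quantity being averaged can be written out on the host. The sketch below is a plain C++ reference for the standard silhouette definition: for each sample, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the score is (b - a) / max(a, b). It is not the cuVS implementation. The names `euclidean` and `silhouette_reference` are hypothetical, and plain Euclidean distance is used for illustration only, which is not necessarily what the default metric computes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Host-side reference sketch, NOT the cuVS implementation.
// Euclidean distance between rows i and j of a row-major matrix X.
inline double euclidean(const std::vector<double>& X, std::size_t n_cols,
                        std::size_t i, std::size_t j)
{
  double acc = 0.0;
  for (std::size_t k = 0; k < n_cols; ++k) {
    const double d = X[i * n_cols + k] - X[j * n_cols + k];
    acc += d * d;
  }
  return std::sqrt(acc);
}

// Average silhouette score; labels[i] must lie in [0, n_labels).
// If per_sample is non-null it receives one score per row.
inline double silhouette_reference(const std::vector<double>& X,
                                   std::size_t n_rows, std::size_t n_cols,
                                   const std::vector<int>& labels,
                                   std::size_t n_labels,
                                   std::vector<double>* per_sample = nullptr)
{
  assert(n_rows > 0 && n_labels > 0);
  assert(X.size() == n_rows * n_cols && labels.size() == n_rows);

  std::vector<std::size_t> counts(n_labels, 0);
  for (int l : labels) {
    assert(l >= 0 && static_cast<std::size_t>(l) < n_labels);
    ++counts[l];
  }
  if (per_sample != nullptr) per_sample->assign(n_rows, 0.0);

  std::vector<double> sums(n_labels);
  double total = 0.0;
  for (std::size_t i = 0; i < n_rows; ++i) {
    // Total distance from sample i to the members of each cluster.
    std::fill(sums.begin(), sums.end(), 0.0);
    for (std::size_t j = 0; j < n_rows; ++j)
      if (j != i) sums[labels[j]] += euclidean(X, n_cols, i, j);

    const std::size_t own = static_cast<std::size_t>(labels[i]);
    double s = 0.0;  // singleton clusters score 0 by convention
    if (counts[own] > 1) {
      const double a = sums[own] / static_cast<double>(counts[own] - 1);
      double b = std::numeric_limits<double>::infinity();
      for (std::size_t c = 0; c < n_labels; ++c)
        if (c != own && counts[c] > 0)
          b = std::min(b, sums[c] / static_cast<double>(counts[c]));
      const double denom = std::max(a, b);
      if (std::isfinite(b) && denom > 0.0) s = (b - a) / denom;
    }
    if (per_sample != nullptr) (*per_sample)[i] = s;
    total += s;
  }
  return total / static_cast<double>(n_rows);
}
```

A reference like this is mainly useful for checking GPU results on a small dataset; its cost grows quadratically with the number of rows.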

float silhouette_score_batched(raft::resources const &handle, raft::device_matrix_view<const float, int64_t, raft::row_major> X, raft::device_vector_view<const int, int64_t> labels, std::optional<raft::device_vector_view<float, int64_t>> silhouette_score_per_sample, int64_t n_unique_labels, int64_t batch_size, cuvs::distance::DistanceType metric = cuvs::distance::DistanceType::L2Unexpanded)#

Returns the average silhouette score for a given dataset and its cluster labels, processing the samples in batches of batch_size.

Parameters:
  • handle[in] raft handle for managing expensive resources

  • X[in] input data matrix in row-major format (nRows x nCols)

  • labels[in] device vector view containing the cluster label of every data sample (length: nRows)

  • silhouette_score_per_sample[out] optional array populated with the silhouette score for every sample (length: nRows)

  • n_unique_labels[in] number of unique labels in the labels array

  • batch_size[in] number of samples per batch

  • metric[in] distance metric to use, given as a cuvs::distance::DistanceType value. Defaults to cuvs::distance::DistanceType::L2Unexpanded

Returns:

The average silhouette score over all samples.
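To illustrate why a batch size can matter, the sketch below computes the same score while holding only a batch_size x batch_size tile of pairwise distances in memory at a time. This is a host-side illustration of one possible tiling scheme, an assumption rather than a description of how cuVS batches internally. The name `silhouette_tiled` is hypothetical and plain Euclidean distance is again used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative host-side sketch; the tiling scheme is an assumption,
// not a description of cuVS internals.
inline double silhouette_tiled(const std::vector<double>& X,
                               std::size_t n_rows, std::size_t n_cols,
                               const std::vector<int>& labels,
                               std::size_t n_labels, std::size_t batch_size)
{
  assert(batch_size > 0 && n_rows > 0 && n_labels > 0);
  assert(X.size() == n_rows * n_cols && labels.size() == n_rows);

  std::vector<std::size_t> counts(n_labels, 0);
  for (int l : labels) {
    assert(l >= 0 && static_cast<std::size_t>(l) < n_labels);
    ++counts[l];
  }

  // sums[i * n_labels + c]: total distance from sample i to cluster c.
  std::vector<double> sums(n_rows * n_labels, 0.0);
  // Scratch space for one tile of pairwise distances.
  std::vector<double> tile(batch_size * batch_size);

  for (std::size_t r0 = 0; r0 < n_rows; r0 += batch_size) {
    const std::size_t r1 = std::min(r0 + batch_size, n_rows);
    for (std::size_t c0 = 0; c0 < n_rows; c0 += batch_size) {
      const std::size_t c1 = std::min(c0 + batch_size, n_rows);
      // Step 1: fill the tile with distances between rows [r0, r1) and [c0, c1).
      for (std::size_t i = r0; i < r1; ++i)
        for (std::size_t j = c0; j < c1; ++j) {
          double acc = 0.0;
          for (std::size_t k = 0; k < n_cols; ++k) {
            const double d = X[i * n_cols + k] - X[j * n_cols + k];
            acc += d * d;
          }
          tile[(i - r0) * batch_size + (j - c0)] = std::sqrt(acc);
        }
      // Step 2: reduce the tile into per-cluster sums. The self-distance
      // is zero, so including j == i does not change the sums.
      for (std::size_t i = r0; i < r1; ++i)
        for (std::size_t j = c0; j < c1; ++j)
          sums[i * n_labels + labels[j]] += tile[(i - r0) * batch_size + (j - c0)];
    }
  }

  double total = 0.0;
  for (std::size_t i = 0; i < n_rows; ++i) {
    const std::size_t own = static_cast<std::size_t>(labels[i]);
    if (counts[own] <= 1) continue;  // singleton clusters score 0
    const double a = sums[i * n_labels + own] / static_cast<double>(counts[own] - 1);
    double b = std::numeric_limits<double>::infinity();
    for (std::size_t c = 0; c < n_labels; ++c)
      if (c != own && counts[c] > 0)
        b = std::min(b, sums[i * n_labels + c] / static_cast<double>(counts[c]));
    const double denom = std::max(a, b);
    if (std::isfinite(b) && denom > 0.0) total += (b - a) / denom;
  }
  return total / static_cast<double>(n_rows);
}
```

In this sketch the batch size changes only peak scratch memory (batch_size squared distances), not the result, so a smaller value trades more tile iterations for a smaller buffer.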

double silhouette_score(raft::resources const &handle, raft::device_matrix_view<const double, int64_t, raft::row_major> X_in, raft::device_vector_view<const int, int64_t> labels, std::optional<raft::device_vector_view<double, int64_t>> silhouette_score_per_sample, int64_t n_unique_labels, cuvs::distance::DistanceType metric = cuvs::distance::DistanceType::L2Unexpanded)#

Returns the average silhouette score for a given dataset and its cluster labels (double-precision overload).

Parameters:
  • handle[in] raft handle for managing expensive resources

  • X_in[in] input data matrix in row-major format (nRows x nCols)

  • labels[in] device vector view containing the cluster label of every data sample (length: nRows)

  • silhouette_score_per_sample[out] optional array populated with the silhouette score for every sample (length: nRows)

  • n_unique_labels[in] number of unique labels in the labels array

  • metric[in] distance metric to use, given as a cuvs::distance::DistanceType value. Defaults to cuvs::distance::DistanceType::L2Unexpanded

Returns:

The average silhouette score over all samples.

double silhouette_score_batched(raft::resources const &handle, raft::device_matrix_view<const double, int64_t, raft::row_major> X, raft::device_vector_view<const int, int64_t> labels, std::optional<raft::device_vector_view<double, int64_t>> silhouette_score_per_sample, int64_t n_unique_labels, int64_t batch_size, cuvs::distance::DistanceType metric = cuvs::distance::DistanceType::L2Unexpanded)#

Returns the average silhouette score for a given dataset and its cluster labels, processing the samples in batches of batch_size (double-precision overload).

Parameters:
  • handle[in] raft handle for managing expensive resources

  • X[in] input data matrix in row-major format (nRows x nCols)

  • labels[in] device vector view containing the cluster label of every data sample (length: nRows)

  • silhouette_score_per_sample[out] optional array populated with the silhouette score for every sample (length: nRows)

  • n_unique_labels[in] number of unique labels in the labels array

  • batch_size[in] number of samples per batch

  • metric[in] distance metric to use, given as a cuvs::distance::DistanceType value. Defaults to cuvs::distance::DistanceType::L2Unexpanded

Returns:

The average silhouette score over all samples.

Trustworthiness Score#

#include <cuvs/stats/trustworthiness_score.hpp>

namespace cuvs::stats

double trustworthiness_score(raft::resources const &handle, raft::device_matrix_view<const float, int64_t, raft::row_major> X, raft::device_matrix_view<const float, int64_t, raft::row_major> X_embedded, int n_neighbors, cuvs::distance::DistanceType metric = cuvs::distance::DistanceType::L2SqrtUnexpanded, int batch_size = 512)#

Compute the trustworthiness score.

Note

The constness of the data in X_embedded is currently cast away, and the data is slightly modified.

Parameters:
  • handle[in] the raft handle

  • X[in] Data in original dimension

  • X_embedded[in] Data in target dimension (embedding)

  • n_neighbors[in] Number of neighbors considered by trustworthiness score

  • metric[in] Distance metric to use. Defaults to cuvs::distance::DistanceType::L2SqrtUnexpanded (Euclidean)

  • batch_size[in] Batch size (default: 512)

Returns:

Trustworthiness score
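
As a point of comparison, the commonly used definition of trustworthiness can be written as a small host-side reference. For n samples and k neighbors it is T(k) = 1 - 2 / (n k (2n - 3k - 1)) * sum of max(0, r(i, j) - k) over every sample i and each of its k nearest neighbors j in the embedding, where r(i, j) is the rank of j among the neighbors of i in the original space. The sketch below assumes this definition and Euclidean neighbor ordering. It is not the cuVS implementation, and the names `neighbors_of` and `trustworthiness_reference` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference sketch, NOT the cuVS implementation.
// Indices of all points other than i, ordered by increasing squared
// Euclidean distance to point i (ties broken by index).
inline std::vector<std::size_t> neighbors_of(const std::vector<double>& data,
                                             std::size_t n, std::size_t dim,
                                             std::size_t i)
{
  std::vector<double> dist(n, 0.0);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t d = 0; d < dim; ++d) {
      const double diff = data[i * dim + d] - data[j * dim + d];
      dist[j] += diff * diff;
    }
  std::vector<std::size_t> order;
  for (std::size_t j = 0; j < n; ++j)
    if (j != i) order.push_back(j);
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) { return dist[a] < dist[b]; });
  return order;
}

// Trustworthiness of an embedding for k neighbors; both matrices are
// row-major with n rows.
inline double trustworthiness_reference(const std::vector<double>& X, std::size_t dim,
                                        const std::vector<double>& X_embedded,
                                        std::size_t dim_embedded,
                                        std::size_t n, std::size_t k)
{
  assert(X.size() == n * dim && X_embedded.size() == n * dim_embedded);
  assert(k > 0 && 2 * n > 3 * k + 1);  // keeps the normalizer positive

  double penalty = 0.0;
  std::vector<std::size_t> rank(n, 0);
  for (std::size_t i = 0; i < n; ++i) {
    // 1-based rank of every other point in the original space.
    const auto original = neighbors_of(X, n, dim, i);
    for (std::size_t r = 0; r < original.size(); ++r) rank[original[r]] = r + 1;

    // Penalize embedded neighbors that were not among the k nearest originally.
    const auto embedded = neighbors_of(X_embedded, n, dim_embedded, i);
    for (std::size_t r = 0; r < k; ++r) {
      const std::size_t j = embedded[r];
      if (rank[j] > k) penalty += static_cast<double>(rank[j] - k);
    }
  }
  const double nd = static_cast<double>(n);
  const double kd = static_cast<double>(k);
  return 1.0 - 2.0 / (nd * kd * (2.0 * nd - 3.0 * kd - 1.0)) * penalty;
}
```

An embedding that preserves every neighborhood scores 1, and the score drops as points that were far apart in the original space become close neighbors in the embedding.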