Stats

This page provides the C++ API reference for the publicly exposed elements of the cuvs/stats package.

Silhouette Score

#include <cuvs/stats/silhouette_score.hpp>

namespace cuvs::stats

float silhouette_score(
raft::resources const &handle,
raft::device_matrix_view<const float, int64_t, raft::row_major> X_in,
raft::device_vector_view<const int, int64_t> labels,
std::optional<raft::device_vector_view<float, int64_t>> silhouette_score_per_sample,
int64_t n_unique_labels,
cuvs::distance::DistanceType metric = cuvs::distance::DistanceType::L2Unexpanded
)

Computes the average silhouette score for a dataset and its cluster assignments.

Parameters:
  • handle[in] raft handle for managing expensive resources

  • X_in[in] input data matrix in row-major format (nRows x nCols)

  • labels[in] device vector containing the cluster label of every data sample (length: nRows)

  • silhouette_score_per_sample[out] optional array populated with the silhouette score for every sample (length: nRows)

  • n_unique_labels[in] number of unique labels in the labels array

  • metric[in] distance metric to use (default: cuvs::distance::DistanceType::L2Unexpanded)

Returns:

The silhouette score.
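To make the returned quantity concrete, here is a minimal CPU sketch of the usual silhouette definition: for each sample, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the per-sample score is (b - a) / max(a, b), taken as 0 for single-member clusters. This is not the cuVS implementation. The names `row_distance` and `silhouette_reference` are hypothetical, labels are assumed to lie in [0, n_unique_labels), and plain Euclidean distance is assumed rather than any particular `DistanceType` value.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Euclidean distance between rows i and j of a row-major matrix (assumed metric).
double row_distance(const std::vector<double>& X, std::size_t n_cols, std::size_t i, std::size_t j)
{
  double acc = 0.0;
  for (std::size_t k = 0; k < n_cols; ++k) {
    const double d = X[i * n_cols + k] - X[j * n_cols + k];
    acc += d * d;
  }
  return std::sqrt(acc);
}

// Hypothetical CPU reference: fills per_sample (length n_rows) and returns the mean score.
double silhouette_reference(const std::vector<double>& X,
                            std::size_t n_rows,
                            std::size_t n_cols,
                            const std::vector<int>& labels,
                            std::size_t n_unique_labels,
                            std::vector<double>& per_sample)
{
  assert(n_rows > 0 && X.size() == n_rows * n_cols && labels.size() == n_rows);
  std::vector<std::size_t> cluster_size(n_unique_labels, 0);
  for (int l : labels) {
    assert(l >= 0 && static_cast<std::size_t>(l) < n_unique_labels);
    ++cluster_size[static_cast<std::size_t>(l)];
  }

  per_sample.assign(n_rows, 0.0);
  double total = 0.0;
  for (std::size_t i = 0; i < n_rows; ++i) {
    // Sum of distances from sample i to the members of each cluster.
    std::vector<double> dist_sum(n_unique_labels, 0.0);
    for (std::size_t j = 0; j < n_rows; ++j) {
      if (j != i) { dist_sum[static_cast<std::size_t>(labels[j])] += row_distance(X, n_cols, i, j); }
    }
    const std::size_t own = static_cast<std::size_t>(labels[i]);
    if (cluster_size[own] > 1) {
      // a: mean distance to the other members of the sample's own cluster.
      const double a = dist_sum[own] / static_cast<double>(cluster_size[own] - 1);
      // b: smallest mean distance to the members of any other non-empty cluster.
      double b = std::numeric_limits<double>::infinity();
      for (std::size_t c = 0; c < n_unique_labels; ++c) {
        if (c != own && cluster_size[c] > 0) {
          b = std::min(b, dist_sum[c] / static_cast<double>(cluster_size[c]));
        }
      }
      const double denom = std::max(a, b);
      // Score stays 0 when there is no other cluster or all distances are 0.
      if (std::isfinite(b) && denom > 0.0) { per_sample[i] = (b - a) / denom; }
    }
    total += per_sample[i];
  }
  return total / static_cast<double>(n_rows);
}
```

A reference like this costs O(nRows² · nCols) time, so it is only practical as a cross-check on small inputs, for example when validating the values written to silhouette_score_per_sample.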

float silhouette_score_batched(
raft::resources const &handle,
raft::device_matrix_view<const float, int64_t, raft::row_major> X,
raft::device_vector_view<const int, int64_t> labels,
std::optional<raft::device_vector_view<float, int64_t>> silhouette_score_per_sample,
int64_t n_unique_labels,
int64_t batch_size,
cuvs::distance::DistanceType metric = cuvs::distance::DistanceType::L2Unexpanded
)

Computes the average silhouette score for a dataset and its cluster assignments, processing the samples in batches of batch_size.

Parameters:
  • handle[in] raft handle for managing expensive resources

  • X[in] input data matrix in row-major format (nRows x nCols)

  • labels[in] device vector containing the cluster label of every data sample (length: nRows)

  • silhouette_score_per_sample[out] optional array populated with the silhouette score for every sample (length: nRows)

  • n_unique_labels[in] number of unique labels in the labels array

  • batch_size[in] number of samples per batch

  • metric[in] distance metric to use (default: cuvs::distance::DistanceType::L2Unexpanded)

Returns:

The silhouette score.
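The batch_size parameter splits the nRows samples into consecutive chunks. How cuVS schedules the work on each chunk is not described on this page, so the sketch below shows only the index arithmetic of such a split. The helper name `make_batches` is hypothetical, and the last batch is assumed to be shorter when nRows is not a multiple of batch_size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: half-open row ranges [start, end) covering n_rows samples,
// each holding at most batch_size samples.
std::vector<std::pair<std::int64_t, std::int64_t>> make_batches(std::int64_t n_rows,
                                                                std::int64_t batch_size)
{
  assert(n_rows >= 0 && batch_size > 0);
  std::vector<std::pair<std::int64_t, std::int64_t>> batches;
  for (std::int64_t start = 0; start < n_rows; start += batch_size) {
    // Clamp the end of the final batch to n_rows.
    batches.emplace_back(start, std::min(start + batch_size, n_rows));
  }
  return batches;
}
```

With n_rows = 10 and batch_size = 4, this yields the ranges [0, 4), [4, 8), and [8, 10).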

double silhouette_score(
raft::resources const &handle,
raft::device_matrix_view<const double, int64_t, raft::row_major> X_in,
raft::device_vector_view<const int, int64_t> labels,
std::optional<raft::device_vector_view<double, int64_t>> silhouette_score_per_sample,
int64_t n_unique_labels,
cuvs::distance::DistanceType metric = cuvs::distance::DistanceType::L2Unexpanded
)

Computes the average silhouette score for a dataset and its cluster assignments.

Parameters:
  • handle[in] raft handle for managing expensive resources

  • X_in[in] input data matrix in row-major format (nRows x nCols)

  • labels[in] device vector containing the cluster label of every data sample (length: nRows)

  • silhouette_score_per_sample[out] optional array populated with the silhouette score for every sample (length: nRows)

  • n_unique_labels[in] number of unique labels in the labels array

  • metric[in] distance metric to use (default: cuvs::distance::DistanceType::L2Unexpanded)

Returns:

The silhouette score.

double silhouette_score_batched(
raft::resources const &handle,
raft::device_matrix_view<const double, int64_t, raft::row_major> X,
raft::device_vector_view<const int, int64_t> labels,
std::optional<raft::device_vector_view<double, int64_t>> silhouette_score_per_sample,
int64_t n_unique_labels,
int64_t batch_size,
cuvs::distance::DistanceType metric = cuvs::distance::DistanceType::L2Unexpanded
)

Computes the average silhouette score for a dataset and its cluster assignments, processing the samples in batches of batch_size.

Parameters:
  • handle[in] raft handle for managing expensive resources

  • X[in] input data matrix in row-major format (nRows x nCols)

  • labels[in] device vector containing the cluster label of every data sample (length: nRows)

  • silhouette_score_per_sample[out] optional array populated with the silhouette score for every sample (length: nRows)

  • n_unique_labels[in] number of unique labels in the labels array

  • batch_size[in] number of samples per batch

  • metric[in] distance metric to use (default: cuvs::distance::DistanceType::L2Unexpanded)

Returns:

The silhouette score.

Trustworthiness Score

#include <cuvs/stats/trustworthiness_score.hpp>

namespace cuvs::stats

double trustworthiness_score(
raft::resources const &handle,
raft::device_matrix_view<const float, int64_t, raft::row_major> X,
raft::device_matrix_view<const float, int64_t, raft::row_major> X_embedded,
int n_neighbors,
cuvs::distance::DistanceType metric = cuvs::distance::DistanceType::L2SqrtUnexpanded,
int batch_size = 512
)

Compute the trustworthiness score.

Note

The constness of the data in X_embedded is currently cast away, and the data is slightly modified.

Parameters:
  • handle[in] the raft handle

  • X[in] Data in original dimension

  • X_embedded[in] Data in target dimension (embedding)

  • n_neighbors[in] Number of neighbors considered by trustworthiness score

  • metric[in] Distance metric to use. Euclidean (L2) is used by default

  • batch_size[in] number of samples processed per batch (default: 512)

Returns:

Trustworthiness score
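As an illustration of what the score measures, the following CPU sketch assumes the commonly used definition of trustworthiness: T(k) = 1 - 2 / (n k (2n - 3k - 1)) · Σ_i Σ_j max(0, r(i, j) - k), where j ranges over the k nearest neighbors of sample i in the embedding and r(i, j) is the rank of j among the neighbors of i in the original space. This is not the cuVS implementation. The names `sq_dist`, `neighbors_by_distance`, and `trustworthiness_reference` are hypothetical, and squared Euclidean distance is assumed because it produces the same neighbor ordering as Euclidean distance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Squared Euclidean distance between rows i and j of a row-major matrix.
double sq_dist(const std::vector<double>& X, std::size_t n_cols, std::size_t i, std::size_t j)
{
  double acc = 0.0;
  for (std::size_t c = 0; c < n_cols; ++c) {
    const double d = X[i * n_cols + c] - X[j * n_cols + c];
    acc += d * d;
  }
  return acc;
}

// Indices of all other samples, sorted by increasing distance from sample i.
std::vector<std::size_t> neighbors_by_distance(const std::vector<double>& X,
                                               std::size_t n_rows,
                                               std::size_t n_cols,
                                               std::size_t i)
{
  std::vector<std::size_t> order;
  for (std::size_t j = 0; j < n_rows; ++j) {
    if (j != i) { order.push_back(j); }
  }
  std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return sq_dist(X, n_cols, i, a) < sq_dist(X, n_cols, i, b);
  });
  return order;
}

// Hypothetical CPU reference for the trustworthiness of an embedding with k neighbors.
double trustworthiness_reference(const std::vector<double>& X,
                                 std::size_t n_cols,
                                 const std::vector<double>& X_embedded,
                                 std::size_t n_cols_embedded,
                                 std::size_t n_rows,
                                 std::size_t k)
{
  assert(k >= 1 && 2 * n_rows > 3 * k + 1);  // keeps the normalizer positive
  double penalty = 0.0;
  for (std::size_t i = 0; i < n_rows; ++i) {
    const auto orig = neighbors_by_distance(X, n_rows, n_cols, i);
    const auto emb  = neighbors_by_distance(X_embedded, n_rows, n_cols_embedded, i);
    // rank[j]: 1-based rank of sample j among the original-space neighbors of i.
    std::vector<std::size_t> rank(n_rows, 0);
    for (std::size_t r = 0; r < orig.size(); ++r) { rank[orig[r]] = r + 1; }
    // Penalize embedded neighbors that are not among the k nearest in the original space.
    for (std::size_t t = 0; t < k; ++t) {
      const std::size_t r = rank[emb[t]];
      if (r > k) { penalty += static_cast<double>(r - k); }
    }
  }
  const double n  = static_cast<double>(n_rows);
  const double kk = static_cast<double>(k);
  return 1.0 - 2.0 / (n * kk * (2.0 * n - 3.0 * kk - 1.0)) * penalty;
}
```

Under this definition, an embedding that preserves every neighbor ordering scores exactly 1, and the score decreases as unrelated points become close neighbors in the embedding.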