ML::Metrics::Batched Namespace Reference

Functions

float silhouette_score (const raft::handle_t &handle, float *X, int n_rows, int n_cols, int *y, int n_labels, float *scores, int chunk, cuvs::distance::DistanceType metric)
 
double silhouette_score (const raft::handle_t &handle, double *X, int n_rows, int n_cols, int *y, int n_labels, double *scores, int chunk, cuvs::distance::DistanceType metric)
 

Function Documentation

◆ silhouette_score() [1/2]

double ML::Metrics::Batched::silhouette_score ( const raft::handle_t &  handle,
double *  X,
int  n_rows,
int  n_cols,
int *  y,
int  n_labels,
double *  scores,
int  chunk,
cuvs::distance::DistanceType  metric 
)

◆ silhouette_score() [2/2]

float ML::Metrics::Batched::silhouette_score ( const raft::handle_t &  handle,
float *  X,
int  n_rows,
int  n_cols,
int *  y,
int  n_labels,
float *  scores,
int  chunk,
cuvs::distance::DistanceType  metric 
)

Calculates the batched silhouette score by tiling the pairwise distance matrix, which avoids allocating memory that is quadratic in the number of samples.

The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). Here b is the mean distance between a sample and the members of the nearest cluster that the sample does not belong to. The Silhouette Coefficient is only defined when the number of labels satisfies 2 <= n_labels <= n_samples - 1, where n_samples is the number of data samples (n_rows).
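The definition above can be checked against a small CPU reference. The following is a minimal sketch, not the cuML implementation: the function name `silhouette_reference` is hypothetical, the distance is fixed to Euclidean, labels are assumed to lie in [0, n_labels), returning the mean over all samples is an assumption of this sketch, and scoring a sample in a single-member cluster as 0 is a convention borrowed for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference (not part of cuML).
// X is row-major with shape (n_rows x n_cols); y[i] is in [0, n_labels).
// Returns the mean silhouette coefficient and, if `scores` is non-null,
// fills it with one coefficient per sample.
double silhouette_reference(const std::vector<double>& X, int n_rows, int n_cols,
                            const std::vector<int>& y, int n_labels,
                            std::vector<double>* scores = nullptr)
{
  // The coefficient is only defined for 2 <= n_labels <= n_rows - 1.
  assert(n_labels >= 2 && n_labels <= n_rows - 1);

  std::vector<int> count(n_labels, 0);
  for (int i = 0; i < n_rows; ++i) ++count[y[i]];

  if (scores) scores->assign(n_rows, 0.0);
  std::vector<double> sum(n_labels);  // distance sums from sample i to each cluster
  double total = 0.0;

  for (int i = 0; i < n_rows; ++i) {
    std::fill(sum.begin(), sum.end(), 0.0);
    for (int j = 0; j < n_rows; ++j) {
      double d2 = 0.0;
      for (int k = 0; k < n_cols; ++k) {
        const double diff = X[static_cast<std::size_t>(i) * n_cols + k] -
                            X[static_cast<std::size_t>(j) * n_cols + k];
        d2 += diff * diff;
      }
      sum[y[j]] += std::sqrt(d2);  // Euclidean distance (assumed metric)
    }

    double s = 0.0;  // assumed convention: single-member cluster scores 0
    const int own = y[i];
    if (count[own] > 1) {
      // a: mean distance to the other members of the sample's own cluster
      const double a = sum[own] / (count[own] - 1);
      // b: smallest mean distance to the members of any other cluster
      double b = std::numeric_limits<double>::infinity();
      for (int c = 0; c < n_labels; ++c)
        if (c != own && count[c] > 0) b = std::min(b, sum[c] / count[c]);
      const double denom = std::max(a, b);
      if (denom > 0.0 && std::isfinite(b)) s = (b - a) / denom;
    }
    if (scores) (*scores)[i] = s;
    total += s;
  }
  return total / n_rows;
}
```

For four one-dimensional samples 0, 1, 10, 11 with labels 0, 0, 1, 1, the first sample has a = 1 and b = (10 + 11) / 2 = 10.5, so its coefficient is 9.5 / 10.5.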

Parameters
[in]   handle    raft::handle_t
[in]   X         Array of data samples with dimensions (n_rows x n_cols)
[in]   n_rows    Number of data samples
[in]   n_cols    Number of features
[in]   y         Array containing the label of every data sample (1 x n_rows)
[in]   n_labels  Number of labels
[in]   metric    The numerical value that maps to the type of distance metric to be used in the calculations
[in]   chunk     The row-wise chunk size on which the pairwise distance matrix is tiled
[out]  scores    Optional array (1 x n_rows) that is populated with the silhouette score of every sample; pass nullptr if per-sample scores are not required
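To illustrate how a chunk size can bound memory, the sketch below computes the same quantity on the CPU while only ever holding one tile of the pairwise distance matrix. It is an illustration under stated assumptions, not the library's code: the name `silhouette_tiled` is hypothetical, tiles are assumed to be square (chunk x chunk), the distance is fixed to Euclidean, labels are assumed to lie in [0, n_labels), and the return value is assumed to be the mean over samples. Working memory is one tile plus an (n_rows x n_labels) table of distance sums, rather than the full (n_rows x n_rows) matrix.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical tiled CPU sketch (not part of cuML).
// X is row-major with shape (n_rows x n_cols); y[i] is in [0, n_labels).
double silhouette_tiled(const std::vector<double>& X, int n_rows, int n_cols,
                        const std::vector<int>& y, int n_labels, int chunk,
                        std::vector<double>* scores = nullptr)
{
  assert(chunk > 0 && n_labels >= 2 && n_labels <= n_rows - 1);

  std::vector<int> count(n_labels, 0);
  for (int i = 0; i < n_rows; ++i) ++count[y[i]];

  // sums[i * n_labels + c]: total distance from sample i to members of cluster c
  std::vector<double> sums(static_cast<std::size_t>(n_rows) * n_labels, 0.0);
  // Only one chunk x chunk tile of the distance matrix exists at a time.
  std::vector<double> tile(static_cast<std::size_t>(chunk) * chunk);

  for (int i0 = 0; i0 < n_rows; i0 += chunk) {
    const int ni = std::min(chunk, n_rows - i0);
    for (int j0 = 0; j0 < n_rows; j0 += chunk) {
      const int nj = std::min(chunk, n_rows - j0);

      // Step 1: fill the tile with distances between rows [i0, i0+ni) and [j0, j0+nj).
      for (int i = 0; i < ni; ++i) {
        for (int j = 0; j < nj; ++j) {
          double d2 = 0.0;
          for (int k = 0; k < n_cols; ++k) {
            const double diff = X[static_cast<std::size_t>(i0 + i) * n_cols + k] -
                                X[static_cast<std::size_t>(j0 + j) * n_cols + k];
            d2 += diff * diff;
          }
          tile[static_cast<std::size_t>(i) * chunk + j] = std::sqrt(d2);
        }
      }

      // Step 2: reduce the tile into per-sample, per-cluster distance sums.
      for (int i = 0; i < ni; ++i)
        for (int j = 0; j < nj; ++j)
          sums[static_cast<std::size_t>(i0 + i) * n_labels + y[j0 + j]] +=
            tile[static_cast<std::size_t>(i) * chunk + j];
    }
  }

  // Step 3: turn the accumulated sums into (b - a) / max(a, b) per sample.
  if (scores) scores->assign(n_rows, 0.0);
  double total = 0.0;
  for (int i = 0; i < n_rows; ++i) {
    const double* row = &sums[static_cast<std::size_t>(i) * n_labels];
    const int own = y[i];
    double s = 0.0;  // assumed convention: single-member cluster scores 0
    if (count[own] > 1) {
      const double a = row[own] / (count[own] - 1);
      double b = std::numeric_limits<double>::infinity();
      for (int c = 0; c < n_labels; ++c)
        if (c != own && count[c] > 0) b = std::min(b, row[c] / count[c]);
      const double denom = std::max(a, b);
      if (denom > 0.0 && std::isfinite(b)) s = (b - a) / denom;
    }
    if (scores) (*scores)[i] = s;
    total += s;
  }
  return total / n_rows;
}
```

In this sketch the result does not depend on the chunk size, including sizes that do not divide n_rows evenly; a smaller chunk only trades more tile iterations for a smaller temporary buffer.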