Calculates the batched "Silhouette Score" by tiling the pairwise distance matrix, avoiding quadratic memory use.
The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). Here, b is the mean distance between a sample and the samples of the nearest cluster that the sample is not part of. Note that the Silhouette Coefficient is only defined if the number of labels satisfies 2 <= n_labels <= n_samples - 1.
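As an illustration of the per-sample formula only, the following is a minimal CPU sketch, not the library's implementation. It assumes Euclidean distance, row-major `double` data, and labels in `[0, n_labels)`; the names `euclidean` and `silhouette_sample` are hypothetical. The intra-cluster mean (a) excludes the sample itself, and a sample in a singleton cluster is given a score of 0, a convention borrowed from scikit-learn and assumed here rather than taken from this API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Euclidean distance between rows i and j of a row-major (n_rows x n_cols) matrix.
double euclidean(const std::vector<double>& X, std::size_t n_cols,
                 std::size_t i, std::size_t j) {
  double acc = 0.0;
  for (std::size_t k = 0; k < n_cols; ++k) {
    const double d = X[i * n_cols + k] - X[j * n_cols + k];
    acc += d * d;
  }
  return std::sqrt(acc);
}

// Hypothetical helper: silhouette coefficient (b - a) / max(a, b) of sample i.
// Assumes labels lie in [0, n_labels) and 2 <= n_labels <= n_rows - 1.
double silhouette_sample(const std::vector<double>& X, std::size_t n_rows,
                         std::size_t n_cols, const std::vector<int>& y,
                         int n_labels, std::size_t i) {
  assert(X.size() == n_rows * n_cols && y.size() == n_rows);
  assert(n_labels >= 2 && static_cast<std::size_t>(n_labels) + 1 <= n_rows);
  std::vector<double> sum(n_labels, 0.0);       // summed distance from i to each cluster
  std::vector<std::size_t> count(n_labels, 0);  // cluster sizes
  for (std::size_t j = 0; j < n_rows; ++j) {
    count[y[j]] += 1;
    if (j != i) sum[y[j]] += euclidean(X, n_cols, i, j);
  }
  const int own = y[i];
  if (count[own] == 1) return 0.0;  // singleton cluster: 0 (assumed convention)
  // a: mean distance to the other members of i's own cluster.
  const double a = sum[own] / static_cast<double>(count[own] - 1);
  // b: smallest mean distance to any other non-empty cluster.
  double b = std::numeric_limits<double>::infinity();
  for (int c = 0; c < n_labels; ++c) {
    if (c != own && count[c] > 0) b = std::min(b, sum[c] / static_cast<double>(count[c]));
  }
  return (b - a) / std::max(a, b);
}
```

For the 1-D points {0, 1, 10, 11} labelled {0, 0, 1, 1}, sample 0 has a = 1 and b = (10 + 11) / 2 = 10.5, so its coefficient is 9.5 / 10.5.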
Parameters

Direction | Name | Description |
--- | --- | --- |
[in] | handle | raft::handle_t |
[in] | X | Array of data samples with dimensions (n_rows x n_cols) |
[in] | n_rows | number of data samples |
[in] | n_cols | number of features |
[in] | y | Array containing labels for every data sample (1 x n_rows) |
[in] | n_labels | number of labels |
[in] | metric | numerical value that maps to the distance metric used to compute the pairwise distances |
[in] | chunk | row-wise chunk size: the number of rows of the pairwise distance matrix computed per tile |
[out] | scores | optional array (1 x n_rows) that is populated with the silhouette score of every sample; pass nullptr if per-sample scores are not needed |
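To show how the `chunk` parameter bounds memory, the following is a hedged CPU sketch of the tiling idea, not the RAFT/cuML GPU kernel. Only one (chunk x n_rows) tile of distances exists at a time, so memory grows as O(chunk * n_rows) rather than O(n_rows^2). Each tile row is reduced to per-cluster distance sums, from which a and b follow. The function name `silhouette_score_tiled` is hypothetical, only Euclidean distance is supported, and returning the mean score (as scikit-learn's `silhouette_score` does) is an assumption. The nullable `scores` pointer mirrors the `scores` parameter above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU sketch of tiled silhouette scoring (Euclidean only).
// Returns the mean silhouette score; if `scores` is non-null, it must have room
// for n_rows entries and receives the per-sample scores.
double silhouette_score_tiled(const std::vector<double>& X, std::size_t n_rows,
                              std::size_t n_cols, const std::vector<int>& y,
                              int n_labels, std::size_t chunk, double* scores) {
  assert(X.size() == n_rows * n_cols && y.size() == n_rows && chunk > 0);
  assert(n_labels >= 2 && static_cast<std::size_t>(n_labels) + 1 <= n_rows);
  std::vector<std::size_t> count(n_labels, 0);  // cluster sizes
  for (int label : y) count[label] += 1;

  std::vector<double> tile(chunk * n_rows);  // the only distance buffer: chunk x n_rows
  double total = 0.0;
  for (std::size_t start = 0; start < n_rows; start += chunk) {
    const std::size_t rows = std::min(chunk, n_rows - start);  // last tile may be short
    // 1. Fill the tile with distances from rows [start, start + rows) to every row.
    for (std::size_t r = 0; r < rows; ++r) {
      for (std::size_t j = 0; j < n_rows; ++j) {
        double acc = 0.0;
        for (std::size_t k = 0; k < n_cols; ++k) {
          const double d = X[(start + r) * n_cols + k] - X[j * n_cols + k];
          acc += d * d;
        }
        tile[r * n_rows + j] = std::sqrt(acc);
      }
    }
    // 2. Reduce each tile row to per-cluster sums, then score that sample.
    for (std::size_t r = 0; r < rows; ++r) {
      const std::size_t i = start + r;
      std::vector<double> sum(n_labels, 0.0);
      // dist(i, i) == 0, so including j == i leaves sum[y[i]] unchanged.
      for (std::size_t j = 0; j < n_rows; ++j) sum[y[j]] += tile[r * n_rows + j];
      const int own = y[i];
      double s = 0.0;  // singleton clusters score 0 (assumed, as in scikit-learn)
      if (count[own] > 1) {
        const double a = sum[own] / static_cast<double>(count[own] - 1);
        double b = std::numeric_limits<double>::infinity();
        for (int c = 0; c < n_labels; ++c)
          if (c != own && count[c] > 0) b = std::min(b, sum[c] / static_cast<double>(count[c]));
        s = (b - a) / std::max(a, b);
      }
      if (scores != nullptr) scores[i] = s;
      total += s;
    }
  }
  return total / static_cast<double>(n_rows);
}
```

The result does not depend on `chunk`. A smaller chunk lowers peak memory at the cost of more, smaller tiles, which is the trade-off the `chunk` parameter exposes.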