silhouette_score

cuml.metrics.cluster.silhouette_score(X, labels, metric='euclidean', chunksize=None, convert_dtype=True)

Calculate the mean silhouette coefficient for the provided data.

Given a set of cluster labels for every sample in the provided data, compute the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The silhouette coefficient for a sample is then (b - a) / max(a, b).
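The definition above can be checked on small inputs with a plain NumPy reference. The following is a minimal sketch, not cuML's implementation: the function name `silhouette_reference` is hypothetical, it supports only the Euclidean metric, it builds the full pairwise distance matrix, and it assumes at least two distinct clusters. Samples in singleton clusters are given a coefficient of 0, which follows the scikit-learn convention and is an assumption about cuML's behavior.

```python
import numpy as np

def silhouette_reference(X, labels):
    """Mean silhouette coefficient from a dense Euclidean distance matrix."""
    X = np.asarray(X, dtype=np.float64)
    labels = np.asarray(labels)
    # Full n x n distance matrix; acceptable for small inputs only.
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            # Singleton cluster: coefficient left at 0 (assumed convention).
            continue
        # a: mean distance to the other members of the same cluster.
        # dist[i, i] is 0, so dividing by (n_own - 1) excludes the sample itself.
        a = dist[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()

# Two well-separated pairs of points, so the score is close to 1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
score = silhouette_reference(X, labels)
```

For this input every sample has a = 1 and b = (10 + sqrt(101)) / 2, so each coefficient, and therefore the mean, equals 1 - 2 / (10 + sqrt(101)).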

Parameters:
X : array-like, shape = (n_samples, n_features)

The feature vectors for all samples.

labels : array-like, shape = (n_samples,)

The assigned cluster labels for each sample.

metric : string (default = 'euclidean')

A string representation of the distance metric to use for evaluating the silhouette score. Available options are “cityblock”, “cosine”, “euclidean”, “l1”, “l2”, “manhattan”, and “sqeuclidean”.

chunksize : integer (default = None)

An integer with 1 <= chunksize <= n_samples that sets the tile size used when computing pairwise distances. Tiling avoids the quadratic memory cost of holding the entire pairwise distance matrix in GPU memory. If None, chunksize is set to 40000, a value that experiments have shown to be safe on a GPU with 16 GB of VRAM.
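The tiling idea behind chunksize can be illustrated on the CPU with NumPy. This is a sketch of the general technique under stated assumptions, not cuML's GPU implementation: the function name `silhouette_chunked` is hypothetical, only the Euclidean metric is handled, at least two clusters are assumed, and singleton clusters are scored as 0. Only one tile of at most chunksize x chunksize distances exists at a time; each tile is reduced to per-cluster distance sums before the next one is computed.

```python
import numpy as np

def silhouette_chunked(X, labels, chunksize):
    """Mean silhouette coefficient computed tile by tile (Euclidean only)."""
    X = np.asarray(X, dtype=np.float64)
    # Map arbitrary labels to integer codes 0..k-1.
    _, codes = np.unique(np.asarray(labels), return_inverse=True)
    n = len(X)
    k = codes.max() + 1
    counts = np.bincount(codes, minlength=k)
    one_hot = np.eye(k)[codes]          # (n, k) cluster membership matrix
    sums = np.zeros((n, k))             # summed distance from each sample to each cluster
    for r0 in range(0, n, chunksize):
        r1 = min(r0 + chunksize, n)
        for c0 in range(0, n, chunksize):
            c1 = min(c0 + chunksize, n)
            # One tile of the pairwise distance matrix.
            diff = X[r0:r1, None, :] - X[None, c0:c1, :]
            tile = np.sqrt((diff ** 2).sum(axis=-1))
            # Reduce the tile to per-cluster sums, then discard it.
            sums[r0:r1] += tile @ one_hot[c0:c1]
    rows = np.arange(n)
    own_count = counts[codes]
    # a: mean distance to the other members of the sample's own cluster.
    a = sums[rows, codes] / np.maximum(own_count - 1, 1)
    # b: smallest mean distance to any other cluster.
    means = sums / counts
    means[rows, codes] = np.inf
    b = means.min(axis=1)
    denom = np.maximum(a, b)
    safe = np.where(denom > 0, denom, 1.0)
    scores = np.where((own_count > 1) & (denom > 0), (b - a) / safe, 0.0)
    return scores.mean()

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
score_small_tiles = silhouette_chunked(X, labels, chunksize=1)
score_uneven_tiles = silhouette_chunked(X, labels, chunksize=3)
score_single_tile = silhouette_chunked(X, labels, chunksize=4)
```

The result should not depend on the tile size beyond floating-point rounding; a smaller chunksize trades more loop iterations for a lower peak memory footprint.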