SpectralClustering

class cuml.cluster.SpectralClustering(n_clusters=8, *, n_components=None, random_state=None, n_neighbors=10, n_init=10, eigen_tol='auto', affinity='nearest_neighbors', verbose=False, output_type=None)

Apply spectral clustering from the normalized Laplacian.

In practice spectral clustering is very useful when the structure of the individual clusters is highly non-convex, or when a measure of the center and spread of the cluster is not a suitable description of the complete cluster, such as when clusters are nested circles on the 2D plane.

If the affinity matrix is the adjacency matrix of a graph, this method can be used to find normalized graph cuts.
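As a rough illustration of what clustering "from the normalized Laplacian" involves, the sketch below computes a spectral embedding on the CPU with NumPy. It assumes the textbook symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2} and a dense eigendecomposition. The helper name spectral_embedding is hypothetical, and cuML's GPU implementation may differ in details such as normalization, scaling, and the solver used.

```python
import numpy as np


def spectral_embedding(affinity, n_components):
    """Return the smallest eigenvalues/eigenvectors of the normalized Laplacian.

    Hypothetical helper for illustration only; not part of cuML.
    """
    A = np.asarray(affinity, dtype=np.float64)
    deg = A.sum(axis=1)

    # D^{-1/2}, leaving isolated nodes (degree 0) at zero.
    inv_sqrt = np.zeros_like(deg)
    mask = deg > 0
    inv_sqrt[mask] = 1.0 / np.sqrt(deg[mask])

    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}.
    laplacian = np.eye(A.shape[0]) - inv_sqrt[:, None] * A * inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return eigvals[:n_components], eigvecs[:, :n_components]
```

For a graph with several disconnected components, the eigenvalue 0 has one eigenvector per component, which is why the rows of the embedding tend to group by cluster before the final labeling step.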

By default, fit constructs the affinity matrix from a k-nearest neighbors connectivity matrix.

Alternatively, a user-provided affinity matrix can be specified by setting affinity='precomputed'.
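A minimal sketch of the precomputed route, assuming a dense, symmetric k-nearest-neighbors affinity built with NumPy. The helpers knn_affinity and cluster_precomputed are hypothetical names, and the 0.5 * (A + A.T) symmetrization is one common choice rather than a statement of how cuML builds its own graph. The cuML call is kept inside a function because it needs a GPU and an installed cuml.

```python
import numpy as np


def knn_affinity(X, n_neighbors):
    """Build a dense symmetric k-NN affinity matrix (hypothetical helper)."""
    X = np.asarray(X, dtype=np.float64)
    n = X.shape[0]

    # Squared Euclidean distances between all pairs of rows.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor

    # Indices of the n_neighbors closest points for every row.
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    A = np.zeros((n, n), dtype=np.float32)
    A[np.repeat(np.arange(n), n_neighbors), nbrs.ravel()] = 1.0

    # Symmetrize so that larger values mean greater similarity in both directions.
    return 0.5 * (A + A.T)


def cluster_precomputed(affinity, n_clusters):
    """Run cuML spectral clustering on a precomputed affinity (requires a GPU)."""
    from cuml.cluster import SpectralClustering

    model = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", output_type="numpy"
    )
    return model.fit_predict(affinity)
```

With cuML available, `cluster_precomputed(knn_affinity(X, 10), n_clusters=3)` would return one label per row of X.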

Parameters:
n_clusters : int, default=8

The number of clusters to form.

n_components : int or None, default=None

Number of eigenvectors to use for the spectral embedding. If None, defaults to n_clusters.

random_state : int, RandomState instance or None, default=None

Pseudo-random number generator used to initialize the k-means clustering and the eigendecomposition. Pass an int to make the results deterministic across calls.

n_neighbors : int, default=10

Number of neighbors to use when constructing the affinity matrix using the nearest neighbors method. Ignored for affinity='precomputed'.

n_init : int, default=10

Number of times the k-means algorithm is run with different centroid seeds. The final result is the best output of the n_init consecutive runs in terms of inertia. Only used for the k-means step.

eigen_tol : float or ‘auto’, default=‘auto’

Tolerance for the eigensolver. If ‘auto’, a tolerance value of 0.0 is used.

affinity : {‘nearest_neighbors’, ‘precomputed’}, default=‘nearest_neighbors’

How to construct the affinity matrix.
  • ‘nearest_neighbors’ : construct the affinity matrix by computing a graph of nearest neighbors from the input data.

  • ‘precomputed’ : interpret X as a precomputed affinity matrix, where larger values indicate greater similarity between instances.

verbose : int or boolean, default=False

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

output_type : {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None

Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.

Attributes:
labels_ : cupy.ndarray or np.ndarray of shape (n_samples,)

Cluster labels for each sample.

Methods

fit(self, X[, y])

Perform spectral clustering on X.

fit_predict(self, X[, y])

Perform spectral clustering on X and return cluster labels.

Notes

The eigensolver uses the Lanczos method as implemented in RAFT (pylibraft.sparse.linalg.eigsh): https://docs.rapids.ai/api/raft/stable/pylibraft_api/sparse/#pylibraft.sparse.linalg.eigsh.

K-means is used to assign labels.
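To make the role of n_init in the label assignment step concrete, here is a small CPU sketch of k-means with restarts that keeps the run with the lowest inertia. It is plain Lloyd's algorithm in NumPy with random-sample seeding; the function name kmeans_best_of is hypothetical, and cuML's GPU k-means uses its own implementation and initialization, so treat this only as an illustration of the idea.

```python
import numpy as np


def kmeans_best_of(points, n_clusters, n_init=10, n_iter=50, seed=0):
    """Run Lloyd's k-means n_init times and keep the lowest-inertia result.

    Hypothetical helper for illustration only; not cuML's implementation.
    """
    points = np.asarray(points, dtype=np.float64)
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf

    for _ in range(n_init):
        # Seed the centroids with distinct random samples.
        centers = points[rng.choice(len(points), n_clusters, replace=False)]

        for _ in range(n_iter):
            d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Recompute each centroid; keep the old one if a cluster is empty.
            new_centers = np.array([
                points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                for k in range(n_clusters)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers

        # Inertia: sum of squared distances to the closest centroid.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        inertia = d2.min(axis=1).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia

    return best_labels, best_inertia
```

In spectral clustering, the points passed to this step would be the rows of the spectral embedding rather than the original features.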

Examples

>>> import cupy as cp
>>> from sklearn.datasets import make_blobs
>>> from cuml.cluster import SpectralClustering
>>> X, y = make_blobs(n_samples=100, centers=3, n_features=10,
...                   cluster_std=0.5, random_state=42)
>>> X = cp.asarray(X, dtype=cp.float32)
>>> sc = SpectralClustering(n_clusters=3, affinity='nearest_neighbors',
...                         n_neighbors=10, random_state=42)
>>> sc.fit(X)
SpectralClustering()
>>> sc.labels_[:10]
array([2, 0, 1, 1, 2, 2, 1, 0, 2, 0])
fit(self, X, y=None) → 'SpectralClustering'

Perform spectral clustering on X.

Parameters:
X : array-like or sparse matrix of shape (n_samples, n_features) or (n_samples, n_samples)

Training vector, where n_samples is the number of samples and n_features is the number of features. If affinity is ‘precomputed’, X is the affinity matrix. Supported formats for precomputed affinity: scipy sparse (CSR, CSC, COO), cupy sparse (CSR, CSC, COO), dense numpy arrays, or dense cupy arrays.

y : Ignored

Not used, present here for API consistency by convention.

Returns:
self : object

Returns the instance itself.

fit_predict(self, X, y=None) → CumlArray

Perform spectral clustering on X and return cluster labels.

Parameters:
X : array-like or sparse matrix of shape (n_samples, n_features) or (n_samples, n_samples)

Training vector, where n_samples is the number of samples and n_features is the number of features. If affinity is ‘precomputed’, X is the affinity matrix. Supported formats for precomputed affinity: scipy sparse (CSR, CSC, COO), cupy sparse (CSR, CSC, COO), dense numpy arrays, or dense cupy arrays.

y : Ignored

Not used, present here for API consistency by convention.

Returns:
labels : cupy.ndarray of shape (n_samples,)

Cluster labels.
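A possible usage sketch for fit_predict on the nested-circles case mentioned in the class description. The data generator make_nested_circles and the wrapper cluster_circles are hypothetical helpers written for this example; the radii, noise level, and n_neighbors value are arbitrary assumptions. The cuML call itself uses only the parameters documented above and requires a GPU.

```python
import numpy as np


def make_nested_circles(n_per_circle=200, radii=(1.0, 3.0), noise=0.05, seed=0):
    """Generate noisy concentric circles in 2D (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    parts, truth = [], []
    for idx, radius in enumerate(radii):
        theta = rng.uniform(0.0, 2.0 * np.pi, n_per_circle)
        ring = np.column_stack((radius * np.cos(theta), radius * np.sin(theta)))
        parts.append(ring + rng.normal(scale=noise, size=ring.shape))
        truth.append(np.full(n_per_circle, idx))
    return np.vstack(parts).astype(np.float32), np.concatenate(truth)


def cluster_circles(X, n_neighbors=10, seed=0):
    """Label the points with cuML spectral clustering (requires a GPU)."""
    from cuml.cluster import SpectralClustering

    model = SpectralClustering(
        n_clusters=2,
        affinity="nearest_neighbors",
        n_neighbors=n_neighbors,
        random_state=seed,
        output_type="numpy",
    )
    return model.fit_predict(X)
```

Calling `cluster_circles(X)` on the output of `make_nested_circles()` returns an array of shape (n_samples,) that can be compared against the generated ring indices, up to a permutation of the label values.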