SpectralClustering
- class cuml.cluster.SpectralClustering(n_clusters=8, *, n_components=None, random_state=None, n_neighbors=10, n_init=10, eigen_tol='auto', affinity='nearest_neighbors', verbose=False, output_type=None)
Apply spectral clustering from the normalized Laplacian.
In practice, spectral clustering is very useful when the individual clusters are highly non-convex, or when a measure of a cluster's center and spread is not a suitable description of the whole cluster, for example when clusters are nested circles on the 2D plane.
If the affinity matrix is the adjacency matrix of a graph, this method can be used to find normalized graph cuts.
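The role of the normalized Laplacian can be illustrated on a tiny dense affinity matrix. The following is a minimal NumPy sketch of the standard symmetric normalized Laplacian, L_sym = I - D^{-1/2} A D^{-1/2}, not cuML's GPU code path. The helper name spectral_embedding is hypothetical, and the sketch assumes a symmetric affinity with no isolated samples (every degree positive).

```python
import numpy as np


def spectral_embedding(affinity, n_components):
    """Return the n_components smallest eigenpairs of the normalized Laplacian."""
    A = np.asarray(affinity, dtype=float)
    degree = A.sum(axis=1)
    # Assumes every sample has positive degree (no isolated nodes)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # L_sym = I - D^{-1/2} A D^{-1/2}, built elementwise
    L_sym = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh is for symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(L_sym)
    return eigvals[:n_components], eigvecs[:, :n_components]


# Toy affinity: two disconnected 3-node cliques (two connected components)
clique = np.ones((3, 3)) - np.eye(3)
zeros = np.zeros((3, 3))
A = np.block([[clique, zeros], [zeros, clique]])
vals, emb = spectral_embedding(A, n_components=2)
# Both returned eigenvalues are numerically zero, and rows of `emb` coincide
# within each clique, so k-means on the rows separates the two cliques.
```

For a graph with k connected components, the eigenvalue 0 of L_sym has multiplicity k. This is why clustering the rows of the leading eigenvectors recovers well-separated groups.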
When calling fit, an affinity matrix is constructed from a k-nearest neighbors connectivity matrix. Alternatively, a user-provided affinity matrix can be supplied by setting affinity='precomputed'.
- Parameters:
- n_clusters : int, default=8
The number of clusters to form.
- n_components : int or None, default=None
Number of eigenvectors to use for the spectral embedding. If None, defaults to n_clusters.
- random_state : int, RandomState instance or None, default=None
Pseudo-random number generator used to initialize the k-means clustering and the eigendecomposition. Pass an int to make the results deterministic across calls.
- n_neighbors : int, default=10
Number of neighbors used to construct the affinity matrix with the nearest neighbors method. Ignored when affinity='precomputed'.
- n_init : int, default=10
Number of times the k-means algorithm is run with different centroid seeds. The final result is the best output of n_init consecutive runs in terms of inertia. Only used for the k-means step.
- eigen_tol : float or ‘auto’, default=’auto’
Tolerance for the eigensolver. If ‘auto’, a tolerance value of 0.0 is used.
- affinity : {‘nearest_neighbors’, ‘precomputed’}, default=’nearest_neighbors’
How to construct the affinity matrix:
‘nearest_neighbors’: construct the affinity matrix by computing a graph of nearest neighbors from the input data.
‘precomputed’: interpret X as a precomputed affinity matrix, where larger values indicate greater similarity between instances.
- verbose : int or boolean, default=False
Sets the logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
- output_type : {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) is used. See Output Data Type Configuration for more info.
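With affinity='precomputed', the caller builds the affinity matrix. The following is a minimal NumPy sketch, assuming a symmetrized k-nearest-neighbors connectivity graph similar in spirit to the ‘nearest_neighbors’ mode and its n_neighbors parameter. The helper knn_connectivity_affinity is hypothetical, and cuML's own construction (self-loops, symmetrization, weighting) may differ. It computes all pairwise distances, so it only suits small inputs.

```python
import numpy as np


def knn_connectivity_affinity(X, n_neighbors=10):
    """Hypothetical helper: symmetric kNN connectivity matrix (requires n_neighbors < n_samples)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances, shape (n, n)
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Exclude each point from its own neighbor list
    np.fill_diagonal(sq_dists, np.inf)
    # Column indices of the n_neighbors closest points for each row
    neighbors = np.argsort(sq_dists, axis=1)[:, :n_neighbors]
    conn = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    conn[rows, neighbors.ravel()] = 1.0
    # Symmetrize the directed kNN graph by averaging it with its transpose
    return 0.5 * (conn + conn.T)


# Two well-separated groups of three points each
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
A = knn_connectivity_affinity(X, n_neighbors=2)
```

The result is a dense NumPy array, which is one of the documented precomputed formats, so it could be passed as SpectralClustering(n_clusters=2, affinity='precomputed').fit_predict(A).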
- Attributes:
- labels_ : cupy.ndarray or np.ndarray of shape (n_samples,)
Cluster labels for each sample.
Methods
- fit(self, X[, y]): Perform spectral clustering on X.
- fit_predict(self, X[, y]): Perform spectral clustering on X and return cluster labels.
Notes
The eigensolver uses the Lanczos method from the RAFT implementation: https://docs.rapids.ai/api/raft/stable/pylibraft_api/sparse/#pylibraft.sparse.linalg.eigsh.
k-means is then used to assign the cluster labels.
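The n_init rule for the k-means step (keep the best of n_init runs by inertia) can be sketched with plain Lloyd iterations. This is a NumPy illustration with hypothetical helper names and random-sample seeding, not cuML's GPU k-means, which may initialize centroids differently.

```python
import numpy as np


def _assign(X, centers):
    # Squared Euclidean distance from every sample to every center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans_best_of(X, n_clusters, n_init=10, max_iter=100, seed=0):
    """Run Lloyd's k-means n_init times and keep the lowest-inertia run."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # Seed this run with n_clusters distinct samples chosen at random
        centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
        for _ in range(max_iter):
            labels = _assign(X, centers)
            # Move each center to its members' mean; leave empty clusters in place
            new_centers = np.array([
                X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                for k in range(n_clusters)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels = _assign(X, centers)
        # Inertia: sum of squared distances from samples to their assigned center
        inertia = float(((X - centers[labels]) ** 2).sum())
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels, best_inertia


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, inertia = kmeans_best_of(X, n_clusters=2, n_init=10)
```

In spectral clustering, X here would be the rows of the spectral embedding rather than the raw input features.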
Examples
>>> import cupy as cp
>>> from sklearn.datasets import make_blobs
>>> from cuml.cluster import SpectralClustering
>>> X, y = make_blobs(n_samples=100, centers=3, n_features=10,
...                   cluster_std=0.5, random_state=42)
>>> X = cp.asarray(X, dtype=cp.float32)
>>> sc = SpectralClustering(n_clusters=3, affinity='nearest_neighbors',
...                         n_neighbors=10, random_state=42)
>>> sc.fit(X)
SpectralClustering()
>>> sc.labels_[:10]
array([2, 0, 1, 1, 2, 2, 1, 0, 2, 0])
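Cluster labels are arbitrary integer IDs, so labels_ in the example above need not equal the make_blobs ground truth y numerically even when the clusterings agree. A minimal pure-Python check for agreement up to relabeling could look like this; same_partition is a hypothetical helper, not part of cuML.

```python
def same_partition(labels_a, labels_b):
    """Return True if two labelings describe the same clustering up to renaming."""
    if len(labels_a) != len(labels_b):
        return False
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # The label correspondence must be one-to-one in both directions
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


# The example's first ten labels with 2->0, 0->1, 1->2 renamed: same clusters
print(same_partition([2, 0, 1, 1, 2, 2, 1, 0, 2, 0],
                     [0, 1, 2, 2, 0, 0, 2, 1, 0, 1]))  # True
```

To compare against the example's ground truth, the GPU labels would first be copied to the host, e.g. same_partition(cp.asnumpy(sc.labels_).tolist(), y.tolist()). For a graded score instead of exact agreement, sklearn.metrics.adjusted_rand_score is the usual choice.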
- fit(self, X, y=None) -> 'SpectralClustering'
Perform spectral clustering on X.
- Parameters:
- X : array-like or sparse matrix of shape (n_samples, n_features) or (n_samples, n_samples)
Training vectors, where n_samples is the number of samples and n_features is the number of features. If affinity is ‘precomputed’, X is the affinity matrix. Supported formats for a precomputed affinity: scipy sparse (CSR, CSC, COO), cupy sparse (CSR, CSC, COO), dense numpy arrays, or dense cupy arrays.
- y : Ignored
Not used, present here for API consistency by convention.
- Returns:
- self : object
Returns the instance itself.
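Before passing a matrix with affinity='precomputed', a quick sanity check on its shape can catch mistakes early. The following hypothetical helper checks that the input is square, as documented above. The non-negativity and symmetry checks are assumptions drawn from the usual spectral clustering setting, not requirements stated by cuML. It handles NumPy arrays and SciPy sparse matrices only (CuPy inputs would need cupy.asnumpy first) and densifies sparse input, so it suits small matrices.

```python
import numpy as np


def check_precomputed_affinity(A, atol=1e-8):
    """Hypothetical pre-flight check; returns n_samples if all checks pass."""
    # SciPy sparse matrices (CSR, CSC, COO) expose .toarray(); densifying
    # is only sensible for small inputs.
    dense = A.toarray() if hasattr(A, "toarray") else np.asarray(A, dtype=float)
    if dense.ndim != 2 or dense.shape[0] != dense.shape[1]:
        raise ValueError(f"expected shape (n_samples, n_samples), got {dense.shape}")
    # Assumption: similarities are non-negative weights
    if (dense < 0).any():
        raise ValueError("affinities must be non-negative")
    # Assumption: the affinity graph is undirected
    if not np.allclose(dense, dense.T, atol=atol):
        raise ValueError("affinity matrix is not symmetric")
    return dense.shape[0]


n_samples = check_precomputed_affinity([[0.0, 1.0], [1.0, 0.0]])
```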
- fit_predict(self, X, y=None) -> CumlArray
Perform spectral clustering on X and return cluster labels.
- Parameters:
- X : array-like or sparse matrix of shape (n_samples, n_features) or (n_samples, n_samples)
Training vectors, where n_samples is the number of samples and n_features is the number of features. If affinity is ‘precomputed’, X is the affinity matrix. Supported formats for a precomputed affinity: scipy sparse (CSR, CSC, COO), cupy sparse (CSR, CSC, COO), dense numpy arrays, or dense cupy arrays.
- y : Ignored
Not used, present here for API consistency by convention.
- Returns:
- labels : cupy.ndarray of shape (n_samples,)
Cluster labels.