TruncatedSVD
- class cuml.dask.decomposition.TruncatedSVD(*, client=None, **kwargs)
- Parameters:
- n_components : int (default = 1)
The number of top K singular vectors / values to compute. Must be <= the number of columns in X.
- svd_solver : ‘full’, ‘jacobi’
Only the full algorithm is supported, since it is significantly faster on GPU than the other solvers, including randomized SVD.
- verbose : int or boolean, default=False
Sets the logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
- Attributes:
- components_ : array
The top K components (VT.T[:,:n_components]) in U, S, VT = svd(X)
- explained_variance_ : array
How much of the variance in the data each component explains, given by S**2
- explained_variance_ratio_ : array
The fraction of the variance explained by each component, given by S**2/sum(S**2)
- singular_values_ : array
The top K singular values. All singular values are >= 0.
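The relationships between these attributes can be sketched on the CPU with NumPy. This is only an illustration of the formulas quoted above, not cuML's implementation: `np.linalg.svd` stands in for the GPU solver, the data is made up, and the variable names are illustrative. Summing over all singular values in the ratio's denominator is an assumption, since the text does not say whether the sum runs over all values or only the top K.

```python
import numpy as np

# Small illustrative matrix (hypothetical data, not the example from these docs).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
n_components = 2

# Thin SVD: X = U @ diag(S) @ VT, with S returned in descending order.
U, S, VT = np.linalg.svd(X, full_matrices=False)

components = VT.T[:, :n_components]        # shape (n_features, n_components)
singular_values = S[:n_components]         # top K singular values, all >= 0
explained_variance = singular_values ** 2  # S**2, as described above
# Assumption: the denominator sums over all singular values, not only the top K.
explained_variance_ratio = explained_variance / np.sum(S ** 2)
```

Under that assumption the ratios sum to at most 1, and reach 1 only when n_components equals the number of columns.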
Methods
fit(X[, _transform]) Fit the model with X.
fit_transform(X) Fit the model with X and apply the dimensionality reduction on X.
inverse_transform(X[, delayed]) Transform data back to its original space.
transform(X[, delayed]) Apply dimensionality reduction to X.
Examples
>>> from dask_cuda import LocalCUDACluster
>>> from dask.distributed import Client, wait
>>> import cupy as cp
>>> from cuml.dask.decomposition import TruncatedSVD
>>> from cuml.dask.datasets import make_blobs

>>> cluster = LocalCUDACluster(threads_per_worker=1)
>>> client = Client(cluster)
>>> nrows = 6
>>> ncols = 3
>>> n_parts = 2

>>> X_cudf, _ = make_blobs(n_samples=nrows, n_features=ncols,
...                        centers=1, n_parts=n_parts,
...                        cluster_std=1.8, random_state=10,
...                        dtype=cp.float32)
>>> in_blobs = X_cudf.compute()
>>> print(in_blobs)
[[ 6.953966   6.2313757  0.84974563]
 [10.012338   3.4641726  3.0827546 ]
 [ 9.537406   4.0504313  3.2793145 ]
 [ 8.32713    2.957846   1.8215517 ]
 [ 5.7044296  1.855514   3.7996366 ]
 [10.089077   2.1995444  2.2072687 ]]

>>> cumlModel = TruncatedSVD(n_components = 1)
>>> XT = cumlModel.fit_transform(X_cudf)
>>> result = XT.compute()
>>> print(result)
[[ 8.699628    0.          0.        ]
 [11.018815    0.          0.        ]
 [10.8554535   0.          0.        ]
 [ 9.000192    0.          0.        ]
 [ 6.7628784   0.          0.        ]
 [10.40526     0.          0.        ]]

>>> client.close()
>>> cluster.close()
- fit_transform(X)
Fit the model with X and apply the dimensionality reduction on X.
- Parameters:
- X : dask cuDF
- Returns:
- X_new : dask cuDF
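As a rough picture of what fitting and then reducing X means, the sketch below uses NumPy on the CPU. It assumes the reduction is the standard truncated-SVD projection of X onto the top K right singular vectors, and that the inverse maps back through the same vectors. The page does not spell out these details, so treat this as an illustration of the usual math and not of cuML's internals. All names here are illustrative.

```python
import numpy as np

# Hypothetical input; a dask cuDF object would be used with the real estimator.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
k = 2  # plays the role of n_components

# "Fit": compute the thin SVD and keep the top k right singular vectors.
U, S, VT = np.linalg.svd(X, full_matrices=False)
V_k = VT.T[:, :k]        # shape (n_features, k)

# "Transform": project the rows of X onto those vectors.
X_new = X @ V_k          # shape (n_samples, k); equals U[:, :k] * S[:k]

# "Inverse transform": map back to the original feature space.
X_back = X_new @ V_k.T   # best rank-k approximation of X in the Frobenius norm
```

The round trip is lossy whenever k is smaller than the rank of X: the Frobenius norm of `X - X_back` equals the square root of the sum of the squared singular values that were dropped.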