TruncatedSVD

class cuml.dask.decomposition.TruncatedSVD(*, client=None, **kwargs)
Parameters:
n_components : int (default = 1)

The number of top singular vectors / values to compute (K). Must be <= the number of columns in X.

svd_solver : ‘full’, ‘jacobi’

Only the ‘full’ algorithm is supported, since it is significantly faster on GPU than the other solvers, including randomized SVD.

verbose : int or boolean, default=False

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

Attributes:
components_ : array

The top K components (VT.T[:,:n_components]) in U, S, VT = svd(X)

explained_variance_ : array

The variance explained by each component, given by S**2.

explained_variance_ratio_ : array

The fraction of the total variance explained by each component, given by S**2/sum(S**2).

singular_values_ : array

The top K singular values. All singular values are >= 0.
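The formulas quoted in the attribute descriptions can be checked on the CPU with plain NumPy. The sketch below is not cuML code and does not claim to reproduce the library's internals. It applies the stated expressions literally to a small matrix, and the library may apply additional scaling. The input rows are taken from the Examples section of this page; the variable names and the choice of n_components = 2 are illustrative assumptions.

```python
import numpy as np

# Input rows taken from the Examples section of this page.
X = np.array([
    [6.953966, 6.2313757, 0.84974563],
    [10.012338, 3.4641726, 3.0827546],
    [9.537406, 4.0504313, 3.2793145],
    [8.32713, 2.957846, 1.8215517],
    [5.7044296, 1.855514, 3.7996366],
    [10.089077, 2.1995444, 2.2072687],
])

n_components = 2  # illustrative choice; must be <= X.shape[1]

# Thin SVD: X = U @ diag(S) @ VT, with S in descending order.
U, S, VT = np.linalg.svd(X, full_matrices=False)

# Attribute formulas as written in the descriptions above.
components = VT.T[:, :n_components]              # shape (n_features, n_components)
singular_values = S[:n_components]               # top K singular values
explained_variance = singular_values ** 2        # S**2
explained_variance_ratio = explained_variance / np.sum(S ** 2)  # S**2 / sum(S**2)

print(components.shape)
print(singular_values)
print(explained_variance_ratio)
```

Because the ratio divides by the sum over all singular values, the ratios of the retained components sum to at most 1.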

Methods

fit(X[, _transform])

Fit the model with X.

fit_transform(X)

Fit the model with X and apply the dimensionality reduction on X.

inverse_transform(X[, delayed])

Transform data back to its original space.

transform(X[, delayed])

Apply dimensionality reduction to X.

Examples

>>> from dask_cuda import LocalCUDACluster
>>> from dask.distributed import Client, wait
>>> import cupy as cp
>>> from cuml.dask.decomposition import TruncatedSVD
>>> from cuml.dask.datasets import make_blobs

>>> cluster = LocalCUDACluster(threads_per_worker=1)
>>> client = Client(cluster)

>>> nrows = 6
>>> ncols = 3
>>> n_parts = 2

>>> X_cudf, _ = make_blobs(n_samples=nrows, n_features=ncols,
...                        centers=1, n_parts=n_parts,
...                        cluster_std=1.8, random_state=10,
...                        dtype=cp.float32)
>>> in_blobs = X_cudf.compute()
>>> print(in_blobs)
[[ 6.953966    6.2313757   0.84974563]
 [10.012338    3.4641726   3.0827546 ]
 [ 9.537406    4.0504313   3.2793145 ]
 [ 8.32713     2.957846    1.8215517 ]
 [ 5.7044296   1.855514    3.7996366 ]
 [10.089077    2.1995444   2.2072687 ]]
>>> cumlModel = TruncatedSVD(n_components = 1)
>>> XT = cumlModel.fit_transform(X_cudf)
>>> result = XT.compute()
>>> print(result)
[[ 8.699628   0.         0.       ]
 [11.018815   0.         0.       ]
 [10.8554535  0.         0.       ]
 [ 9.000192   0.         0.       ]
 [ 6.7628784  0.         0.       ]
 [10.40526    0.         0.       ]]
>>> client.close()
>>> cluster.close()
fit(X, _transform=False)

Fit the model with X.

Parameters:
X : dask cuDF input
fit_transform(X)

Fit the model with X and apply the dimensionality reduction on X.

Parameters:
X : dask cuDF
Returns:
X_new : dask cuDF
inverse_transform(X, delayed=True)

Transform data back to its original space.

In other words, return an input X_original whose transform would be X.

Parameters:
X : dask cuDF
Returns:
X_original : dask cuDF
transform(X, delayed=True)

Apply dimensionality reduction to X.

X is projected onto the top singular vectors previously computed from the training data.

Parameters:
X : dask cuDF
Returns:
X_new : dask cuDF
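The relationship between transform and inverse_transform can be illustrated with a small NumPy sketch. It is a CPU-only approximation of the idea, not the cuML implementation. It assumes that the projection is a matrix product with the top K right singular vectors (the VT.T[:,:n_components] matrix from the attribute description). The helper names below are hypothetical and are not part of the cuML API.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))  # synthetic data, for illustration only


def top_k_right_singular_vectors(X, k):
    """Hypothetical helper: return VT.T[:, :k] from the thin SVD of X."""
    _, _, VT = np.linalg.svd(X, full_matrices=False)
    return VT.T[:, :k]


def transform(X, V_k):
    """Assumed projection: map rows of X onto the K retained directions."""
    return X @ V_k


def inverse_transform(X_new, V_k):
    """Assumed back-projection into the original feature space."""
    return X_new @ V_k.T


V_k = top_k_right_singular_vectors(X, k=2)
X_new = transform(X, V_k)               # shape (6, 2)
X_back = inverse_transform(X_new, V_k)  # shape (6, 3)

# Transforming the reconstruction gives back X_new, which matches the
# description "return an input X_original whose transform would be X".
print(np.allclose(transform(X_back, V_k), X_new))
```

With K smaller than the number of columns, X_back is in general only a low-rank approximation of X. In this sketch the round trip recovers X when all columns are retained.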