TruncatedSVD

class cuml.dask.decomposition.TruncatedSVD(*, client=None, **kwargs)

Parameters:

n_components (int, default = 1): The number of top K singular vectors/values to compute. Must be <= the number of columns in X.
svd_solver (‘full’, ‘jacobi’): Only the ‘full’ algorithm is supported, since it is significantly faster on GPU than the other solvers, including randomized SVD.
verbose (int or boolean, default = False): Sets the logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

Attributes:

components_ (array): The top K components (VT.T[:,:n_components]) in U, S, VT = svd(X).
explained_variance_ (array): How much of the variance in the data each component explains, given by S**2.
explained_variance_ratio_ (array): The proportion of the variance explained by each component, given by S**2/sum(S**2).
singular_values_ (array): The top K singular values. All singular values are >= 0.
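
The attribute formulas above can be checked on a small matrix. The following is a minimal CPU-only sketch using NumPy, not the cuML implementation; the toy data and the variable names are assumptions made for illustration, and the expressions follow the descriptions above literally.

```python
import numpy as np

# Hypothetical toy data: 6 samples, 3 features (not taken from the cuML docs).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3)).astype(np.float32)
n_components = 2

# Thin SVD: X = U @ diag(S) @ VT, with S sorted in descending order.
U, S, VT = np.linalg.svd(X, full_matrices=False)

# Top K right singular vectors, laid out as in the description above.
components = VT.T[:, :n_components]          # shape (n_features, n_components)
singular_values = S[:n_components]           # always >= 0
explained_variance = singular_values ** 2    # S**2
explained_variance_ratio = explained_variance / np.sum(S ** 2)

print(components.shape)
print(explained_variance_ratio)
```

Because the ratio divides by the sum over all singular values, the entries for the kept components sum to less than 1 whenever some singular values are discarded.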

Methods

`fit`(X[, _transform])	Fit the model with X.
`fit_transform`(X)	Fit the model with X and apply the dimensionality reduction on X.
`inverse_transform`(X[, delayed])	Transform data back to its original space.
`transform`(X[, delayed])	Apply dimensionality reduction to `X`.

Examples

>>> from dask_cuda import LocalCUDACluster
>>> from dask.distributed import Client, wait
>>> import cupy as cp
>>> from cuml.dask.decomposition import TruncatedSVD
>>> from cuml.dask.datasets import make_blobs

>>> cluster = LocalCUDACluster(threads_per_worker=1)
>>> client = Client(cluster)

>>> nrows = 6
>>> ncols = 3
>>> n_parts = 2

>>> X_cudf, _ = make_blobs(n_samples=nrows, n_features=ncols,
...                        centers=1, n_parts=n_parts,
...                        cluster_std=1.8, random_state=10,
...                        dtype=cp.float32)
>>> in_blobs = X_cudf.compute()
>>> print(in_blobs)
[[ 6.953966    6.2313757   0.84974563]
 [10.012338    3.4641726   3.0827546 ]
 [ 9.537406    4.0504313   3.2793145 ]
 [ 8.32713     2.957846    1.8215517 ]
 [ 5.7044296   1.855514    3.7996366 ]
 [10.089077    2.1995444   2.2072687 ]]
>>> cumlModel = TruncatedSVD(n_components = 1)
>>> XT = cumlModel.fit_transform(X_cudf)
>>> result = XT.compute()
>>> print(result)
[[ 8.699628   0.         0.       ]
 [11.018815   0.         0.       ]
 [10.8554535  0.         0.       ]
 [ 9.000192   0.         0.       ]
 [ 6.7628784  0.         0.       ]
 [10.40526    0.         0.       ]]
>>> client.close()
>>> cluster.close()

fit(X, _transform=False)

Fit the model with X.

Parameters:

X (dask cuDF): Input data.

fit_transform(X)

Fit the model with X and apply the dimensionality reduction on X.

Parameters:

X (dask cuDF)

Returns:

X_new (dask cuDF)

inverse_transform(X, delayed=True)

Transform data back to its original space.

In other words, return an input X_original whose transform would be X.

Parameters:

X (dask cuDF)

Returns:

X_original (dask cuDF)
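
The statement that inverse_transform returns an input whose transform would be X can be illustrated with plain NumPy. This is a sketch under the assumption that the reduction is a projection onto the top right singular vectors; it does not call cuML, and the names V_k, X_reduced and X_restored are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 6 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
k = 2

# Top-k right singular vectors as columns: shape (n_features, k).
_, _, VT = np.linalg.svd(X, full_matrices=False)
V_k = VT[:k].T

X_reduced = X @ V_k              # forward projection to k dimensions
X_restored = X_reduced @ V_k.T   # map back to the original feature space

# Projecting the restored data again reproduces the reduced data.
roundtrip = X_restored @ V_k
print(np.allclose(roundtrip, X_reduced))
```

Note that X_restored has the original shape but is only a rank-k approximation of X, so it generally differs from X when k is smaller than the number of columns.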

transform(X, delayed=True)

Apply dimensionality reduction to X.

X is projected onto the top components (singular vectors) previously extracted from a training set.

Parameters:

X (dask cuDF)

Returns:

X_new (dask cuDF)
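
As a rough illustration of projecting data onto components extracted from a training set, the NumPy sketch below fits on one matrix and projects a second one. It is a CPU-only approximation of the idea, not the cuML code path; X_train, X_new and V_k are hypothetical names.

```python
import numpy as np

# Hypothetical training and new data with the same number of features.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(8, 4))
X_new = rng.normal(size=(3, 4))
k = 2

# "Fit": extract the top-k right singular vectors from the training set.
U, S, VT = np.linalg.svd(X_train, full_matrices=False)
V_k = VT[:k].T                    # shape (n_features, k)

# "Transform": project any data with matching columns onto those vectors.
train_reduced = X_train @ V_k
new_reduced = X_new @ V_k

# For the training data the projection equals U[:, :k] scaled by S[:k].
print(np.allclose(train_reduced, U[:, :k] * S[:k]))
print(new_reduced.shape)
```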