GaussianRandomProjection

class cuml.random_projection.GaussianRandomProjection(n_components='auto', *, eps=0.1, random_state=None, output_type=None, verbose=False)

Reduce dimensionality through Gaussian random projection.

The components of the random matrix are drawn from N(0, 1 / n_components), that is, a normal distribution with mean 0 and variance 1 / n_components.
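The variance is 1 / n_components rather than 1 so that distances survive the projection on average. If R has k = n_components rows with independent N(0, 1/k) entries, each coordinate of R x has variance ||x||^2 / k, so E[||R x||^2] = ||x||^2. The sketch below illustrates this with the standard library only; `gaussian_random_matrix`, `project` and `sq_norm` are hypothetical helpers for illustration, not part of cuML, and the code is not how cuML computes the projection on the GPU.

```python
import math
import random


def gaussian_random_matrix(n_components, n_features, seed=None):
    """Return an (n_components, n_features) list of lists with entries ~ N(0, 1/n_components)."""
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(n_components)  # variance 1/k means standard deviation 1/sqrt(k)
    return [[rng.gauss(0.0, std) for _ in range(n_features)]
            for _ in range(n_components)]


def project(x, R):
    """Map one sample x (length n_features) to R @ x (length n_components)."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in R]


def sq_norm(v):
    """Squared Euclidean norm of a vector."""
    return sum(v_i * v_i for v_i in v)


n_features, n_components = 1000, 200
data_rng = random.Random(0)
x = [data_rng.gauss(0.0, 1.0) for _ in range(n_features)]
R = gaussian_random_matrix(n_components, n_features, seed=42)

# The squared-norm ratio is chi-squared(k)/k distributed, so it concentrates near 1
ratio = sq_norm(project(x, R)) / sq_norm(x)
print(f"||Rx||^2 / ||x||^2 = {ratio:.3f}")
```

For a batch of samples stored as rows of X, applying `project` to each row is the same as computing X @ R.T; a practical implementation would use a vectorized matrix product instead of Python loops.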

Parameters:
n_components : int or ‘auto’, default=‘auto’

Dimensionality of the target projection space.

When set to ‘auto’, n_components is chosen automatically from the number of samples in the dataset and the bound given by the Johnson-Lindenstrauss lemma. In that case, the quality of the embedding is controlled by the eps parameter.

Note that the Johnson-Lindenstrauss lemma can yield very conservative estimates of the required number of components, because it makes no assumption about the structure of the dataset.

eps : float, default=0.1

Parameter to control the quality of the embedding according to the Johnson-Lindenstrauss lemma when n_components is set to ‘auto’. The value should be strictly positive.

Smaller values give a better embedding at the cost of a higher number of dimensions (n_components) in the target projection space.

random_state : int, RandomState instance or None, default=None

Controls the pseudo-random number generator used to generate the projection matrix at fit time.

output_type : {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None

Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.

verbose : int or boolean, default=False

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
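To make the trade-off between eps and n_components concrete, the sketch below evaluates the Johnson-Lindenstrauss bound n_components >= 4 ln(n_samples) / (eps^2 / 2 - eps^3 / 3). This is the formula used by scikit-learn's `johnson_lindenstrauss_min_dim`; that cuML's ‘auto’ mode uses exactly the same expression is an assumption here, and `jl_min_dim` is a hypothetical helper name.

```python
import math


def jl_min_dim(n_samples: int, eps: float) -> int:
    """Johnson-Lindenstrauss lower bound on the target dimension.

    Mirrors scikit-learn's johnson_lindenstrauss_min_dim formula; assuming
    cuML's n_components='auto' follows the same bound.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if not 0 < eps < 1:
        raise ValueError("eps must be in (0, 1)")
    denominator = eps**2 / 2 - eps**3 / 3
    # Truncate toward zero, as scikit-learn does with astype(np.int64)
    return int(4 * math.log(n_samples) / denominator)


# Smaller eps -> tighter distortion guarantee -> more components required
for eps in (0.5, 0.3, 0.1):
    print(f"eps={eps}: n_components >= {jl_min_dim(200, eps)}")
```

For 200 samples this gives 254, 588 and 4541 components for eps = 0.5, 0.3 and 0.1. The last value already exceeds 1000 input features, which illustrates how conservative the lemma can be and how quickly the required dimension grows as eps shrinks.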

Attributes:
n_components_ : int

Concrete number of components computed when n_components=‘auto’.

components_ : array of shape (n_components, n_features)

Random matrix used for the projection.

n_features_in_ : int

Number of features seen during fit.
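The attribute shapes fit together as a single matrix product. Assuming transform behaves like scikit-learn's and computes X @ components_.T, an input with n_features_in_ columns maps to an output with n_components_ columns. The `transform` function and the tiny hand-written components_ matrix below are hypothetical, chosen so the arithmetic can be checked by hand; they are not cuML's implementation.

```python
def transform(X, components):
    """Hypothetical stand-in for transform: returns X @ components.T.

    X has shape (n_samples, n_features_in_), components has shape
    (n_components_, n_features_in_), and the result has shape
    (n_samples, n_components_).
    """
    n_features_in = len(components[0])
    if any(len(row) != n_features_in for row in X):
        raise ValueError("X must have n_features_in_ columns")
    # Each output entry is the dot product of one sample with one component row
    return [[sum(c_j * x_j for c_j, x_j in zip(comp, row)) for comp in components]
            for row in X]


components_ = [[1.0, 0.0, -1.0],  # n_components_ = 2 rows
               [0.5, 0.5, 0.5]]   # n_features_in_ = 3 columns
X = [[1.0, 2.0, 3.0],
     [0.0, 0.0, 2.0]]
X_new = transform(X, components_)
print(X_new)  # [[-2.0, 3.0], [-2.0, 1.0]]
```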

Notes

Inspired by scikit-learn’s implementation: https://scikit-learn.org/stable/modules/random_projection.html

Currently, passing a sparse array to transform may give results that are close, but not exactly identical, due to cupy/cupy#9323.

Examples

>>> from cuml.random_projection import GaussianRandomProjection
>>> from cuml.datasets import make_blobs
>>> X, _ = make_blobs(n_samples=200, n_features=1000, random_state=42)
>>> model = GaussianRandomProjection(n_components=50, random_state=42)
>>> X_new = model.fit_transform(X)
>>> X_new.shape
(200, 50)