GaussianRandomProjection
- class cuml.random_projection.GaussianRandomProjection(n_components='auto', *, eps=0.1, random_state=None, output_type=None, verbose=False)
Reduce dimensionality through Gaussian random projection.
The components of the random matrix are drawn from N(0, 1 / n_components).
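As an illustration of the sampling rule above, the following NumPy sketch draws a projection matrix and applies it to a dense array. It assumes the second argument of N(0, 1 / n_components) is a variance (the scikit-learn convention), so the standard deviation is 1 / sqrt(n_components). The helper name `gaussian_projection_matrix` is hypothetical and is not part of cuML; this is not the library's GPU implementation.

```python
import numpy as np


def gaussian_projection_matrix(n_components, n_features, seed=None):
    """Hypothetical helper (not a cuML API): draw a dense Gaussian projection matrix."""
    rng = np.random.default_rng(seed)
    # Assumption: 1 / n_components is the variance, so the
    # standard deviation passed to the sampler is 1 / sqrt(n_components).
    scale = 1.0 / np.sqrt(n_components)
    return rng.normal(loc=0.0, scale=scale, size=(n_components, n_features))


# Toy data with the same shape as the dataset in the Examples section.
X = np.random.default_rng(0).standard_normal((200, 1000))

# The matrix has shape (n_components, n_features), matching components_.
R = gaussian_projection_matrix(n_components=50, n_features=1000, seed=42)

# Projecting is a single matrix product with the transposed matrix.
X_new = X @ R.T
print(X_new.shape)
```

Scaling the entries this way keeps the expected squared norm of a projected vector equal to that of the original vector.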
- Parameters:
- n_components : int or 'auto', default='auto'
Dimensionality of the target projection space.
n_components can be automatically adjusted according to the number of samples in the dataset and the bound given by the Johnson-Lindenstrauss lemma. In that case the quality of the embedding is controlled by the eps parameter.
Note that the Johnson-Lindenstrauss lemma can yield very conservative estimates of the required number of components, as it makes no assumption about the structure of the dataset.
- eps : float, default=0.1
Parameter to control the quality of the embedding according to the Johnson-Lindenstrauss lemma when n_components is set to 'auto'. The value should be strictly positive.
Smaller values lead to a better embedding and a higher number of dimensions (n_components) in the target projection space.
- random_state : int, RandomState instance or None, default=None
Controls the pseudo random number generator used to generate the projection matrix at fit time.
- output_type : {'input', 'array', 'dataframe', 'series', 'df_obj', 'numba', 'cupy', 'numpy', 'cudf', 'pandas'}, default=None
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
- verbose : int or boolean, default=False
Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
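To make the 'auto' setting of n_components more concrete, here is a minimal sketch of the Johnson-Lindenstrauss bound in the form used by scikit-learn's `johnson_lindenstrauss_min_dim`: n_components >= 4 ln(n_samples) / (eps^2 / 2 - eps^3 / 3). Whether cuML evaluates exactly this expression is an assumption; the function name `jl_min_dim` is hypothetical, and the restriction of eps to the open interval (0, 1) follows scikit-learn rather than this page.

```python
import math


def jl_min_dim(n_samples, eps=0.1):
    """Hypothetical helper (not a cuML API): conservative JL lower bound on n_components."""
    if n_samples <= 0:
        raise ValueError("n_samples must be strictly positive")
    # Assumption borrowed from scikit-learn: eps must lie in (0, 1).
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must be in the open interval (0, 1)")
    denominator = eps ** 2 / 2 - eps ** 3 / 3
    # Truncate to an integer number of components.
    return int(4 * math.log(n_samples) / denominator)


# Smaller eps (tighter distortion) demands more components.
for eps in (0.5, 0.2, 0.1):
    print(eps, jl_min_dim(n_samples=200, eps=eps))
```

For 200 samples and eps=0.1 the bound comes out at several thousand components, more than the 1000 features of the dataset in the Examples section, which illustrates how conservative the estimate can be.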
- Attributes:
- n_components_ : int
Concrete number of components computed when n_components='auto'.
- components_ : array of shape (n_components, n_features)
Random matrix used for the projection.
- n_features_in_ : int
Number of features seen during fit.
Notes
Inspired by Scikit-learn’s implementation: https://scikit-learn.org/stable/modules/random_projection.html
Currently, passing a sparse array to transform may result in close (but not exactly identical) results due to cupy/cupy#9323.
Examples
>>> from cuml.random_projection import GaussianRandomProjection
>>> from cuml.datasets import make_blobs
>>> X, _ = make_blobs(n_samples=200, n_features=1000, random_state=42)
>>> model = GaussianRandomProjection(n_components=50, random_state=42)
>>> X_new = model.fit_transform(X)
>>> X_new.shape
(200, 50)
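As a rough check of what "quality of the embedding" means, the sketch below measures how well pairwise squared distances survive a Gaussian random projection. It uses plain NumPy on the CPU instead of cuML so that it runs without a GPU; the data, the choice of 500 components, and the helper `pairwise_sq_dists` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_components = 50, 1000, 500

# Illustrative data: standard normal samples, not a cuML dataset.
X = rng.standard_normal((n_samples, n_features))

# Gaussian projection matrix with variance 1 / n_components per entry.
R = rng.normal(0.0, 1.0 / np.sqrt(n_components), size=(n_components, n_features))
X_new = X @ R.T


def pairwise_sq_dists(A):
    """Hypothetical helper: squared Euclidean distances between all rows of A."""
    sq = (A * A).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)


# Compare each distinct pair once (upper triangle, diagonal excluded).
iu = np.triu_indices(n_samples, k=1)
ratios = pairwise_sq_dists(X_new)[iu] / pairwise_sq_dists(X)[iu]

# Ratios near 1 mean the projection preserved the distance well.
print(ratios.min(), ratios.max())
```

Increasing the number of components should pull the ratios closer to 1, which mirrors the trade-off controlled by eps when n_components is 'auto'.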