LedoitWolf

class cuml.covariance.LedoitWolf(*, store_precision=True, assume_centered=False, block_size=1000, verbose=False, output_type=None)

LedoitWolf Estimator for covariance matrix estimation.

Computes the Ledoit-Wolf shrinkage estimator for the covariance matrix. This estimator regularizes the empirical covariance by shrinking it towards a scaled identity matrix, with the shrinkage coefficient determined by the Ledoit-Wolf formula.

The regularized covariance is: (1 - shrinkage) * cov + shrinkage * mu * np.identity(n_features)

where mu = trace(cov) / n_features and shrinkage is computed to minimize the Mean Squared Error between the regularized estimate and the true covariance.
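The formula above can be sketched on the CPU with NumPy. This is an illustration only, not cuML's GPU implementation: the helper name `ledoit_wolf_sketch` is hypothetical, the empirical covariance is assumed to use 1/n_samples normalization, and the shrinkage follows the closed-form estimate from the Ledoit and Wolf paper cited under References.

```python
import numpy as np


def ledoit_wolf_sketch(X, assume_centered=False):
    """Return (shrunk_cov, shrinkage) for X of shape (n_samples, n_features).

    Hypothetical helper for illustration; not part of cuML.
    """
    X = np.asarray(X, dtype=np.float64)
    n_samples, n_features = X.shape
    if not assume_centered:
        X = X - X.mean(axis=0)

    # Empirical covariance with 1/n normalization (an assumption of this sketch).
    cov = X.T @ X / n_samples
    mu = np.trace(cov) / n_features

    # delta: normalized squared Frobenius distance between cov and mu * I.
    delta = np.sum((cov - mu * np.identity(n_features)) ** 2) / n_features

    # beta: estimated sampling error of cov, capped at delta so shrinkage <= 1.
    X2 = X ** 2
    beta = (np.sum(X2.T @ X2) / n_samples - np.sum(cov ** 2)) / (n_features * n_samples)
    beta = min(beta, delta)

    shrinkage = 0.0 if beta == 0 else beta / delta
    shrunk = (1.0 - shrinkage) * cov + shrinkage * mu * np.identity(n_features)
    return shrunk, shrinkage


rng = np.random.RandomState(42)
X = rng.randn(100, 5)
shrunk_cov, shrinkage = ledoit_wolf_sketch(X)
```

Because mu is the average eigenvalue of cov, the convex combination leaves the trace unchanged; only the spread of the eigenvalues is pulled towards mu.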

Parameters:

store_precision (bool, default=True): Specifies whether the estimated precision matrix is stored.
assume_centered (bool, default=False): If True, data will not be centered before computation. Useful when working with data whose mean is almost, but not exactly, zero. If False (default), data will be centered before computation.
block_size (int, default=1000): Size of blocks into which the covariance matrix will be split during its Ledoit-Wolf estimation. This is purely a memory optimization and does not affect results.
verbose (int or bool, default=False): Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
output_type ({'input', 'array', 'dataframe', 'series', 'df_obj', 'numba', 'cupy', 'numpy', 'cudf', 'pandas'}, default=None): Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
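To see why a block size can change memory use without changing results, consider one of the sums a Ledoit-Wolf estimate needs: the sum of squared entries of X.T @ X. The NumPy sketch below accumulates that sum tile by tile and compares it with the unblocked value. It does not reproduce cuML's actual blocking scheme, and the tiny `block_size` is chosen only so that several tiles are exercised.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(50, 7)
X = X - X.mean(axis=0)
block_size = 3  # deliberately small so several blocks are needed

# Unblocked: materialize the full (n_features, n_features) product.
full_sum = np.sum((X.T @ X) ** 2)

# Blocked: accumulate the same sum one tile of the product at a time,
# so only a (block_size, block_size) tile is held in memory at once.
n_features = X.shape[1]
blocked_sum = 0.0
for i in range(0, n_features, block_size):
    for j in range(0, n_features, block_size):
        tile = X[:, i:i + block_size].T @ X[:, j:j + block_size]
        blocked_sum += np.sum(tile ** 2)
```

The two sums agree up to floating-point rounding, since the tiles partition the entries of the full product.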

Attributes:

covariance_ (ndarray of shape (n_features, n_features)): Estimated covariance matrix.
location_ (ndarray of shape (n_features,)): Estimated location, i.e., the estimated mean.
precision_ (ndarray of shape (n_features, n_features)): Estimated pseudo-inverse of the covariance matrix. Only stored if store_precision is True.
shrinkage_ (float): Coefficient in the convex combination used for the computation of the shrunk estimate. Range is [0, 1].
n_features_in_ (int): Number of features seen during fit.

Methods

`error_norm`(comp_cov[, norm, scaling, squared])	Compute the Mean Squared Error between two covariance estimators.
`fit`(X[, y, convert_dtype])	Fit the Ledoit-Wolf shrunk covariance model to X.
`get_precision`()	Getter for the precision matrix.
`mahalanobis`(X)	Compute the squared Mahalanobis distances of given observations.
`score`(X_test[, y])	Compute the log-likelihood of X_test under the estimated model.

See also

sklearn.covariance.LedoitWolf: The scikit-learn CPU implementation.

References

O. Ledoit and M. Wolf, “A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices”, Journal of Multivariate Analysis, Volume 88, Issue 2, February 2004, pages 365-411.

Examples

>>> import cupy as cp
>>> from cuml.covariance import LedoitWolf
>>> rng = cp.random.RandomState(42)
>>> X = rng.randn(100, 5)
>>> lw = LedoitWolf().fit(X)
>>> lw.covariance_.shape
(5, 5)
>>> lw.shrinkage_
0.123...

error_norm(comp_cov, norm='frobenius', scaling=True, squared=True)

Compute the Mean Squared Error between two covariance estimators.

Parameters:

comp_cov (array-like of shape (n_features, n_features)): The covariance to compare with.
norm ({"frobenius", "spectral"}, default="frobenius"): The type of norm used to compute the error.
scaling (bool, default=True): If True, the squared error is scaled by n_features.
squared (bool, default=True): If True, return squared error. If False, return error.

Returns:

error (float): The Mean Squared Error (in the sense of the Frobenius norm) between self and comp_cov.
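A rough NumPy sketch of how the `norm`, `scaling`, and `squared` options could interact is given here. It assumes the semantics of scikit-learn's method of the same name (in particular, that scaling means dividing the squared norm by n_features); the function `error_norm_sketch` is hypothetical and takes both matrices explicitly instead of reading `covariance_` from a fitted estimator.

```python
import numpy as np


def error_norm_sketch(cov, comp_cov, norm="frobenius", scaling=True, squared=True):
    """Hypothetical stand-in for error_norm, operating on two plain arrays."""
    error = np.asarray(comp_cov, dtype=np.float64) - np.asarray(cov, dtype=np.float64)
    if norm == "frobenius":
        squared_norm = np.sum(error ** 2)
    elif norm == "spectral":
        # Largest singular value of error.T @ error, i.e. the squared spectral norm.
        squared_norm = np.max(np.linalg.svd(error.T @ error, compute_uv=False))
    else:
        raise ValueError("norm must be 'frobenius' or 'spectral'")
    if scaling:
        # Assumed meaning of scaling: divide by the number of features.
        squared_norm = squared_norm / error.shape[0]
    return squared_norm if squared else np.sqrt(squared_norm)


cov = np.diag([2.0, 1.0])
comp_cov = np.identity(2)
err = error_norm_sketch(cov, comp_cov)
err_spectral_root = error_norm_sketch(cov, comp_cov, norm="spectral", squared=False)
```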

fit(X, y=None, *, convert_dtype=True) → LedoitWolf

Fit the Ledoit-Wolf shrunk covariance model to X.

Parameters:

X (array-like of shape (n_samples, n_features)): Training data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored): Not used, present for API consistency.
convert_dtype (bool, default=True): If True, convert the input data to float32.

Returns:

self (LedoitWolf): Returns the instance itself.

get_precision()

Getter for the precision matrix.

Returns:

precision_ (ndarray of shape (n_features, n_features)): The precision matrix associated with the current covariance object.

mahalanobis(X)

Compute the squared Mahalanobis distances of given observations.

Parameters:

X (array-like of shape (n_samples, n_features)): The observations whose squared Mahalanobis distances are computed.

Returns:

mahalanobis_distances (ndarray of shape (n_samples,)): Squared Mahalanobis distances of the observations.
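For reference, the squared Mahalanobis distance of an observation x is (x - location)ᵀ · precision · (x - location). The NumPy sketch below evaluates it row by row; `mahalanobis_sketch` is a hypothetical helper that takes the location and precision explicitly, whereas the method is expected to use the fitted `location_` and precision matrix.

```python
import numpy as np


def mahalanobis_sketch(X, location, precision):
    """Hypothetical helper: squared Mahalanobis distance of each row of X."""
    centered = np.asarray(X, dtype=np.float64) - location
    # Row-wise quadratic form: d_i = centered_i @ precision @ centered_i
    return np.sum((centered @ precision) * centered, axis=1)


location = np.array([1.0, 2.0])
precision = np.linalg.pinv(np.diag([4.0, 1.0]))  # pseudo-inverse of a toy covariance
X_obs = np.array([[1.0, 2.0], [3.0, 2.0], [1.0, 5.0]])
dist = mahalanobis_sketch(X_obs, location, precision)
```

Note that the values are squared distances, so no square root is taken.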

score(X_test, y=None) → float

Compute the log-likelihood of X_test under the estimated model.

The log-likelihood is computed using the Gaussian model.

Parameters:

X_test (array-like of shape (n_samples, n_features)): Test data for which the likelihood is computed.
y (Ignored): Not used, present for API consistency.

Returns:

log_likelihood (float): Log-likelihood of the data under the fitted Gaussian model.
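One way to read this score is as the average Gaussian log-density of the test samples, given a fitted location and precision matrix. The NumPy sketch below follows that reading, which matches scikit-learn's convention; whether cuML normalizes in exactly the same way is an assumption, and `gaussian_score_sketch` is a hypothetical helper, not a cuML function.

```python
import numpy as np


def gaussian_score_sketch(X_test, location, precision):
    """Hypothetical helper: mean Gaussian log-likelihood per test sample."""
    X_test = np.asarray(X_test, dtype=np.float64)
    n_samples, n_features = X_test.shape
    centered = X_test - location
    # Covariance of the test data around the fitted location (1/n normalization).
    test_cov = centered.T @ centered / n_samples
    _, logdet = np.linalg.slogdet(precision)
    # sum(test_cov * precision) equals trace(test_cov @ precision) for symmetric inputs.
    return 0.5 * (-np.sum(test_cov * precision) + logdet - n_features * np.log(2 * np.pi))


# A single observation at the mean of a standard 2-D Gaussian.
score_at_mean = gaussian_score_sketch(np.zeros((1, 2)), np.zeros(2), np.identity(2))
```

Higher values indicate that the test data are better explained by the fitted mean and covariance.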