LedoitWolf

class cuml.covariance.LedoitWolf(*, store_precision=True, assume_centered=False, block_size=1000, verbose=False, output_type=None)

Ledoit-Wolf estimator for covariance matrix estimation.

Computes the Ledoit-Wolf shrinkage estimator for the covariance matrix. This estimator regularizes the empirical covariance by shrinking it towards a scaled identity matrix, with the shrinkage coefficient determined by the Ledoit-Wolf formula.

The regularized covariance is: (1 - shrinkage) * cov + shrinkage * mu * np.identity(n_features)

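where cov is the empirical covariance, mu = trace(cov) / n_features, and shrinkage is chosen to minimize the expected squared (Frobenius-norm) error between the regularized estimate and the true covariance, using the consistent estimate of that optimum derived by Ledoit and Wolf.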
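For reference, the shrinkage coefficient and the regularized covariance can be sketched in NumPy. This is a minimal, unblocked sketch of the closed-form Ledoit-Wolf estimator as scikit-learn's `ledoit_wolf_shrinkage` writes it; it is not cuML's GPU implementation, and the function names `lw_shrinkage` and `shrunk_covariance` are hypothetical.

```python
import numpy as np

def lw_shrinkage(X, assume_centered=False):
    """Ledoit-Wolf shrinkage coefficient (unblocked NumPy sketch)."""
    X = np.asarray(X, dtype=np.float64)
    n_samples, n_features = X.shape
    if n_features == 1:
        return 0.0  # with one feature, shrinking toward mu * I changes nothing
    if not assume_centered:
        X = X - X.mean(axis=0)
    cov = X.T @ X / n_samples            # empirical (maximum-likelihood) covariance
    mu = np.trace(cov) / n_features      # scale of the identity target
    # delta: squared Frobenius distance between cov and the target mu * I
    delta = np.sum((cov - mu * np.eye(n_features)) ** 2) / n_features
    # beta: estimated squared distance between cov and the true covariance
    X2 = X ** 2
    beta = np.sum(X2.T @ X2 / n_samples - cov ** 2) / (n_features * n_samples)
    beta = min(beta, delta)              # caps the shrinkage at 1
    return 0.0 if beta == 0 else beta / delta

def shrunk_covariance(X, shrinkage, assume_centered=False):
    """(1 - shrinkage) * cov + shrinkage * mu * I."""
    X = np.asarray(X, dtype=np.float64)
    if not assume_centered:
        X = X - X.mean(axis=0)
    n_samples, n_features = X.shape
    cov = X.T @ X / n_samples
    mu = np.trace(cov) / n_features
    return (1 - shrinkage) * cov + shrinkage * mu * np.identity(n_features)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
s = lw_shrinkage(X)
cov_lw = shrunk_covariance(X, s)
```

Because beta is never negative and is capped at delta, the coefficient always lies in [0, 1], and the convex combination preserves the trace of the empirical covariance.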

Parameters:
store_precision : bool, default=True

Specifies if the estimated precision matrix is stored.

assume_centered : bool, default=False

If True, data will not be centered before computation. Useful when working with data whose mean is almost, but not exactly, zero. If False (default), data will be centered before computation.

block_size : int, default=1000

Size of blocks into which the covariance matrix will be split during its Ledoit-Wolf estimation. This is purely a memory optimization and does not affect results.

verbose : int or bool, default=False

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

output_type : {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None

Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
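The block_size claim (a memory optimization that does not change results) can be illustrated with the two sums the Ledoit-Wolf formula needs, sum((X.T @ X)**2) and sum((X**2).T @ X**2). scikit-learn's `ledoit_wolf_shrinkage`, for example, accumulates these over blocks of the p x p products. The sketch below uses a simpler scheme, one slab of block_size rows at a time, so only a block_size x n_features slice is ever materialized; it is an illustration of the idea, not cuML's actual blocking, and `blocked_sums` is a hypothetical name.

```python
import numpy as np

def blocked_sums(X, block_size):
    """Return (sum((X.T @ X)**2), sum(X2.T @ X2)) without forming a full p x p matrix."""
    X = np.asarray(X, dtype=np.float64)
    X2 = X ** 2
    n_features = X.shape[1]
    sq_cross = 0.0  # sum of squared entries of X.T @ X
    cross_sq = 0.0  # sum of entries of X2.T @ X2
    for start in range(0, n_features, block_size):
        cols = slice(start, start + block_size)
        sq_cross += np.sum((X[:, cols].T @ X) ** 2)  # (block, p) slab only
        cross_sq += np.sum(X2[:, cols].T @ X2)
    return sq_cross, cross_sq

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 23))
full = (np.sum((X.T @ X) ** 2), np.sum((X ** 2).T @ (X ** 2)))
results = {bs: blocked_sums(X, bs) for bs in (1, 5, 23, 1000)}
```

Every block size yields the same totals up to floating-point rounding; only the peak memory changes.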

Attributes:
covariance_ : ndarray of shape (n_features, n_features)

Estimated covariance matrix.

location_ : ndarray of shape (n_features,)

Estimated location, i.e., the estimated mean.

precision_ : ndarray of shape (n_features, n_features)

Estimated pseudo-inverse of covariance_. Only stored if store_precision is True.

shrinkage_ : float

Coefficient in the convex combination used for the computation of the shrunk estimate. Range is [0, 1].

n_features_in_ : int

Number of features seen during fit.

Methods

error_norm(comp_cov[, norm, scaling, squared])

Compute the Mean Squared Error between two covariance estimators.

fit(X[, y, convert_dtype])

Fit the Ledoit-Wolf shrunk covariance model to X.

get_precision()

Getter for the precision matrix.

mahalanobis(X)

Compute the squared Mahalanobis distances of given observations.

score(X_test[, y])

Compute the log-likelihood of X_test under the estimated model.

See also

sklearn.covariance.LedoitWolf

The scikit-learn CPU implementation.

References

O. Ledoit and M. Wolf, “A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices”, Journal of Multivariate Analysis, Volume 88, Issue 2, February 2004, pages 365-411.

Examples

>>> import cupy as cp
>>> from cuml.covariance import LedoitWolf
>>> rng = cp.random.RandomState(42)
>>> X = rng.randn(100, 5)
>>> lw = LedoitWolf().fit(X)
>>> lw.covariance_.shape
(5, 5)
>>> lw.shrinkage_
0.123...
error_norm(comp_cov, norm='frobenius', scaling=True, squared=True)

Compute the Mean Squared Error between two covariance estimators.

Parameters:
comp_cov : array-like of shape (n_features, n_features)

The covariance to compare with.

norm : {"frobenius", "spectral"}, default="frobenius"

The type of norm used to compute the error.

scaling : bool, default=True

If True, the squared error norm is divided by n_features. If False, it is not rescaled.

squared : bool, default=True

If True, return the squared error norm. If False, return the error norm.

Returns:
error : float

The error between self.covariance_ and comp_cov, measured with the chosen norm (by default, the squared Frobenius norm scaled by n_features).
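The computation can be sketched in NumPy as below. It follows scikit-learn's `EmpiricalCovariance.error_norm`, which cuML's method is assumed to mirror; `error_norm_sketch` is a hypothetical free function that takes the estimator's covariance_ as an explicit argument.

```python
import numpy as np

def error_norm_sketch(covariance, comp_cov, norm="frobenius", scaling=True, squared=True):
    error = np.asarray(comp_cov, dtype=np.float64) - np.asarray(covariance, dtype=np.float64)
    if norm == "frobenius":
        squared_norm = np.sum(error ** 2)
    elif norm == "spectral":
        # largest singular value of error.T @ error equals (spectral norm of error) ** 2
        squared_norm = np.max(np.linalg.svd(error.T @ error, compute_uv=False))
    else:
        raise ValueError("norm must be 'frobenius' or 'spectral'")
    if scaling:
        squared_norm = squared_norm / error.shape[0]  # divide by n_features
    return squared_norm if squared else np.sqrt(squared_norm)

A = np.eye(3)
B = np.diag([2.0, 1.0, 1.0])
err_default = error_norm_sketch(A, B)  # squared Frobenius norm / n_features
err_spectral = error_norm_sketch(A, A + 1.0, norm="spectral", scaling=False, squared=False)
```

Here B - A has a single nonzero entry, so the default error is 1 / 3, and A + 1 differs from A by an all-ones matrix whose spectral norm is 3.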

fit(X, y=None, *, convert_dtype=True) → LedoitWolf

Fit the Ledoit-Wolf shrunk covariance model to X.

Parameters:
X : array-like of shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

y : Ignored

Not used, present for API consistency.

convert_dtype : bool, default=True

If True, convert the input data to float32.

Returns:
self : LedoitWolf

Returns the instance itself.

get_precision()

Getter for the precision matrix.

Returns:
precision_ : ndarray of shape (n_features, n_features)

The precision matrix associated with the current covariance object.

mahalanobis(X)

Compute the squared Mahalanobis distances of given observations.

Parameters:
X : array-like of shape (n_samples, n_features)

The observations whose Mahalanobis distances are computed.

Returns:
mahalanobis_distances : ndarray of shape (n_samples,)

Squared Mahalanobis distances of the observations.
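For an observation x, the squared Mahalanobis distance is (x - location_)^T precision_ (x - location_). A vectorized NumPy sketch follows; the small `location` and `precision` arrays are hypothetical stand-ins for the fitted attributes, and `squared_mahalanobis` is a hypothetical name.

```python
import numpy as np

def squared_mahalanobis(X, location, precision):
    centered = np.asarray(X, dtype=np.float64) - location
    # row-wise quadratic form: sum over j, k of c[i, j] * P[j, k] * c[i, k]
    return np.sum((centered @ precision) * centered, axis=1)

location = np.array([1.0, -1.0])
precision = np.linalg.inv(np.array([[2.0, 0.0], [0.0, 0.5]]))  # diag(0.5, 2.0)
X = np.array([[1.0, -1.0], [3.0, -1.0], [1.0, 0.0]])
d2 = squared_mahalanobis(X, location, precision)  # [0.0, 2.0, 2.0]
```

The first row sits at the location (distance 0); the other two are offset along axes with variances 2 and 0.5, so offsets of 2 and 1 both give a squared distance of 2.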

score(X_test, y=None) → float

Compute the log-likelihood of X_test under the estimated model.

The log-likelihood is computed under a multivariate Gaussian model with mean location_ and covariance covariance_.

Parameters:
X_test : array-like of shape (n_samples, n_features)

Test data whose likelihood is computed.

y : Ignored

Not used, present for API consistency.

Returns:
log_likelihood : float

Log-likelihood of the data under the fitted Gaussian model.
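scikit-learn computes this score as the average log-likelihood per sample, expressed through the empirical covariance S of X_test centered at location_ and the precision matrix P: 0.5 * (log det P - tr(S P) - n_features * log(2 pi)). Assuming cuML follows the same convention, the NumPy sketch below computes the score that way and checks it against the directly averaged Gaussian log-density. Both function names are hypothetical, and the location and covariance values are illustrative.

```python
import numpy as np

def gaussian_score(X_test, location, covariance):
    X_test = np.asarray(X_test, dtype=np.float64)
    n_samples, n_features = X_test.shape
    precision = np.linalg.pinv(covariance)       # pseudo-inverse, as for precision_
    centered = X_test - location
    S = centered.T @ centered / n_samples        # empirical covariance around location
    _, logdet_P = np.linalg.slogdet(precision)
    return 0.5 * (logdet_P - np.sum(S * precision) - n_features * np.log(2 * np.pi))

def mean_logpdf(X_test, location, covariance):
    """Per-sample Gaussian log-density, averaged over samples."""
    X_test = np.asarray(X_test, dtype=np.float64)
    n_features = X_test.shape[1]
    centered = X_test - location
    d2 = np.sum((centered @ np.linalg.inv(covariance)) * centered, axis=1)
    _, logdet_C = np.linalg.slogdet(covariance)
    return np.mean(-0.5 * (d2 + logdet_C + n_features * np.log(2 * np.pi)))

rng = np.random.default_rng(3)
X_test = rng.standard_normal((200, 4))
location = np.zeros(4)
covariance = np.diag([1.0, 2.0, 0.5, 1.5])
```

The two forms agree because the mean of the per-sample quadratic forms equals tr(S P), and log det P = -log det covariance.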