scale

cuml.preprocessing.scale(X, *, axis=0, with_mean=True, with_std=True, copy=True)

Standardize a dataset along any axis.

Center the data on the mean and scale it component-wise to unit variance.

Parameters:
X : {array-like, sparse matrix}

The data to center and scale.

axis : int (0 by default)

Axis along which the means and standard deviations are computed. If 0, standardize each feature independently; if 1, standardize each sample.

with_mean : boolean, True by default

If True, center the data before scaling.

with_std : boolean, True by default

If True, scale the data to unit variance (or equivalently, unit standard deviation).

copy : boolean, optional, default True

Whether to force a copy of the input. If copy=False, a copy may still be made when the input has to be converted.
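
The effect of axis, with_mean and with_std can be sketched in plain NumPy. This is only an illustration of the documented behavior, not cuML's implementation: the helper name scale_reference is hypothetical, and leaving zero-variance slices unscaled is an assumption made to avoid division by zero.

```python
import numpy as np

def scale_reference(X, axis=0, with_mean=True, with_std=True):
    """NumPy sketch of the documented behavior (hypothetical helper)."""
    X = np.asarray(X, dtype=float).copy()
    if with_mean:
        # Subtract the mean computed along the chosen axis.
        X -= X.mean(axis=axis, keepdims=True)
    if with_std:
        # Biased standard deviation (ddof=0), as stated in the Notes.
        std = X.std(axis=axis, keepdims=True)
        # Assumption: constant slices are left unscaled.
        std[std == 0.0] = 1.0
        X /= std
    return X

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

by_feature = scale_reference(X, axis=0)  # each column: mean 0, std 1
by_sample = scale_reference(X, axis=1)   # each row: mean 0, std 1
```

With axis=0 the statistics are taken per column (feature); with axis=1 they are taken per row (sample).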

See also

StandardScaler

Performs scaling to unit variance using the ``Transformer`` API.

Notes

This implementation will refuse to center sparse matrices since it would make them non-sparse and would potentially crash the program with memory exhaustion problems.

Instead, the caller is expected either to set with_mean=False explicitly (in which case only variance scaling is performed on the features of the sparse matrix) or to densify the matrix if they expect the materialized dense array to fit in memory.
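
A small dense NumPy array with mostly zero entries can illustrate why centering is refused for sparse input. The array below is made up for the illustration; it stands in for a sparse matrix and only counts non-zero entries.

```python
import numpy as np

# Mostly-zero data standing in for a sparse matrix (illustrative values).
X = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [4.0, 0.0, 0.0],
              [0.0, 5.0, 0.0]])

# Variance scaling only (with_mean=False): zeros stay zero.
scaled_only = X / X.std(axis=0)

# Centering: every zero becomes -mean, so the result is fully dense.
centered = X - X.mean(axis=0)

nnz_before = np.count_nonzero(X)
nnz_scaled = np.count_nonzero(scaled_only)
nnz_centered = np.count_nonzero(centered)
```

Scaling alone keeps the 4 stored entries, while centering fills all 12 positions, which is what would exhaust memory on a large sparse matrix.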

For optimal processing the caller should pass a CSC matrix.

NaNs are treated as missing values: they are disregarded when computing the statistics and preserved in the transformed data.
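
The NaN handling described above corresponds roughly to using NaN-aware statistics. The sketch below uses NumPy's nanmean and nanstd as stand-ins; whether cuML computes the statistics this way internally is an assumption.

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, 6.0]])

# Statistics ignore NaN entries (ddof=0 for the standard deviation).
mean = np.nanmean(X, axis=0)
std = np.nanstd(X, axis=0)

# NaN propagates through the arithmetic, so missing values stay missing.
X_scaled = (X - mean) / std
```

The first column is standardized from its two observed values only, and the missing entry remains NaN in the output.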

We use a biased estimator for the standard deviation, equivalent to numpy.std(x, ddof=0). Note that the choice of ddof is unlikely to affect model performance.
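
As a quick illustration of the difference between the two estimators, the following NumPy snippet compares ddof=0 with ddof=1 on a made-up sample. The two differ only by the factor sqrt(n / (n - 1)), which approaches 1 as n grows.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size

biased = np.std(x, ddof=0)    # divides by n; equals 2.0 for this sample
unbiased = np.std(x, ddof=1)  # divides by n - 1

# The ratio depends only on the sample size.
ratio = unbiased / biased
expected_ratio = np.sqrt(n / (n - 1))
```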