Normalizer
- class cuml.preprocessing.Normalizer(*args, **kwargs)
Normalize samples individually to unit norm.
Each sample (i.e. each row of the data matrix) with at least one nonzero component is rescaled independently of the other samples so that its norm (l1, l2, or max, i.e. the infinity norm) equals one.
This transformer works with both dense numpy arrays and sparse matrices.
Scaling inputs to unit norm is a common operation in text classification and clustering. For instance, the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors, which is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community.
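The cosine-similarity statement above can be checked with a few lines of arithmetic. The sketch below is plain Python rather than cuML, so it runs without a GPU; the helpers `l2_normalize` and `dot` and the two sample vectors are made up for illustration and are not part of the library.

```python
import math


def l2_normalize(vec):
    """Scale vec to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return list(vec)
    return [v / norm for v in vec]


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Two hypothetical TF-IDF-like vectors (illustrative values only).
a = [4.0, 1.0, 2.0, 2.0]
b = [1.0, 3.0, 9.0, 3.0]

# Cosine similarity computed directly from its definition.
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Dot product of the l2-normalized vectors.
normalized_dot = dot(l2_normalize(a), l2_normalize(b))

# The two quantities agree up to floating-point rounding.
assert math.isclose(cosine, normalized_dot)
```

Because the division by the norms happens once per vector, normalizing up front lets every later similarity be computed as a plain dot product.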
- Parameters:
- norm : 'l1', 'l2', or 'max', optional (default 'l2')
The norm used to normalize each nonzero sample. If norm='max' is used, values are rescaled by the maximum of the absolute values.
- copy : boolean, optional (default True)
Whether to force a copy of the input. With copy=False, a copy may still be made if the input has to be converted.
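To make the three `norm` options concrete, here is a minimal pure-Python sketch of the per-row arithmetic they describe. The function `normalize_row` is a hypothetical helper written for this illustration; it is not cuML's implementation and ignores the `copy` parameter and sparse inputs.

```python
import math


def normalize_row(row, norm="l2"):
    """Rescale one row to unit norm; a row with no nonzero entry is returned as-is."""
    if norm == "l1":
        scale = sum(abs(v) for v in row)
    elif norm == "l2":
        scale = math.sqrt(sum(v * v for v in row))
    elif norm == "max":
        scale = max((abs(v) for v in row), default=0.0)
    else:
        raise ValueError(f"unsupported norm: {norm!r}")
    if scale == 0:
        return list(row)
    return [v / scale for v in row]


row = [4.0, -1.0, 2.0, 2.0]
l1_row = normalize_row(row, "l1")    # divides by |4| + |-1| + |2| + |2| = 9
l2_row = normalize_row(row, "l2")    # divides by sqrt(16 + 1 + 4 + 4) = 5
max_row = normalize_row(row, "max")  # divides by max(|v|) = 4
```

After the call, the absolute values of `l1_row` sum to one, `l2_row` has unit Euclidean length, and the largest absolute value in `max_row` is one.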
Methods
fit(X[, y]): Do nothing and return the estimator unchanged.
transform(X[, copy]): Scale each nonzero row of X to unit norm.
See also
normalize: Equivalent function without the estimator API.
Notes
This estimator is stateless (apart from its constructor parameters). The fit method does nothing, but it allows the estimator to be used in a pipeline.
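The note about statelessness can be illustrated with a toy example. The class `TinyNormalizer` and the function `run_pipeline` below are hypothetical stand-ins written in plain Python, not cuML or scikit-learn code; they only sketch why a no-op `fit` that returns the estimator is enough for pipeline-style chaining.

```python
import math


class TinyNormalizer:
    """Hypothetical stand-in for a stateless l2 normalizer (not cuML code)."""

    def fit(self, X, y=None):
        # Nothing is learned from X; returning self allows fit(X).transform(X).
        return self

    def transform(self, X):
        out = []
        for row in X:
            scale = math.sqrt(sum(v * v for v in row))
            # Rows with no nonzero entry are passed through unchanged.
            out.append([v / scale for v in row] if scale else list(row))
        return out


def run_pipeline(steps, X):
    """Hypothetical minimal pipeline: fit each step, then feed its output onward."""
    for step in steps:
        X = step.fit(X).transform(X)
    return X


X = [[3.0, 4.0], [0.0, 0.0]]

# A fitted and an unfitted instance behave identically because fit stores nothing.
same = TinyNormalizer().fit(X).transform(X) == TinyNormalizer().transform(X)

result = run_pipeline([TinyNormalizer()], X)
```

A pipeline calls `fit` on every step, so even a transformer with nothing to learn needs the method to exist and to return the estimator.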
Examples
>>> from cuml.preprocessing import Normalizer
>>> import cupy as cp
>>> X = [[4, 1, 2, 2],
...      [1, 3, 9, 3],
...      [5, 7, 5, 1]]
>>> X = cp.array(X)
>>> transformer = Normalizer().fit(X)  # fit does nothing.
>>> transformer
Normalizer()
>>> transformer.transform(X)
array([[0.8, 0.2, 0.4, 0.4],
       [0.1, 0.3, 0.9, 0.3],
       [0.5, 0.7, 0.5, 0.1]])
- fit(X, y=None) -> Normalizer
Do nothing and return the estimator unchanged.
This method exists only to implement the usual estimator API, so that the transformer can be used in pipelines.
- Parameters:
- X : {array-like, CSR matrix}
- transform(X, copy=None) -> SparseCumlArray
Scale each nonzero row of X to unit norm.
- Parameters:
- X : {array-like, CSR matrix}, shape [n_samples, n_features]
The data to normalize, row by row.
- copy : bool, optional (default: None)
Whether to force a copy of the input. With copy=False, a copy may still be made if the input has to be converted.