Normalizer

class cuml.preprocessing.Normalizer(*args, **kwargs)

Normalize samples individually to unit norm.

Each sample (i.e. each row of the data matrix) with at least one nonzero component is rescaled independently of the other samples so that its norm (l1, l2, or max, i.e. the infinity norm) equals one.

This transformer works with both dense numpy arrays and sparse matrices.

Scaling inputs to unit norm is a common operation in text classification and clustering. For instance, the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors, which is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community.
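The relationship between l2 normalization and cosine similarity can be checked by hand. The following is a minimal plain-Python sketch of the arithmetic only; the helper names `l2_normalize`, `dot`, and `cosine_similarity` are illustrative and are not part of cuML.

```python
import math

def l2_normalize(v):
    """Scale v to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity computed directly from the unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [4.0, 1.0, 2.0, 2.0]
b = [1.0, 3.0, 9.0, 3.0]

# Dot product after normalization versus the direct cosine formula.
via_normalized = dot(l2_normalize(a), l2_normalize(b))
direct = cosine_similarity(a, b)
print(math.isclose(via_normalized, direct))
```

Once the rows are normalized, a single dot product (or matrix product for many rows) is enough to obtain pairwise cosine similarities.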

Parameters:
norm : ‘l1’, ‘l2’, or ‘max’, optional (‘l2’ by default)

The norm to use to normalize each nonzero sample. If norm='max' is used, each sample is rescaled by the maximum of the absolute values of its components.

copy : boolean, optional, default True

Whether to force a copy of the input. If copy=False, a copy might still be triggered by a conversion.
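To make the three norm options concrete, here is a small plain-Python sketch of the per-row arithmetic. It is an illustration under the description above, not the cuML implementation; `row_norm` and `normalize_rows` are hypothetical helper names, and leaving all-zero rows unchanged follows from the statement that only rows with a nonzero component are rescaled.

```python
import math

def row_norm(row, norm="l2"):
    """Return the requested norm of one row."""
    if norm == "l1":
        return sum(abs(x) for x in row)          # sum of absolute values
    if norm == "l2":
        return math.sqrt(sum(x * x for x in row))  # Euclidean length
    if norm == "max":
        return max(abs(x) for x in row)          # largest absolute value
    raise ValueError(f"unsupported norm: {norm!r}")

def normalize_rows(X, norm="l2"):
    """Divide each row by its norm; all-zero rows are left as they are."""
    out = []
    for row in X:
        n = row_norm(row, norm)
        out.append(list(row) if n == 0 else [x / n for x in row])
    return out

X = [[4.0, 1.0, 2.0, 2.0],
     [0.0, 0.0, 0.0, 0.0]]

l1 = normalize_rows(X, "l1")   # first row divided by 9
l2 = normalize_rows(X, "l2")   # first row divided by 5
mx = normalize_rows(X, "max")  # first row divided by 4
```

With norm='max' the largest-magnitude component of each nonzero row becomes 1 (or -1), while with 'l1' the absolute values of each nonzero row sum to 1.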

Methods

fit(X[, y])

Do nothing and return the estimator unchanged.

transform(X[, copy])

Scale each nonzero row of X to unit norm.

See also

normalize

Equivalent function without the estimator API.

Notes

This estimator is stateless (besides constructor parameters): the fit method does nothing, but it allows the estimator to be used in a pipeline.
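What "stateless" means in practice can be illustrated with a toy transformer. The class below, `UnitNormSketch`, is a hypothetical stand-in written in plain Python and is not the cuML class; it only mirrors the fit/transform shape described here, under the assumption that fit learns nothing from the data.

```python
import math

class UnitNormSketch:
    """Hypothetical stateless transformer (illustration only, not cuML)."""

    _NORMS = {
        "l1": lambda r: sum(abs(x) for x in r),
        "l2": lambda r: math.sqrt(sum(x * x for x in r)),
        "max": lambda r: max(abs(x) for x in r),
    }

    def __init__(self, norm="l2"):
        self.norm = norm  # constructor parameter: the only state kept

    def fit(self, X, y=None):
        # Nothing is learned from X; returning self allows call chaining
        # and lets a pipeline call fit on every step uniformly.
        return self

    def transform(self, X):
        norm_of = self._NORMS[self.norm]
        out = []
        for row in X:
            n = norm_of(row)
            out.append(list(row) if n == 0 else [x / n for x in row])
        return out

X = [[4.0, 1.0, 2.0, 2.0]]
other = [[100.0, -3.0, 0.0, 7.0]]

t = UnitNormSketch()
fitted = t.fit(other)              # fitting on unrelated data...
same = UnitNormSketch().fit(X)     # ...or on X itself
result_a = fitted.transform(X)
result_b = same.transform(X)       # gives the same output either way
```

Because the output depends only on the constructor parameters and the rows being transformed, the data passed to fit has no effect on the result.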

Examples

>>> from cuml.preprocessing import Normalizer
>>> import cupy as cp
>>> X = [[4, 1, 2, 2],
...      [1, 3, 9, 3],
...      [5, 7, 5, 1]]
>>> X = cp.array(X)
>>> transformer = Normalizer().fit(X)  # fit does nothing.
>>> transformer
Normalizer()
>>> transformer.transform(X)
array([[0.8, 0.2, 0.4, 0.4],
       [0.1, 0.3, 0.9, 0.3],
       [0.5, 0.7, 0.5, 0.1]])
fit(X, y=None) → Normalizer

Do nothing and return the estimator unchanged.

This method exists only to implement the usual estimator API, so that the transformer works in pipelines.

Parameters:
X : {array-like, CSR matrix}

transform(X, copy=None) → SparseCumlArray

Scale each nonzero row of X to unit norm.

Parameters:
X : {array-like, CSR matrix}, shape [n_samples, n_features]

The data to normalize, row by row.

copy : bool, optional (default: None)

Whether to force a copy of the input. If copy=False, a copy might still be triggered by a conversion.