LabelEncoder

class cuml.preprocessing.LabelEncoder(*, handle_unknown='error', verbose=False, output_type=None)

Encode target labels with values between 0 and n_classes - 1.

This transformer should be used to encode target values (y) and not the input X.

Parameters:
handle_unknown : {'error', 'ignore'}, default='error'

Whether to raise an error or ignore an unknown label encountered during transform (the default is to raise). When this parameter is set to 'ignore' and an unknown label is encountered during transform or inverse_transform, the resulting encoding is null.

verbose : int or boolean, default=False

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

output_type : {'input', 'array', 'dataframe', 'series', 'df_obj', 'numba', 'cupy', 'numpy', 'cudf', 'pandas'}, default=None

Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.

Attributes:
classes_ : numpy.ndarray of shape (n_classes,)

Holds the label for each class.

Methods

fit(y)

Fit a LabelEncoder instance to a set of categories.

fit_transform(y)

Simultaneously fit and transform an input.

inverse_transform(y)

Transform labels back to original encoding.

transform(y)

Transform an input into its categorical keys.

Examples

>>> import numpy as np
>>> from cuml.preprocessing import LabelEncoder
>>> y = np.array(["apple", "apple", "banana", "grape"])
>>> le = LabelEncoder()
>>> le.fit_transform(y)
array([0, 0, 1, 2], dtype=uint8)
>>> le.classes_
array(['apple', 'banana', 'grape'], dtype='<U6')
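The effect of handle_unknown described under Parameters can be modeled in plain Python. The sketch below is not cuML's implementation and does not run on the GPU: transform_labels is a hypothetical helper, None stands in for the null encoding, and the exception types are assumptions made for illustration.

```python
def transform_labels(classes, labels, handle_unknown="error"):
    """Hypothetical helper (not part of cuML): map labels to their index in `classes`."""
    if handle_unknown not in ("error", "ignore"):
        raise ValueError(f"invalid handle_unknown: {handle_unknown!r}")
    code_of = {label: code for code, label in enumerate(classes)}
    encoded = []
    for label in labels:
        if label in code_of:
            encoded.append(code_of[label])
        elif handle_unknown == "ignore":
            encoded.append(None)  # assumption: None models the null encoding
        else:
            # assumption: the exception type is chosen for this sketch only
            raise KeyError(f"label not seen during fit: {label!r}")
    return encoded


classes = ["apple", "banana", "grape"]
known = transform_labels(classes, ["grape", "apple"])
with_unknown = transform_labels(classes, ["grape", "kiwi"], handle_unknown="ignore")
print(known)         # [2, 0]
print(with_unknown)  # [2, None]
```

With the default handle_unknown="error", the same call on ["grape", "kiwi"] raises instead of returning a partial result.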

fit(y)

Fit a LabelEncoder instance to a set of categories.

Parameters:
y : array-like (device or host), shape = n_samples

Dense matrix of any dtype. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

Returns:
self : LabelEncoder

Fitted label encoder.

fit_transform(y)

Simultaneously fit and transform an input.

This is functionally equivalent to (but faster than) LabelEncoder().fit(y).transform(y).

Parameters:
y : array-like (device or host), shape = n_samples

Dense matrix of any dtype. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

Returns:
y : cuDF, CuPy or NumPy object depending on cuML's output type configuration, shape = n_samples

Encoded labels.

For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
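The stated equivalence between fit_transform and fit followed by transform can be illustrated with a small pure-Python model. This is a sketch of the semantics only, not cuML code: fit_classes, encode, and the fit_transform function below are hypothetical names, and the sorted class order is an assumption inferred from the class-level example.

```python
def fit_classes(labels):
    """Collect the distinct labels; sorted order is an assumption."""
    return sorted(set(labels))


def encode(classes, labels):
    """Replace each label with its position in `classes`."""
    code_of = {label: code for code, label in enumerate(classes)}
    return [code_of[label] for label in labels]


def fit_transform(labels):
    """Fit and encode in one call; same result as fitting, then encoding."""
    classes = fit_classes(labels)
    return classes, encode(classes, labels)


y = ["apple", "apple", "banana", "grape"]
classes, codes = fit_transform(y)
two_step = encode(fit_classes(y), y)  # the fit-then-transform path
print(classes)  # ['apple', 'banana', 'grape']
print(codes)    # [0, 0, 1, 2]
```

The model shows only that both paths produce the same codes; it says nothing about the speed difference, which depends on cuML's GPU implementation.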

inverse_transform(y)

Transform labels back to original encoding.

Parameters:
y : array-like (device or host), shape = n_samples

Dense matrix of any dtype. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

Returns:
y_original : cuDF, CuPy or NumPy object depending on cuML's output type configuration, shape = n_samples

Original encoding.

For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
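Conceptually, inverse_transform looks each integer code up in the fitted classes. A minimal pure-Python sketch of that lookup follows; decode is a hypothetical helper rather than a cuML function, and the range check with ValueError is an assumption added so that invalid codes fail loudly in this model.

```python
def decode(classes, codes):
    """Hypothetical helper (not part of cuML): map integer codes back to labels."""
    labels = []
    for code in codes:
        if not 0 <= code < len(classes):
            # assumption: reject codes outside 0 .. n_classes - 1
            raise ValueError(f"code out of range: {code!r}")
        labels.append(classes[code])
    return labels


classes = ["apple", "banana", "grape"]
codes = [0, 0, 1, 2]
restored = decode(classes, codes)
print(restored)  # ['apple', 'apple', 'banana', 'grape']
```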

transform(y)

Transform an input into its categorical keys.

This is intended for use with small inputs relative to the size of the dataset. For fitting and transforming an entire dataset, prefer fit_transform.

Parameters:
y : array-like (device or host), shape = n_samples

Dense matrix of any dtype. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

Returns:
y : cuDF, CuPy or NumPy object depending on cuML's output type configuration, shape = n_samples

Encoded labels.

For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
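The usage pattern that transform is intended for, fitting once and then encoding small batches, can be sketched with a toy stand-in. TinyLabelEncoder is a hypothetical class written for this illustration; it is not cuML's LabelEncoder, it runs on the CPU, and its sorted class order is an assumption.

```python
class TinyLabelEncoder:
    """Toy stand-in for illustration only; not cuML's LabelEncoder."""

    def fit(self, labels):
        self.classes_ = sorted(set(labels))  # assumption: sorted class order
        self._code_of = {label: code for code, label in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        return [self._code_of[label] for label in labels]


full_dataset = ["grape", "apple", "banana", "apple", "grape"]
encoder = TinyLabelEncoder().fit(full_dataset)  # fit once on all labels

# Later, encode small batches without refitting; the codes stay consistent.
batch_a = encoder.transform(["banana"])
batch_b = encoder.transform(["grape", "apple"])
print(batch_a)  # [1]
print(batch_b)  # [2, 0]
```

Because the class-to-code mapping is fixed at fit time, every batch is encoded against the same mapping.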