LabelEncoder
- class cuml.preprocessing.LabelEncoder(*, handle_unknown='error', verbose=False, output_type=None)
Encode target labels with values between 0 and n_classes - 1.
This transformer should be used to encode target values (y) and not the input X.
- Parameters:
- handle_unknown : {'error', 'ignore'}, default='error'
Whether to raise an error or ignore an unknown label encountered during transform (the default is to raise). When this parameter is set to 'ignore' and an unknown category is encountered during transform or inverse transform, the resulting encoding will be null.
- verbose : int or boolean, default=False
Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
- output_type : {'input', 'array', 'dataframe', 'series', 'df_obj', 'numba', 'cupy', 'numpy', 'cudf', 'pandas'}, default=None
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
- Attributes:
- classes_ : numpy.ndarray of shape (n_classes,)
Holds the label for each class.
Methods
fit(y): Fit a LabelEncoder instance to a set of categories.
fit_transform(y): Simultaneously fit and transform an input.
inverse_transform(y): Transform labels back to original encoding.
transform(y): Transform an input into its categorical keys.
Examples
>>> import numpy as np
>>> from cuml.preprocessing import LabelEncoder
>>> y = np.array(["apple", "apple", "banana", "grape"])
>>> le = LabelEncoder()
>>> le.fit_transform(y)
array([0, 0, 1, 2], dtype=uint8)
>>> le.classes_
array(['apple', 'banana', 'grape'], dtype='<U6')
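To make the mapping concrete, the following NumPy-only sketch reproduces the encoding from the example above on the host. It illustrates the semantics only and is not cuML's GPU implementation: it assumes that classes are the sorted distinct labels, which matches the `classes_` output shown above.

```python
import numpy as np

y = np.array(["apple", "apple", "banana", "grape"])

# Sorted distinct labels play the role of classes_;
# codes holds each sample's index into that sorted array.
classes, codes = np.unique(y, return_inverse=True)

print(classes)  # ['apple' 'banana' 'grape']
print(codes)    # [0 0 1 2]
```

Each code lies between 0 and n_classes - 1, here 0 to 2.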
- fit(y)
Fit a LabelEncoder instance to a set of categories.
- Parameters:
- y : array-like (device or host), shape = n_samples
Dense matrix of any dtype. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- Returns:
- self : LabelEncoder
Fitted label encoder.
- fit_transform(y)
Simultaneously fit and transform an input.
This is functionally equivalent to (but faster than) LabelEncoder().fit(y).transform(y).
- Parameters:
- y : array-like (device or host), shape = n_samples
Dense matrix of any dtype. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- Returns:
- y : cuDF, CuPy or NumPy object depending on cuML's output type configuration, shape = n_samples
Encoded labels.
For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
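The equivalence between the one-step and two-step forms can be illustrated with a small host-side stand-in. `TinyLabelEncoder` is a hypothetical class written for this sketch with NumPy. It is not the cuML class and says nothing about how cuML achieves its speedup. It only shows that both paths produce the same codes.

```python
import numpy as np


class TinyLabelEncoder:
    """Hypothetical host-side stand-in; not the cuML implementation."""

    def fit(self, y):
        # Learn the sorted distinct labels.
        self.classes_ = np.unique(np.asarray(y))
        return self  # returning self allows method chaining

    def transform(self, y):
        # classes_ is sorted, so searchsorted finds each label's index.
        # (This sketch assumes every label was seen during fit.)
        return np.searchsorted(self.classes_, np.asarray(y))

    def fit_transform(self, y):
        # One pass: distinct labels and their codes are computed together.
        self.classes_, codes = np.unique(np.asarray(y), return_inverse=True)
        return codes


y = np.array(["grape", "apple", "banana", "apple"])
two_step = TinyLabelEncoder().fit(y).transform(y)
one_step = TinyLabelEncoder().fit_transform(y)
print(np.array_equal(two_step, one_step))  # True
```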
- inverse_transform(y)
Transform labels back to original encoding.
- Parameters:
- y : array-like (device or host), shape = n_samples
Dense matrix of any dtype. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- Returns:
- y_original : cuDF, CuPy or NumPy object depending on cuML's output type configuration, shape = n_samples
Original encoding.
For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
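Conceptually, decoding is an index lookup into the fitted classes. The NumPy-only sketch below shows the round trip on the host. It assumes sorted distinct labels as the class array and is not cuML's implementation.

```python
import numpy as np

y = np.array(["apple", "apple", "banana", "grape"])

# Encode: sorted distinct labels plus per-sample indices into them.
classes, codes = np.unique(y, return_inverse=True)

# Decode: look each code up in the class array.
y_original = classes[codes]

print(np.array_equal(y_original, y))  # True
```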
- transform(y)
Transform an input into its categorical keys.
This is intended for use with small inputs relative to the size of the dataset. For fitting and transforming an entire dataset, prefer fit_transform.
- Parameters:
- y : array-like (device or host), shape = n_samples
Dense matrix of any dtype. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- Returns:
- y : cuDF, CuPy or NumPy object depending on cuML's output type configuration, shape = n_samples
Encoded labels.
For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
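Transforming new input raises the question of labels that were not seen during fit, which is what `handle_unknown` controls. The sketch below is a host-side illustration with NumPy. `encode` is a hypothetical helper, and using NaN to stand in for the null result of 'ignore' is an assumption of this sketch, not a statement about cuML's return values.

```python
import numpy as np


def encode(y, classes, handle_unknown="error"):
    """Hypothetical helper: map labels to indices into sorted `classes`."""
    y = np.asarray(y)
    idx = np.searchsorted(classes, y)
    idx = np.clip(idx, 0, len(classes) - 1)  # keep lookups in range
    known = classes[idx] == y                # False where the label is unseen
    if known.all():
        return idx
    if handle_unknown == "error":
        raise ValueError(f"unseen labels: {np.unique(y[~known]).tolist()}")
    # 'ignore': NaN stands in for a null encoding (assumption of this sketch).
    out = idx.astype(float)
    out[~known] = np.nan
    return out


classes = np.array(["apple", "banana", "grape"])
print(encode(["grape", "apple"], classes))  # [2 0]

ignored = encode(["kiwi", "apple"], classes, handle_unknown="ignore")
print(np.isnan(ignored[0]), ignored[1])
```

With the default `handle_unknown="error"`, the second call would raise `ValueError` in this sketch instead of returning a result.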