LabelBinarizer
- class cuml.dask.preprocessing.LabelBinarizer(*, client=None, **kwargs)
A distributed version of LabelBinarizer for one-hot encoding a collection of labels.
- Parameters:
- client : dask.distributed.Client, optional
Dask client to use.
- **kwargs : dict
Additional arguments passed to the underlying single-GPU LabelBinarizer.
Methods
fit(y): Fit the label binarizer.
fit_transform(y): Fit the label binarizer and return the transformed labels.
inverse_transform(y[, threshold]): Invert a set of encoded labels back to the original labels.
transform(y): Transform and return the encoded labels.
Examples
Create an array of labels and dummy-encode (one-hot encode) them:
>>> import cupy as cp
>>> import cupyx
>>> from cuml.dask.preprocessing import LabelBinarizer
>>> from dask_cuda import LocalCUDACluster
>>> from dask.distributed import Client
>>> import dask
>>> cluster = LocalCUDACluster()
>>> client = Client(cluster)
>>> labels = cp.asarray([0, 5, 10, 7, 2, 4, 1, 0, 0, 4, 3, 2, 1],
...                     dtype=cp.int32)
>>> labels = dask.array.from_array(labels)
>>> lb = LabelBinarizer()
>>> encoded = lb.fit_transform(labels)
>>> print(encoded.compute())
[[1 0 0 0 0 0 0 0]
 [0 0 0 0 0 1 0 0]
 [0 0 0 0 0 0 0 1]
 [0 0 0 0 0 0 1 0]
 [0 0 1 0 0 0 0 0]
 [0 0 0 0 1 0 0 0]
 [0 1 0 0 0 0 0 0]
 [1 0 0 0 0 0 0 0]
 [1 0 0 0 0 0 0 0]
 [0 0 0 0 1 0 0 0]
 [0 0 0 1 0 0 0 0]
 [0 0 1 0 0 0 0 0]
 [0 1 0 0 0 0 0 0]]
>>> decoded = lb.inverse_transform(encoded)
>>> print(decoded.compute())
[ 0  5 10  7  2  4  1  0  0  4  3  2  1]
>>> client.close()
>>> cluster.close()
- fit(y)
Fit label binarizer
- Parameters:
- y : Dask.Array of shape [n_samples,] or [n_samples, n_classes], chunked by row
Target values. A 2-d matrix should contain only 0 and 1 and represents multilabel classification.
- Returns:
- self : returns an instance of self.
- fit_transform(y)
Fit the label binarizer and return the transformed labels
- Parameters:
- y : Dask.Array of shape [n_samples,] or [n_samples, n_classes]
Target values. A 2-d matrix should contain only 0 and 1 and represents multilabel classification.
- Returns:
- arr : Dask.Array backed by CuPy arrays containing encoded labels
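As a conceptual illustration of the encoding that fit_transform returns, the following pure-Python sketch one-hot encodes a short list of labels on the CPU. It is not cuML's implementation and uses no GPU or Dask; the helper name `one_hot_encode` is hypothetical, and the assumption that columns follow the sorted unique labels is taken from the column order visible in the example output above.

```python
def one_hot_encode(labels):
    """Return (classes, rows) for a 1-d sequence of labels.

    Hypothetical helper for illustration only: `classes` holds the sorted
    unique labels, and each row has a single 1 in the column of its label.
    """
    classes = sorted(set(labels))
    column_of = {label: col for col, label in enumerate(classes)}
    rows = [
        [1 if column_of[label] == col else 0 for col in range(len(classes))]
        for label in labels
    ]
    return classes, rows


classes, encoded = one_hot_encode([0, 5, 10, 7, 2])
print(classes)  # [0, 2, 5, 7, 10]
for row in encoded:
    print(row)
```

Each output row has as many columns as there are distinct labels, which is why the 13 labels in the example above produce a 13 x 8 matrix.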
- inverse_transform(y, threshold=None)
Invert a set of encoded labels back to original labels
- Parameters:
- y : Dask.Array of shape [n_samples, n_classes]
Encoded labels.
- threshold : float
This value is currently ignored.
- Returns:
- arr : Dask.Array backed by CuPy arrays containing original labels
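One way to picture the inversion is as a per-row argmax followed by a lookup in the list of known classes. The sketch below does this in pure Python on the CPU. It is an assumption about the general technique, not a statement of how cuML implements inverse_transform, and the helper name `one_hot_decode` is hypothetical.

```python
def one_hot_decode(rows, classes):
    """Map each encoded row back to a label.

    Hypothetical helper for illustration only: picks the column with the
    largest value in each row (first one on ties) and returns the class
    stored at that position.
    """
    decoded = []
    for row in rows:
        best_col = max(range(len(row)), key=row.__getitem__)
        decoded.append(classes[best_col])
    return decoded


classes = [0, 4, 7]
rows = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
print(one_hot_decode(rows, classes))  # [0, 7, 4]
```

An argmax-style inversion needs no cutoff value, which is consistent with the threshold parameter being ignored.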