LabelBinarizer

class cuml.dask.preprocessing.LabelBinarizer(*, client=None, **kwargs)

A distributed version of LabelBinarizer for one-hot encoding a collection of labels.

Parameters:
client : dask.distributed.Client, optional

Dask client to use.

**kwargs : dict

Additional arguments passed to the underlying single-GPU LabelBinarizer.

Methods

fit(y)

Fit label binarizer

fit_transform(y)

Fit the label binarizer and return the transformed labels

inverse_transform(y[, threshold])

Invert a set of encoded labels back to original labels

transform(y)

Transform and return encoded labels

Examples

Create an array of labels and dummy-encode them:

>>> import cupy as cp
>>> import cupyx
>>> from cuml.dask.preprocessing import LabelBinarizer

>>> from dask_cuda import LocalCUDACluster
>>> from dask.distributed import Client
>>> import dask

>>> cluster = LocalCUDACluster()
>>> client = Client(cluster)

>>> labels = cp.asarray([0, 5, 10, 7, 2, 4, 1, 0, 0, 4, 3, 2, 1],
...                     dtype=cp.int32)
>>> labels = dask.array.from_array(labels)

>>> lb = LabelBinarizer()
>>> encoded = lb.fit_transform(labels)
>>> print(encoded.compute())
[[1 0 0 0 0 0 0 0]
 [0 0 0 0 0 1 0 0]
 [0 0 0 0 0 0 0 1]
 [0 0 0 0 0 0 1 0]
 [0 0 1 0 0 0 0 0]
 [0 0 0 0 1 0 0 0]
 [0 1 0 0 0 0 0 0]
 [1 0 0 0 0 0 0 0]
 [1 0 0 0 0 0 0 0]
 [0 0 0 0 1 0 0 0]
 [0 0 0 1 0 0 0 0]
 [0 0 1 0 0 0 0 0]
 [0 1 0 0 0 0 0 0]]
>>> decoded = lb.inverse_transform(encoded)
>>> print(decoded.compute())
[ 0  5 10  7  2  4  1  0  0  4  3  2  1]
>>> client.close()
>>> cluster.close()
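
To make the encoding in the example above concrete, here is a minimal CPU-only sketch in plain Python of what one-hot encoding these labels amounts to. The helper name `one_hot` is hypothetical and not part of cuML; the sketch only mirrors the output shown above and does not describe how the distributed implementation works internally.

```python
def one_hot(labels):
    """Return (classes, rows): the sorted unique labels and one 0/1 row per label."""
    classes = sorted(set(labels))
    # Column position of each class in the encoded matrix.
    column = {c: j for j, c in enumerate(classes)}
    rows = [
        [1 if column[label] == j else 0 for j in range(len(classes))]
        for label in labels
    ]
    return classes, rows


labels = [0, 5, 10, 7, 2, 4, 1, 0, 0, 4, 3, 2, 1]
classes, encoded = one_hot(labels)
print(classes)     # [0, 1, 2, 3, 4, 5, 7, 10]
print(encoded[1])  # label 5 -> [0, 0, 0, 0, 0, 1, 0, 0]
```

There are eight distinct labels, so each row has eight columns, one per class in sorted order.
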
fit(y)

Fit label binarizer

Parameters:
y : Dask.Array of shape [n_samples,] or [n_samples, n_classes]

Target values, chunked by row. A 2-D matrix should contain only 0s and 1s and represents multilabel classification.

Returns:
self : returns an instance of self.
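
Because the input is chunked by row, the set of classes can in principle be collected one chunk at a time and then merged. The sketch below illustrates that idea in plain Python, with lists standing in for row chunks. `fit_classes` is a hypothetical helper, and the approach is an assumption for illustration, not a description of cuML's actual implementation.

```python
def fit_classes(chunks):
    """Merge the unique labels found in each row chunk into one sorted class list."""
    seen = set()
    for chunk in chunks:
        seen.update(chunk)  # unique labels contributed by this chunk
    return sorted(seen)


# Three row chunks of a 1-D label array (lists stand in for Dask blocks).
chunks = [[0, 5, 10, 7], [2, 4, 1, 0, 0], [4, 3, 2, 1]]
fitted_classes = fit_classes(chunks)
print(fitted_classes)  # [0, 1, 2, 3, 4, 5, 7, 10]
```
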
fit_transform(y)

Fit the label binarizer and return the transformed labels

Parameters:
y : Dask.Array of shape [n_samples,] or [n_samples, n_classes]

Target values. A 2-D matrix should contain only 0s and 1s and represents multilabel classification.

Returns:
arr : Dask.Array backed by CuPy arrays containing encoded labels
inverse_transform(y, threshold=None)

Invert a set of encoded labels back to original labels

Parameters:
y : Dask.Array of shape [n_samples, n_classes]

Encoded labels.

threshold : float, optional

This value is currently ignored.

Returns:
arr : Dask.Array backed by CuPy arrays containing original labels
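
Conceptually, inverting a one-hot encoding maps each row back to the class whose column holds the largest value. The following is a minimal plain-Python sketch of that idea, assuming the sorted class list is already known. `inverse_one_hot` is a hypothetical helper, not a cuML function.

```python
def inverse_one_hot(rows, classes):
    """Map each encoded row to the class at the position of its largest entry."""
    decoded = []
    for row in rows:
        # Index of the first maximum in the row (argmax).
        best = max(range(len(row)), key=row.__getitem__)
        decoded.append(classes[best])
    return decoded


class_list = [0, 1, 2, 3, 4, 5, 7, 10]
encoded_rows = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
]
decoded_labels = inverse_one_hot(encoded_rows, class_list)
print(decoded_labels)  # [0, 5, 10]
```
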
transform(y)

Transform and return encoded labels

Parameters:
y : Dask.Array of shape [n_samples,] or [n_samples, n_classes]
Returns:
arr : Dask.Array backed by CuPy arrays containing encoded labels
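
Unlike fit_transform, transform relies on classes learned by an earlier fit. A minimal plain-Python sketch of encoding new labels against an existing class list follows. `transform_with_classes` is a hypothetical helper, and raising `KeyError` for an unseen label is a property of this sketch only, not a statement about cuML's behavior.

```python
def transform_with_classes(labels, classes):
    """Encode labels as 0/1 rows using a class list learned by an earlier fit step."""
    column = {c: j for j, c in enumerate(classes)}
    rows = []
    for label in labels:
        row = [0] * len(classes)
        row[column[label]] = 1  # KeyError here if the label was not seen when fitting
        rows.append(row)
    return rows


known_classes = [0, 1, 2, 3, 4, 5, 7, 10]
new_rows = transform_with_classes([7, 0, 3], known_classes)
print(new_rows[0])  # label 7 -> [0, 0, 0, 0, 0, 0, 1, 0]
```
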