LabelBinarizer

class cuml.dask.preprocessing.LabelBinarizer(*, client=None, **kwargs)

A distributed version of LabelBinarizer for one-hot encoding a collection of labels.

Parameters:

client : dask.distributed.Client, optional
    Dask client to use.
**kwargs : dict
    Additional arguments passed to the underlying single-GPU LabelBinarizer.

Methods

`fit`(y)	Fit label binarizer
`fit_transform`(y)	Fit the label binarizer and return transformed labels
`inverse_transform`(y[, threshold])	Invert a set of encoded labels back to original labels
`transform`(y)	Transform and return encoded labels

Examples

Create an array of labels and one-hot encode them:

>>> import cupy as cp
>>> import cupyx
>>> from cuml.dask.preprocessing import LabelBinarizer

>>> from dask_cuda import LocalCUDACluster
>>> from dask.distributed import Client
>>> import dask

>>> cluster = LocalCUDACluster()
>>> client = Client(cluster)

>>> labels = cp.asarray([0, 5, 10, 7, 2, 4, 1, 0, 0, 4, 3, 2, 1],
...                     dtype=cp.int32)
>>> labels = dask.array.from_array(labels)

>>> lb = LabelBinarizer()
>>> encoded = lb.fit_transform(labels)
>>> print(encoded.compute())
[[1 0 0 0 0 0 0 0]
 [0 0 0 0 0 1 0 0]
 [0 0 0 0 0 0 0 1]
 [0 0 0 0 0 0 1 0]
 [0 0 1 0 0 0 0 0]
 [0 0 0 0 1 0 0 0]
 [0 1 0 0 0 0 0 0]
 [1 0 0 0 0 0 0 0]
 [1 0 0 0 0 0 0 0]
 [0 0 0 0 1 0 0 0]
 [0 0 0 1 0 0 0 0]
 [0 0 1 0 0 0 0 0]
 [0 1 0 0 0 0 0 0]]
>>> decoded = lb.inverse_transform(encoded)
>>> print(decoded.compute())
[ 0  5 10  7  2  4  1  0  0  4  3  2  1]
>>> client.close()
>>> cluster.close()
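
The layout of the encoded matrix above can be reproduced on the CPU with plain Python. This is only an illustrative sketch, not the cuML implementation: it assumes that the columns follow the sorted unique labels, which is consistent with the printed output (8 distinct labels, 8 columns). The names `classes` and `column_of` are hypothetical helpers introduced for the illustration.

```python
# CPU-only illustration of the one-hot layout; not how cuML computes it.
labels = [0, 5, 10, 7, 2, 4, 1, 0, 0, 4, 3, 2, 1]

# Assumption: one column per distinct label, ordered by sorted label value.
classes = sorted(set(labels))
column_of = {label: col for col, label in enumerate(classes)}

# Each row holds a single 1 in the column that belongs to its label.
encoded = [
    [1 if col == column_of[label] else 0 for col in range(len(classes))]
    for label in labels
]

print(classes)     # [0, 1, 2, 3, 4, 5, 7, 10]
print(encoded[1])  # label 5 -> [0, 0, 0, 0, 0, 1, 0, 0]
```

Note that labels 7 and 10 occupy columns 6 and 7: the column index is the position of the label among the distinct classes, not the label value itself.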

fit(y)

Fit label binarizer

Parameters:

y : Dask.Array of shape [n_samples,] or [n_samples, n_classes], chunked by row
    Target values. A 2-d matrix should contain only 0s and 1s and represents multilabel classification.

Returns:

self : returns an instance of self.

fit_transform(y)

Fit the label binarizer and return transformed labels

Parameters:

y : Dask.Array of shape [n_samples,] or [n_samples, n_classes]
    Target values. A 2-d matrix should contain only 0s and 1s and represents multilabel classification.

Returns:

arr : Dask.Array backed by CuPy arrays containing encoded labels

inverse_transform(y, threshold=None)

Invert a set of encoded labels back to original labels

Parameters:

y : Dask.Array of shape [n_samples, n_classes]
    Encoded labels.
threshold : float, optional
    This value is currently ignored.

Returns:

arr : Dask.Array backed by CuPy arrays containing original labels
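
Conceptually, inverting a one-hot encoding means finding the column that holds the 1 in each row and looking up the label for that column. The plain-Python sketch below illustrates the idea on the CPU. It is not the cuML implementation, and the `classes` list is an assumption (the sorted distinct labels from the class-level example) rather than a documented attribute.

```python
# CPU-only illustration of inverting a one-hot encoding; not cuML code.
# Assumption: column i corresponds to classes[i].
classes = [0, 1, 2, 3, 4, 5, 7, 10]

encoded = [
    [0, 0, 0, 0, 0, 1, 0, 0],  # label 5
    [0, 0, 0, 0, 0, 0, 0, 1],  # label 10
    [1, 0, 0, 0, 0, 0, 0, 0],  # label 0
]

# For each row, take the position of the largest entry (an argmax)
# and map that column index back to its label.
decoded = [classes[row.index(max(row))] for row in encoded]

print(decoded)  # [5, 10, 0]
```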

transform(y)

Transform and return encoded labels

Parameters:

y : Dask.Array of shape [n_samples,] or [n_samples, n_classes]

Returns:

arr : Dask.Array backed by CuPy arrays containing encoded labels