TfidfTransformer

class cuml.dask.feature_extraction.text.TfidfTransformer(*, client=None, verbose=False, **kwargs)

Distributed TF-IDF transformer. Converts a count matrix stored as a Dask array of CuPy blocks into its TF-IDF representation.
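For reference, the computation this class distributes can be sketched on a single dense matrix. This NumPy sketch assumes scikit-learn-compatible defaults (raw term frequency, smoothed IDF, L2 row normalization). The function name `tfidf` is hypothetical and is not part of cuML.

```python
import numpy as np

def tfidf(counts):
    """Hypothetical dense TF-IDF, assuming smooth_idf=True and norm='l2'."""
    counts = np.asarray(counts, dtype=np.float64)
    n_docs = counts.shape[0]
    # Document frequency: number of documents (rows) containing each term.
    df = np.count_nonzero(counts, axis=0)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = np.log((1.0 + n_docs) / (1.0 + df)) + 1.0
    weighted = counts * idf
    # L2-normalize each document; leave all-zero rows as zeros.
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return weighted / norms

counts = np.array([[3, 0, 1],
                   [2, 0, 0],
                   [3, 0, 0],
                   [4, 0, 0],
                   [3, 2, 0],
                   [3, 0, 2]])
X = tfidf(counts)
```

A term that occurs in every document (the first column here) gets an IDF of exactly 1, so it is not amplified relative to rarer terms.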

Methods

fit(X[, y])

Fit the distributed TF-IDF transformer.

fit_transform(X[, y])

Fit the distributed TF-IDF transformer, then transform the given data samples.

transform(X[, y])

Transform the given data samples with the distributed TF-IDF transformer.
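The split between fitting and transforming matters in a distributed setting. Fitting needs corpus-wide document frequencies, which means combining statistics from every partition. Transforming a partition needs only the fitted IDF vector, so each block can be processed independently. The NumPy sketch below illustrates that general pattern, with plain arrays standing in for row-partitioned Dask blocks. It is not cuML's implementation. All function names are hypothetical, and smoothed IDF with L2 normalization is assumed.

```python
import numpy as np

def partial_doc_stats(block):
    """Hypothetical per-partition stats: row count and per-term document frequency."""
    return block.shape[0], np.count_nonzero(block, axis=0)

def fit_idf(blocks):
    """Reduce per-partition stats into a single smoothed IDF vector."""
    stats = [partial_doc_stats(b) for b in blocks]  # could run on workers in parallel
    n_docs = sum(n for n, _ in stats)
    df = np.sum([d for _, d in stats], axis=0)
    return np.log((1.0 + n_docs) / (1.0 + df)) + 1.0

def transform_block(block, idf):
    """Apply a fitted IDF to one partition; needs no data from other partitions."""
    weighted = block * idf
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return weighted / norms

rng = np.random.default_rng(0)
full = rng.integers(0, 3, size=(10, 5)).astype(np.float64)
blocks = np.array_split(full, 3, axis=0)  # stand-in for row-partitioned chunks

idf = fit_idf(blocks)
out = np.vstack([transform_block(b, idf) for b in blocks])
```

Because document frequencies are additive across row partitions, the reduced IDF equals the one computed on the unpartitioned matrix.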

Examples

>>> import cupy as cp
>>> from sklearn.datasets import fetch_20newsgroups
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> from dask_cuda import LocalCUDACluster
>>> from dask.distributed import Client
>>> from cuml.dask.common import to_sparse_dask_array
>>> from cuml.dask.naive_bayes import MultinomialNB
>>> import dask.array
>>> from cuml.dask.feature_extraction.text import TfidfTransformer

>>> # Create a local CUDA cluster
>>> cluster = LocalCUDACluster()
>>> client = Client(cluster)

>>> # Load corpus
>>> twenty_train = fetch_20newsgroups(subset='train',
...                         shuffle=True, random_state=42)
>>> cv = CountVectorizer()
>>> xformed = cv.fit_transform(twenty_train.data).astype(cp.float32)
>>> X = to_sparse_dask_array(xformed, client)

>>> y = dask.array.from_array(twenty_train.target, asarray=False,
...                     fancy=False).astype(cp.int32)

>>> multi_gpu_transformer = TfidfTransformer()
>>> X_transformed = multi_gpu_transformer.fit_transform(X)
>>> X_transformed.compute_chunk_sizes()
dask.array<...>

>>> model = MultinomialNB()
>>> model.fit(X_transformed, y)
<cuml.dask.naive_bayes.naive_bayes.MultinomialNB object at 0x...>
>>> result = model.score(X_transformed, y)
>>> result
array(0.93264981)
>>> client.close()
>>> cluster.close()

fit(X, y=None)

Fit the distributed TF-IDF transformer.

Parameters:
X : dask.Array with blocks containing dense or sparse cupy arrays
Returns:
cuml.dask.feature_extraction.text.TfidfTransformer instance
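Because `fit` returns the estimator instance, calls can be chained in the usual scikit-learn style, and `fit_transform(X)` corresponds to fitting and then transforming the same data. Below is a minimal single-node sketch of that contract. The class `SketchTfidf` and its `idf_` attribute are hypothetical, not the cuML API. It accepts and ignores `y`, mirroring the signatures on this page, and assumes smoothed IDF with L2 normalization.

```python
import numpy as np

class SketchTfidf:
    """Hypothetical single-node stand-in for the fit/transform contract."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=np.float64)
        df = np.count_nonzero(X, axis=0)
        # Learned state is stored on the instance.
        self.idf_ = np.log((1.0 + X.shape[0]) / (1.0 + df)) + 1.0
        return self  # returning self is what enables chaining

    def transform(self, X, y=None):
        weighted = np.asarray(X, dtype=np.float64) * self.idf_
        norms = np.linalg.norm(weighted, axis=1, keepdims=True)
        norms[norms == 0] = 1.0
        return weighted / norms

    def fit_transform(self, X, y=None):
        return self.fit(X).transform(X)

counts = np.array([[1, 0, 2], [0, 1, 1], [3, 0, 0]])
model = SketchTfidf()
fitted = model.fit(counts)  # the same object as `model`
chained = SketchTfidf().fit(counts).transform(counts)
```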

fit_transform(X, y=None)

Fit the distributed TF-IDF transformer, then transform the given data samples.

Parameters:
X : dask.Array with blocks containing dense or sparse cupy arrays
Returns:
dask.Array with blocks containing transformed sparse cupy arrays

transform(X, y=None)

Transform the given data samples with the distributed TF-IDF transformer.

Parameters:
X : dask.Array with blocks containing dense or sparse cupy arrays
Returns:
dask.Array with blocks containing transformed sparse cupy arrays
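As with scikit-learn's TfidfTransformer, `transform` is expected to apply the statistics learned during `fit` rather than recompute document frequencies from the data it receives. The NumPy sketch below shows what that means for new documents. It assumes smoothed IDF with L2 normalization, and `fit_idf` and `apply_idf` are hypothetical helpers, not cuML functions.

```python
import numpy as np

def fit_idf(counts):
    """Hypothetical: smoothed IDF learned from a training count matrix."""
    counts = np.asarray(counts, dtype=np.float64)
    df = np.count_nonzero(counts, axis=0)
    return np.log((1.0 + counts.shape[0]) / (1.0 + df)) + 1.0

def apply_idf(counts, idf):
    """Hypothetical: weight counts by a fixed IDF, then L2-normalize rows."""
    weighted = np.asarray(counts, dtype=np.float64) * idf
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return weighted / norms

train = np.array([[2, 1, 0], [0, 1, 1], [1, 1, 0]])
new_docs = np.array([[0, 0, 4], [1, 0, 1]])

idf = fit_idf(train)              # statistics come from the training corpus only
X_new = apply_idf(new_docs, idf)  # new documents reuse them unchanged
```

Refitting on `new_docs` instead would produce different weights, which is why the transformer should be fitted once on the training corpus and reused.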