TfidfTransformer#
- class cuml.dask.feature_extraction.text.TfidfTransformer(*, client=None, verbose=False, **kwargs)[source]#
Distributed TF-IDF transformer
Methods
fit(X[, y])Fit distributed TFIDF Transformer
fit_transform(X[, y])Fit distributed TFIDFTransformer and then transform the given set of data samples.
transform(X[, y])Use distributed TFIDFTransformer to transform the given set of data samples.
Examples
>>> import cupy as cp >>> from sklearn.datasets import fetch_20newsgroups >>> from sklearn.feature_extraction.text import CountVectorizer >>> from dask_cuda import LocalCUDACluster >>> from dask.distributed import Client >>> from cuml.dask.common import to_sparse_dask_array >>> from cuml.dask.naive_bayes import MultinomialNB >>> import dask >>> from cuml.dask.feature_extraction.text import TfidfTransformer >>> # Create a local CUDA cluster >>> cluster = LocalCUDACluster() >>> client = Client(cluster) >>> # Load corpus >>> twenty_train = fetch_20newsgroups(subset='train', ... shuffle=True, random_state=42) >>> cv = CountVectorizer() >>> xformed = cv.fit_transform(twenty_train.data).astype(cp.float32) >>> X = to_sparse_dask_array(xformed, client) >>> y = dask.array.from_array(twenty_train.target, asarray=False, ... fancy=False).astype(cp.int32) >>> multi_gpu_transformer = TfidfTransformer() >>> X_transformed = multi_gpu_transformer.fit_transform(X) >>> X_transformed.compute_chunk_sizes() dask.array<...> >>> model = MultinomialNB() >>> model.fit(X_transformed, y) <cuml.dask.naive_bayes.naive_bayes.MultinomialNB object at 0x...> >>> result = model.score(X_transformed, y) >>> print(result) array(0.93264981) >>> client.close() >>> cluster.close()
- fit(X, y=None)[source]#
Fit distributed TFIDF Transformer
- Parameters:
- Xdask.Array with blocks containing dense or sparse cupy arrays
- Returns:
- cuml.dask.feature_extraction.text.TfidfTransformer instance