cuml.dask

Multi-node, multi-GPU algorithms using Dask.

Cluster

DBSCAN

Multi-node, multi-GPU implementation of DBSCAN.

KMeans

Multi-node, multi-GPU implementation of KMeans.

Decomposition

PCA

PCA (Principal Component Analysis) is a fundamental dimensionality reduction technique that combines the features in X into linear combinations such that each successive component captures as much of the remaining variance in the data as possible.
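To make the idea of a variance-maximizing linear combination concrete, the following is a minimal pure-Python sketch that finds only the first principal component by power iteration on the sample covariance matrix. It is an illustration of the technique, not the cuML implementation; the function name `first_principal_component` is hypothetical, and the fixed all-ones starting vector is an assumption that fails if that vector happens to be orthogonal to the leading direction.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, variance) of the leading principal component.

    rows: list of equal-length numeric sequences (samples x features).
    Hypothetical helper for illustration only.
    """
    n = len(rows)
    d = len(rows[0])
    # Center each feature so the covariance is computed around the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d), using the n - 1 denominator.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly apply the covariance and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Variance captured along v is the quadratic form v^T C v.
    variance = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return v, variance


# Points lying exactly on the line y = 2x: all variance is along (1, 2).
direction, variance = first_principal_component([(0, 0), (1, 2), (2, 4), (3, 6)])
```

For data on the line y = 2x, the returned direction is proportional to (1, 2), which is the single linear combination of the two features that carries all of the variance.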

TruncatedSVD

Ensemble

RandomForestClassifier

Multi-GPU Random Forest classifier model which fits multiple decision tree classifiers in an ensemble.

RandomForestRegressor

Multi-GPU Random Forest regressor model which fits multiple decision tree regressors in an ensemble.

Linear Models

LinearRegression

LinearRegression is a simple machine learning model where the response y is modelled by a linear combination of the predictors in X.

Ridge

Ridge extends LinearRegression by providing L2 regularization on the coefficients when predicting response y with a linear combination of the predictors in X.

Lasso

Lasso extends LinearRegression by providing L1 regularization on the coefficients when predicting response y with a linear combination of the predictors in X.

ElasticNet

ElasticNet extends LinearRegression with combined L1 and L2 regularizations on the coefficients when predicting response y with a linear combination of the predictors in X.
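The four linear models above differ only in the penalty added to the squared-error loss. The sketch below writes that objective out in pure Python. The scaling used here (loss divided by 2n, with an `l1_ratio` that mixes the two penalties) follows a common convention and is an assumption; libraries differ in these constants, so the numbers are illustrative rather than a statement of what cuML computes. The function name `objective` is hypothetical.

```python
def objective(X, y, w, alpha=0.0, l1_ratio=0.0):
    """Penalized least-squares objective (hypothetical helper).

    alpha = 0                  -> plain LinearRegression loss
    alpha > 0, l1_ratio = 0    -> L2 penalty only (Ridge-style)
    alpha > 0, l1_ratio = 1    -> L1 penalty only (Lasso-style)
    alpha > 0, 0 < l1_ratio < 1 -> combined penalty (ElasticNet-style)
    """
    n = len(y)
    # Squared error between the linear prediction X @ w and the response y.
    sse = sum(
        (sum(x_ij * w_j for x_ij, w_j in zip(row, w)) - target) ** 2
        for row, target in zip(X, y)
    )
    loss = sse / (2 * n)
    l1 = sum(abs(c) for c in w)        # sum of absolute coefficients
    l2_squared = sum(c * c for c in w)  # sum of squared coefficients
    penalty = alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2_squared)
    return loss + penalty


X = [[1, 0], [0, 1]]
y = [1, 2]
w = [1, -2]

ols = objective(X, y, w)                                # 4.0
ridge = objective(X, y, w, alpha=1.0, l1_ratio=0.0)     # 6.5
lasso = objective(X, y, w, alpha=1.0, l1_ratio=1.0)     # 7.0
elastic = objective(X, y, w, alpha=1.0, l1_ratio=0.5)   # 6.75
```

The L1 term grows linearly in each coefficient, which is what lets Lasso drive coefficients exactly to zero, while the L2 term shrinks them smoothly.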

Manifold

UMAP

Uniform Manifold Approximation and Projection, a nonlinear dimensionality reduction technique.

Naive Bayes

MultinomialNB

Distributed Naive Bayes classifier for multinomial models.

Neighbors

NearestNeighbors

Multi-node Multi-GPU NearestNeighbors Model.

KNeighborsClassifier

Multi-node Multi-GPU K-Nearest Neighbors Classifier Model.

KNeighborsRegressor

Multi-node Multi-GPU K-Nearest Neighbors Regressor Model.
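The three neighbors models share one core step: find the k closest training points to a query. The classifier then takes a majority vote and the regressor averages the targets. The following single-process, brute-force sketch shows that relationship in pure Python. The function names are hypothetical, Euclidean distance is an assumption, and nothing here reflects how the distributed GPU versions partition the work.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """Indices of the k training points closest to query (Euclidean)."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    return order[:k]


def knn_classify(train, labels, query, k):
    """Majority vote over the labels of the k nearest neighbors."""
    votes = Counter(labels[i] for i in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train, targets, query, k):
    """Mean of the targets of the k nearest neighbors."""
    idx = k_nearest(train, query, k)
    return sum(targets[i] for i in idx) / len(idx)


train = [(0, 0), (0, 1), (5, 5), (6, 5)]
labels = ["a", "a", "b", "b"]
targets = [1.0, 3.0, 10.0, 12.0]

neighbors = k_nearest(train, (0.2, 0.1), k=2)               # [0, 1]
predicted_label = knn_classify(train, labels, (0.2, 0.1), k=2)   # "a"
predicted_value = knn_regress(train, targets, (0.2, 0.1), k=2)   # 2.0
```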

Preprocessing

LabelBinarizer

A distributed version of LabelBinarizer for one-hot encoding a collection of labels.

OneHotEncoder

Encode categorical features as a one-hot numeric array.
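As a small illustration of what one-hot encoding produces, the sketch below maps a list of labels to rows containing a single 1 in the column of the matching class. It is plain Python, not the cuML API; the function name `one_hot` and the choice to order classes by sorting are assumptions made for the example.

```python
def one_hot(labels):
    """Return (classes, encoded) for a list of hashable, sortable labels.

    Hypothetical helper: classes are the sorted unique labels, and each
    encoded row has a 1 in the column of its class and 0 elsewhere.
    """
    classes = sorted(set(labels))
    column = {c: i for i, c in enumerate(classes)}
    encoded = [
        [1 if column[label] == j else 0 for j in range(len(classes))]
        for label in labels
    ]
    return classes, encoded


classes, encoded = one_hot(["cat", "dog", "cat", "bird"])
# classes -> ['bird', 'cat', 'dog']
# encoded -> [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```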

Feature Extraction

TfidfTransformer

Distributed TF-IDF transformer.
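For readers unfamiliar with the transform itself, here is a minimal pure-Python sketch of TF-IDF applied to a small term-count matrix. It assumes the smoothed convention idf = ln((1 + n) / (1 + df)) + 1 followed by L2 row normalization, which matches the defaults of scikit-learn's TfidfTransformer; whether the distributed version uses identical defaults is not stated on this page. The function name `tfidf` is hypothetical.

```python
import math


def tfidf(counts):
    """Return (idf, rows) for a documents x terms matrix of raw counts.

    Assumes smoothed IDF and L2-normalized rows (illustrative convention).
    """
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Document frequency: number of documents containing each term.
    df = [sum(1 for doc in counts if doc[t] > 0) for t in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in range(n_terms)]
    rows = []
    for doc in counts:
        weighted = [doc[t] * idf[t] for t in range(n_terms)]
        norm = math.sqrt(sum(v * v for v in weighted))
        # Leave all-zero documents as zeros instead of dividing by zero.
        rows.append([v / norm if norm else 0.0 for v in weighted])
    return idf, rows


# Term 0 appears in both documents, term 1 only in the first.
idf, rows = tfidf([[1, 1], [1, 0]])
```

In this example the term that appears in every document gets the minimum weight of 1.0, while the rarer term gets a larger weight and therefore dominates the first document's normalized row.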

Datasets

make_blobs

Makes labeled Dask-CuPy arrays containing blobs for a randomly generated set of centroids.

make_classification

Generate a random n-class classification problem.

make_regression

Generate a random regression problem.

Solvers

CD

Multi-node Multi-GPU Coordinate Descent Solver.
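Coordinate descent updates one coefficient at a time while holding the others fixed. The sketch below applies it to an L1-penalized least-squares problem in pure Python, assuming the objective (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 with no intercept. It is a single-process illustration of the technique with hypothetical function names, not a description of the solver's actual implementation or parameters.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clamping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, alpha, sweeps=100):
    """Cyclic coordinate descent for L1-penalized least squares.

    Hypothetical helper; runs a fixed number of sweeps with no
    convergence check, which is a simplification.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for row, target in zip(X, y):
                pred_without_j = sum(row[k] * w[k] for k in range(d) if k != j)
                rho += row[j] * (target - pred_without_j)
                z += row[j] ** 2
            # Closed-form minimizer of the objective in coordinate j.
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z else 0.0
    return w


X = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [2, 0.1, 2, 0.1]

unpenalized = lasso_cd(X, y, alpha=0.0)  # approximately [2.0, 0.1]
penalized = lasso_cd(X, y, alpha=0.5)    # approximately [1.0, 0.0]
```

With the penalty switched on, the weakly informative second coefficient is set exactly to zero and the first is shrunk, which is the behavior the soft-threshold step is responsible for.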

Base Classes and Mixins