LinearRegression

class cuml.dask.linear_model.LinearRegression(*, client=None, verbose=False, **kwargs)

LinearRegression is a simple machine learning model where the response y is modelled by a linear combination of the predictors in X.
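For reference, the model and the ordinary least-squares objective it is fit to can be written as follows. Here $\mathbf{1}$ is the all-ones vector and $c$ is the intercept described under fit_intercept. With centered data ($c = 0$), the minimizer satisfies the normal equations on the second line.

```latex
\begin{aligned}
\hat{y} &= X\beta + c\,\mathbf{1}, \qquad
(\hat{\beta}, \hat{c}) = \operatorname*{arg\,min}_{\beta,\,c} \lVert y - X\beta - c\,\mathbf{1} \rVert_2^2 \\
X^\top X \, \hat{\beta} &= X^\top y \qquad \text{(centered data, } c = 0\text{)}
\end{aligned}
```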

cuML’s Dask Linear Regression (multi-node, multi-GPU) expects a Dask cuDF DataFrame or a CuPy-backed Dask Array as input and provides an eigendecomposition-based algorithm (Eig) to fit a linear model. Eig is usually preferred when X is a tall and skinny matrix, i.e. one with many more rows than columns. As the number of features in X increases, the accuracy of the Eig algorithm may decrease.
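The Eig approach reduces least squares to the normal equations X^T X beta = X^T y and solves them through an eigendecomposition of the small n_features x n_features matrix X^T X. This is why it suits tall and skinny inputs. The single-process NumPy sketch below illustrates that idea under stated assumptions. It is not cuML's implementation, it omits the intercept, and the helper name fit_ols_eig is hypothetical.

```python
import numpy as np


def fit_ols_eig(X, y):
    """Least squares via an eigendecomposition of X^T X (no intercept).

    Hypothetical illustrative helper; assumes X has full column rank.
    """
    gram = X.T @ X                           # (n_features, n_features), symmetric
    xty = X.T @ y                            # (n_features,)
    eigvals, eigvecs = np.linalg.eigh(gram)  # gram = V diag(w) V^T
    # beta = V diag(1/w) V^T X^T y solves the normal equations
    return eigvecs @ ((eigvecs.T @ xty) / eigvals)


rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))             # tall and skinny: 10,000 rows, 5 columns
true_coef = np.array([1.5, -2.0, 0.5, 3.0, 0.0])
y = X @ true_coef

coef = fit_ols_eig(X, y)
print(coef)
```

Only the 5 x 5 matrix X^T X is decomposed, no matter how many rows X has.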

Parameters:
algorithm : 'eig'

Eig uses an eigendecomposition of the covariance matrix and is much faster than SVD, which is slower but guaranteed to be stable. Only 'eig' is listed for this estimator.
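One way to see why Eig trades stability for speed: forming X^T X squares the condition number of X, so roundoff in the eigensolve is amplified by roughly cond(X)^2 instead of cond(X). The NumPy sketch below builds an X with cond(X) = 1e6 by construction. It then compares an eigh-based solve with np.linalg.lstsq, which NumPy implements with an SVD-based LAPACK routine. This is an illustration under these assumptions, not a benchmark of cuML.

```python
import numpy as np


def fit_ols_eig(X, y):
    # Normal-equations solve through eigh; forming X^T X squares cond(X).
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)
    return eigvecs @ ((eigvecs.T @ (X.T @ y)) / eigvals)


rng = np.random.default_rng(42)
n_rows, n_features = 2_000, 6
# Build X = U diag(s) V^T with known singular values, so cond(X) = 1e6.
U, _ = np.linalg.qr(rng.normal(size=(n_rows, n_features)))
V, _ = np.linalg.qr(rng.normal(size=(n_features, n_features)))
s = np.logspace(0, -6, n_features)
X = U @ np.diag(s) @ V.T

true_coef = rng.normal(size=n_features)
y = X @ true_coef


def rel_err(coef):
    return np.linalg.norm(coef - true_coef) / np.linalg.norm(true_coef)


coef_eig = fit_ols_eig(X, y)
coef_svd = np.linalg.lstsq(X, y, rcond=None)[0]  # SVD-based solver

print(f"cond(X)     = {np.linalg.cond(X):.1e}")
print(f"cond(X^T X) = {np.linalg.cond(X.T @ X):.1e}")
print(f"relative error, eig: {rel_err(coef_eig):.1e}")
print(f"relative error, svd: {rel_err(coef_svd):.1e}")
```

For well-conditioned, tall and skinny data the two agree closely, and the eig route only ever factors a small matrix.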

fit_intercept : boolean (default = True)

LinearRegression adds an additional term c to correct for the global mean of y, modeling the response as “x * beta + c”. If False, the model assumes the data have already been centered.
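The following is a minimal sketch of what centering does, assuming ordinary least squares. With fit_intercept enabled, the coefficients are fit on mean-centered X and y, and c is recovered as mean(y) - mean(X) . beta. With it disabled, nothing is centered and c stays 0, so uncentered data biases the coefficients. fit_linear is a hypothetical NumPy helper, not part of cuML.

```python
import numpy as np


def fit_linear(X, y, fit_intercept=True):
    """Hypothetical helper mirroring the fit_intercept behaviour described above."""
    # Centering offsets; all zeros when no intercept is fit.
    x_mean = X.mean(axis=0) if fit_intercept else np.zeros(X.shape[1])
    y_mean = y.mean() if fit_intercept else 0.0
    Xc, yc = X - x_mean, y - y_mean
    eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)
    coef = eigvecs @ ((eigvecs.T @ (Xc.T @ yc)) / eigvals)
    intercept = y_mean - x_mean @ coef  # evaluates to 0.0 when fit_intercept=False
    return coef, intercept


rng = np.random.default_rng(3)
X = rng.normal(loc=5.0, size=(500, 3))      # features deliberately not centered
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0    # true c = 4.0

coef, intercept = fit_linear(X, y)
coef_nc, intercept_nc = fit_linear(X, y, fit_intercept=False)
print(coef, intercept)
print(coef_nc, intercept_nc)
```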

Attributes:
coef_ : cuDF Series, shape (n_features)

The estimated coefficients for the linear regression model.

intercept_ : array

The independent term. If fit_intercept is False, it will be 0.

Methods

fit(X, y)

Fit the model with X and y.

predict(X[, delayed])

Make predictions for X and return a Dask collection.

fit(X, y)

Fit the model with X and y.

Parameters:
X : Dask cuDF DataFrame or CuPy-backed Dask Array (n_rows, n_features)

Features for regression

y : Dask cuDF DataFrame or CuPy-backed Dask Array (n_rows, 1)

Labels (outcome values)
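Because X arrives split into row partitions, the Eig approach distributes naturally. Each partition i contributes X_i^T X_i and X_i^T y_i, and summing those small matrices gives X^T X and X^T y for the whole dataset. The NumPy sketch below simulates three partitions to show that this reduction reproduces the single-machine solution. It is an illustration built on assumptions (no intercept, no Dask, hypothetical helper names), not a description of how cuML communicates between workers.

```python
import numpy as np


def partition_stats(X_part, y_part):
    # Per-partition work: outputs are n_features x n_features and n_features,
    # independent of how many rows the partition holds.
    return X_part.T @ X_part, X_part.T @ y_part


def fit_from_partitions(partitions):
    stats = [partition_stats(Xp, yp) for Xp, yp in partitions]
    gram = sum(g for g, _ in stats)   # reduce step: sum of X_i^T X_i
    xty = sum(v for _, v in stats)    # reduce step: sum of X_i^T y_i
    eigvals, eigvecs = np.linalg.eigh(gram)
    return eigvecs @ ((eigvecs.T @ xty) / eigvals)


rng = np.random.default_rng(1)
X = rng.normal(size=(9_000, 4))
y = X @ np.array([0.5, -1.0, 2.0, 1.0]) + 0.01 * rng.normal(size=9_000)

# Split rows into 3 blocks, standing in for the partitions of a Dask collection.
partitions = list(zip(np.array_split(X, 3), np.array_split(y, 3)))
coef_partitioned = fit_from_partitions(partitions)
coef_full = np.linalg.lstsq(X, y, rcond=None)[0]
print(coef_partitioned, coef_full)
```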

predict(X, delayed=True)

Make predictions for X and return a Dask collection.

Parameters:
X : Dask cuDF DataFrame or CuPy-backed Dask Array (n_rows, n_features)

Distributed dense matrix (floats or doubles) of shape (n_rows, n_features).

delayed : bool (default = True)

Whether to perform a lazy prediction (returning Delayed objects) or an eagerly executed one.

Returns:
y : Dask cuDF DataFrame or CuPy-backed Dask Array (n_rows, 1)
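Prediction needs no communication between partitions. Each row's output is x * beta + c, so every partition can be scored independently. The NumPy sketch below mimics that with float32 blocks. The helper name predict_partition is hypothetical, the sketch runs eagerly rather than building Delayed tasks, and it returns a 1-D array of length n_rows.

```python
import numpy as np


def predict_partition(X_part, coef, intercept):
    # Hypothetical helper: each row's prediction depends only on that row.
    return X_part @ coef + intercept


rng = np.random.default_rng(2)
X = rng.normal(size=(1_000, 3)).astype(np.float32)   # single-precision input
coef = np.array([2.0, -1.0, 0.5], dtype=np.float32)
intercept = np.float32(4.0)

# Predict block by block, as a per-partition map would, then concatenate.
blocks = np.array_split(X, 4)
preds = np.concatenate([predict_partition(b, coef, intercept) for b in blocks])
print(preds.shape, preds.dtype)
```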