LinearRegression

class cuml.linear_model.LinearRegression(*, algorithm='auto', fit_intercept=True, copy_X=True, verbose=False, output_type=None)

Ordinary least squares Linear Regression.

Parameters:
algorithm : {‘auto’, ‘eig’, ‘svd’, ‘lsmr’, ‘qr’, ‘svd-qr’, ‘svd-jacobi’}, default=‘auto’

The algorithm to use when fitting:

  • ‘auto’: selects ‘eig’ when it is supported; otherwise it falls back to ‘lsmr’ if X is sparse and to ‘svd’ if X is dense.

  • ‘eig’: uses an eigendecomposition of the covariance matrix. It is faster than SVD, but potentially unstable. It doesn’t support multi-target y or sparse X.

  • ‘svd’ or ‘svd-jacobi’: uses a singular value decomposition (SVD). It is slower, but stable. It doesn’t support sparse X.

  • ‘lsmr’: uses cupyx.scipy.sparse.linalg.lsmr, an iterative algorithm. It supports all input types and is typically very fast.

  • ‘qr’: uses a QR decomposition X = QR and solves Rw = Q^T y for the coefficients w. It is faster than SVD, but doesn’t support multi-target y or sparse X.

  • ‘svd-qr’: computes the SVD using the QR algorithm. It is the slowest option. It doesn’t support multi-target y or sparse X.
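The solver options above differ mainly in which matrix factorization they use. As a hedged illustration, here is a minimal sketch of the ‘eig’, ‘qr’ and ‘svd’ strategies in plain NumPy (float64, CPU), not cuML’s GPU implementation. It reuses the data from the Examples section and centers X and y so the intercept can be recovered afterwards. The helper names solve_eig, solve_qr and solve_svd are hypothetical, and ‘lsmr’ and ‘svd-qr’ are not shown.

```python
import numpy as np

# Data from the Examples section; it satisfies y = 1*x0 + 2*x1 + 3 exactly.
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=np.float64)
y = np.array([6.0, 8.0, 9.0, 11.0])

# Center the data so the intercept can be recovered from the means afterwards.
X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean

def solve_eig(Xc, yc):
    # Normal equations (Xc^T Xc) w = Xc^T yc, inverted through an
    # eigendecomposition of the symmetric, covariance-like matrix Xc^T Xc.
    evals, evecs = np.linalg.eigh(Xc.T @ Xc)
    return evecs @ ((evecs.T @ (Xc.T @ yc)) / evals)

def solve_qr(Xc, yc):
    # Xc = Q R (reduced QR), then solve the small triangular system R w = Q^T yc.
    Q, R = np.linalg.qr(Xc)
    return np.linalg.solve(R, Q.T @ yc)

def solve_svd(Xc, yc):
    # Xc = U S V^T (thin SVD), so w = V S^{-1} U^T yc.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt.T @ ((U.T @ yc) / s)

for solve in (solve_eig, solve_qr, solve_svd):
    coef = solve(Xc, yc)
    intercept = y_mean - X_mean @ coef
    print(solve.__name__, np.round(coef, 6), round(float(intercept), 6))
```

All three recover the same coefficients here because the problem is well conditioned; they diverge in accuracy when Xc^T Xc is close to singular, which is where the eigendecomposition route is the least stable.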

fit_intercept : boolean, default=True

If True, LinearRegression fits an intercept, correcting for the global mean of y. If False, the model assumes that the data has already been centered.
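To make the effect of fit_intercept concrete, a minimal NumPy sketch (an illustration of the underlying math, not cuML’s code) contrasts a fit forced through the origin with a fit on centered data whose intercept is recovered as mean(y) - mean(X) @ coef. The data is the same as in the Examples section.

```python
import numpy as np

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=np.float64)
y = np.array([6.0, 8.0, 9.0, 11.0])  # y = x0 + 2*x1 + 3

# Analogue of fit_intercept=False on uncentered data: the fitted plane
# must pass through the origin, so it cannot reproduce the offset of 3.
coef_raw, *_ = np.linalg.lstsq(X, y, rcond=None)

# Analogue of fit_intercept=True: center X and y, solve, then recover
# the intercept from the column means.
X_mean, y_mean = X.mean(axis=0), y.mean()
coef_c, *_ = np.linalg.lstsq(X - X_mean, y - y_mean, rcond=None)
intercept = y_mean - X_mean @ coef_c

print("through origin:", coef_raw)
print("centered:", coef_c, "intercept:", intercept)
```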

copy_X : boolean, default=True

If True, X will never be mutated. Setting to False may reduce memory usage, at the cost of potentially mutating X.

verbose : int or boolean, default=False

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

output_type : {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None

Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.

Attributes:
coef_ : array, shape (n_features,) or (n_targets, n_features)

The estimated coefficients for the linear regression model.

intercept_ : float or array, shape (n_targets,)

The independent term. It is 0 if fit_intercept is False. It is an array when the model is fit on multi-target y, and a float otherwise.
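The multi-target layout of these attributes can be sketched in NumPy. This is a hedged illustration of the documented shapes only: coef_ and intercept_ below are plain arrays computed with np.linalg.lstsq on synthetic, noise-free data, not the estimator’s attributes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_targets = 50, 3, 2
X = rng.normal(size=(n_samples, n_features))

# Synthetic ground truth laid out like the documented attributes.
true_coef = np.array([[1.0, -2.0, 0.5],
                      [0.0, 3.0, 1.0]])          # (n_targets, n_features)
true_intercept = np.array([4.0, -1.0])           # (n_targets,)
Y = X @ true_coef.T + true_intercept             # (n_samples, n_targets)

# Solve all targets at once on centered data.
X_mean, Y_mean = X.mean(axis=0), Y.mean(axis=0)
W, *_ = np.linalg.lstsq(X - X_mean, Y - Y_mean, rcond=None)  # (n_features, n_targets)

coef_ = W.T                          # (n_targets, n_features), one row per target
intercept_ = Y_mean - X_mean @ W     # (n_targets,), one intercept per target
print(coef_.shape, intercept_.shape)
```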

Methods

fit(self, X, y[, sample_weight, convert_dtype])

Fit the model with X and y.

Notes

LinearRegression is sensitive to multicollinearity (when columns are correlated with each other) and to outliers, both of which can inflate the variance of the estimated coefficients. Consider using Ridge to mitigate multicollinearity, and consider removing outliers before fitting, for example with DBSCAN or with a statistical analysis that flags likely outliers.
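One way to see why Ridge helps with multicollinearity: ordinary least squares effectively inverts X^T X, while ridge regression inverts X^T X + alpha*I. The minimal NumPy sketch below (synthetic data, with alpha=1.0 chosen arbitrarily for the demo) compares the condition numbers of the two matrices when one column is nearly a copy of another.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x0 = rng.normal(size=n)
x1 = x0 + 1e-3 * rng.normal(size=n)   # nearly a duplicate of x0
X = np.column_stack([x0, x1])
X = X - X.mean(axis=0)                # center, as with fit_intercept=True

gram = X.T @ X                        # matrix OLS effectively inverts
alpha = 1.0                           # arbitrary ridge penalty for the demo
ridge_gram = gram + alpha * np.eye(2) # matrix ridge regression inverts

# A huge condition number means tiny changes in y produce large swings
# in the OLS coefficients; the ridge term bounds the smallest eigenvalue.
print("cond(X^T X):", np.linalg.cond(gram))
print("cond(X^T X + alpha*I):", np.linalg.cond(ridge_gram))
```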

Applications of LinearRegression

LinearRegression is used in regression tasks such as predicting sales or house prices. It is also used for extrapolation, time series tasks, dynamic systems modelling and many other machine learning tasks. It is a sensible first model to try when the machine learning problem is a regression task (predicting a continuous variable).

For additional information, see scikit-learn’s documentation for sklearn.linear_model.LinearRegression.

For an additional example see the OLS notebook.

Examples

>>> import cupy as cp
>>> from cuml.linear_model import LinearRegression
>>> X = cp.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=cp.float32)
>>> y = cp.array([6.0, 8.0, 9.0, 11.0], dtype=cp.float32)
>>> model = LinearRegression().fit(X, y)
>>> X_test = cp.array([[3, 5], [2, 5]], dtype=cp.float32)
>>> model.predict(X_test)
array([16.      , 14.999999], dtype=float32)
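The example data lie exactly on the plane y = x0 + 2*x1 + 3, so the exact predictions for X_test are 16 and 15; the 14.999999 shown above comes from floating-point rounding in the float32 computation. A quick cross-check in NumPy (float64, outside cuML) that solves for the intercept jointly by appending a column of ones:

```python
import numpy as np

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=np.float64)
y = np.array([6.0, 8.0, 9.0, 11.0])

# Append a column of ones so the intercept b is solved for with the slopes.
A = np.column_stack([X, np.ones(len(X))])
(w0, w1, b), *_ = np.linalg.lstsq(A, y, rcond=None)

X_test = np.array([[3, 5], [2, 5]], dtype=np.float64)
pred = X_test @ np.array([w0, w1]) + b
print(np.round([w0, w1, b], 6), np.round(pred, 6))
```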

fit(self, X, y, sample_weight=None, *, convert_dtype=True) -> 'LinearRegression'

Fit the model with X and y.

Parameters:
X : array-like (device or host), shape = (n_samples, n_features)

Dense matrix. If the dtype is not float (float32) or double (float64), the data is converted to float, which increases memory usage. Set convert_dtype to False to avoid this conversion; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

y : array-like (device or host), shape = (n_samples,) or (n_samples, n_targets)

Dense matrix. If the dtype is not float (float32) or double (float64), the data is converted to float, which increases memory usage. Set convert_dtype to False to avoid this conversion; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

sample_weight : array-like (device or host), shape = (n_samples,), default=None

The weights for each observation in X. If None, all observations are assigned equal weight. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
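How cuML applies the weights internally is not described here, but the standard weighted least squares objective they correspond to, minimizing the sum of w_i * (y_i - x_i @ coef - b)^2, can be sketched in NumPy by scaling each row by sqrt(w_i). The helper weighted_lstsq is hypothetical. The sketch also checks a useful intuition: integer weights behave like repeating each observation w_i times.

```python
import numpy as np

def weighted_lstsq(X, y, w):
    """Hypothetical helper: weighted least squares with an intercept.

    Scaling each row of [X, 1] and each y_i by sqrt(w_i) turns the
    weighted problem into an ordinary least squares problem.
    """
    A = np.column_stack([X, np.ones(len(X))])
    sw = np.sqrt(w)
    sol, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return sol[:-1], sol[-1]  # (coef, intercept)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 2.0, 5.0])
w = np.array([1.0, 2.0, 1.0, 3.0])

# Weighted fit on the original rows.
coef_w, b_w = weighted_lstsq(X, y, w)

# Unweighted fit after repeating each row w_i times.
reps = w.astype(int)
coef_r, b_r = weighted_lstsq(np.repeat(X, reps, axis=0),
                             np.repeat(y, reps),
                             np.ones(reps.sum()))
print(coef_w, b_w)
print(coef_r, b_r)
```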

convert_dtype : bool, optional, default=True

When set to True, the fit method converts y to the same dtype as X when the two differ. This increases the memory used by the method.