LinearRegression
- class cuml.linear_model.LinearRegression(*, algorithm='auto', fit_intercept=True, copy_X=True, verbose=False, output_type=None)
Ordinary least squares Linear Regression.
- Parameters:
- algorithm : {‘auto’, ‘eig’, ‘svd’, ‘lsmr’, ‘qr’, ‘svd-qr’, ‘svd-jacobi’}, default=‘auto’
The algorithm to use when fitting:
‘auto’: selects ‘eig’ if supported, falling back to ‘lsmr’ if X is sparse, and ‘svd’ otherwise.
‘eig’: uses an eigendecomposition of the covariance matrix. It is faster than SVD, but potentially unstable. It doesn’t support multi-target y or sparse X.
‘svd’ or ‘svd-jacobi’: uses an SVD decomposition. It’s slower, but stable. It doesn’t support sparse X.
‘lsmr’: uses cupyx.scipy.sparse.linalg.lsmr, an iterative algorithm. It supports all input types and is typically very fast.
‘qr’: uses QR decomposition and solves Rx = Q^T y. It’s faster than SVD, but doesn’t support multi-target y or sparse X.
‘svd-qr’: computes SVD decomposition using QR algorithm. It’s the slowest option. It doesn’t support multi-target y or sparse X.
- fit_intercept : boolean (default = True)
If True, LinearRegression tries to correct for the global mean of y. If False, the model expects that you have centered the data.
- copy_X : boolean, default=True
If True, X will never be mutated. Setting to False may reduce memory usage, at the cost of potentially mutating X.
- verbose : int or boolean, default=False
Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
- output_type : {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
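As a rough illustration of how the direct solver families listed under the algorithm parameter differ, the NumPy sketch below solves one small least-squares problem three ways: through an eigendecomposition of X^T X, through an SVD of X, and through a QR factorization followed by Rx = Q^T y. This is not cuML's implementation. The function names (solve_eig, solve_svd, solve_qr) and the synthetic data are assumptions made for the example, and no intercept is fitted.

```python
import numpy as np

# Synthetic, well-conditioned problem (illustrative data, not from cuML).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = X @ true_coef + 0.01 * rng.normal(size=200)


def solve_eig(X, y):
    """Solve the normal equations via an eigendecomposition of X^T X."""
    evals, evecs = np.linalg.eigh(X.T @ X)
    # Dividing by the eigenvalues is where small eigenvalues amplify error.
    return evecs @ ((evecs.T @ (X.T @ y)) / evals)


def solve_svd(X, y):
    """Solve via the thin SVD X = U diag(s) V^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt.T @ ((U.T @ y) / s)


def solve_qr(X, y):
    """Solve R x = Q^T y after the reduced QR factorization X = Q R."""
    Q, R = np.linalg.qr(X)
    return np.linalg.solve(R, Q.T @ y)


coef_eig = solve_eig(X, y)
coef_svd = solve_svd(X, y)
coef_qr = solve_qr(X, y)
print(coef_eig, coef_svd, coef_qr)
```

On well-conditioned data all three routes agree closely. The eigendecomposition route works with X^T X, whose condition number is the square of that of X, which is one common reason such a solver can be less stable than an SVD.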
- Attributes:
- coef_ : array, shape (n_features,) or (n_targets, n_features)
The estimated coefficients for the linear regression model.
- intercept_ : float or array, shape (n_targets,)
The independent term. If fit_intercept is False, will be 0. Will be an array when fit on multi-target y, otherwise will be a float.
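The shapes of coef_ and intercept_ in the multi-target case can be pictured with a small NumPy sketch. It fits a noise-free two-target problem by centering X and Y, solving the least-squares problem, and recovering the intercept from the means, which is one standard way to handle an intercept. It is an illustration under those assumptions, not cuML's code, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_targets = 50, 3, 2

X = rng.normal(size=(n_samples, n_features))
W = np.array([[1.0, 2.0, 3.0],
              [-1.0, 0.5, 0.0]])      # true coefficients, (n_targets, n_features)
b = np.array([4.0, -2.0])             # true intercepts, (n_targets,)
Y = X @ W.T + b                       # noise-free targets, (n_samples, n_targets)

# Center the data so the intercept drops out of the least-squares problem.
x_mean = X.mean(axis=0)
y_mean = Y.mean(axis=0)
solution, *_ = np.linalg.lstsq(X - x_mean, Y - y_mean, rcond=None)

coef = solution.T                     # (n_targets, n_features), like coef_
intercept = y_mean - coef @ x_mean    # (n_targets,), like intercept_

print(coef.shape, intercept.shape)
```

With a single target, the same computation yields a one-dimensional coefficient vector and a scalar intercept.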
Methods
- fit(self, X, y[, sample_weight, convert_dtype])
Fit the model with X and y.
Notes
LinearRegression suffers from multicollinearity (when columns are correlated with each other), and variance explosions from outliers. Consider using Ridge to fix the multicollinearity problem, and consider first using DBSCAN to remove the outliers, or statistical analysis to filter possible outliers.
Applications of LinearRegression
LinearRegression is used in regression tasks where one wants to predict, say, sales or house prices. It is also used in extrapolation or time series tasks, dynamic systems modelling and many other machine learning tasks. This model should be tried first if the machine learning problem is a regression task (predicting a continuous variable).
For additional information, see scikit-learn’s documentation for sklearn.linear_model.LinearRegression.
For an additional example see the OLS notebook.
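The multicollinearity issue mentioned in the notes can be made concrete with a small NumPy sketch. Two nearly identical columns give a very large condition number, and a closed-form ridge solution (an L2 penalty with a hypothetical strength alpha = 1.0) keeps the coefficients near the values that generated the data. The data and the closed-form ridge formula are assumptions chosen for illustration. This is not cuML's Ridge implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

x1 = rng.normal(size=n)
x2 = x1 + 1e-6 * rng.normal(size=n)   # almost an exact copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n)

# A huge condition number signals that X^T X is close to singular.
cond = np.linalg.cond(X)

# Ordinary least squares: coefficients along x1 - x2 are barely constrained.
ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Closed-form ridge: (X^T X + alpha * I)^{-1} X^T y
alpha = 1.0
ridge = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

print("condition number:", cond)
print("OLS coefficients:", ols)
print("ridge coefficients:", ridge)
```

The penalty term adds alpha to every eigenvalue of X^T X, so the nearly zero eigenvalue no longer dominates the solution.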
Examples
>>> import cupy as cp
>>> from cuml.linear_model import LinearRegression
>>> X = cp.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=cp.float32)
>>> y = cp.array([6.0, 8.0, 9.0, 11.0], dtype=cp.float32)
>>> model = LinearRegression().fit(X, y)

>>> X_test = cp.array([[3, 5], [2, 5]], dtype=cp.float32)
>>> model.predict(X_test)
array([16. , 14.999999], dtype=float32)
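The numbers in the example above can be checked by hand without a GPU. The data satisfy y = 1*x0 + 2*x1 + 3 exactly, so an ordinary least-squares fit should recover coefficients (1, 2) and intercept 3, giving predictions of 16 and 15 (the 14.999999 shown above is float32 rounding). The NumPy sketch below reproduces this with a design matrix that has an appended column of ones. It is a cross-check only and does not call cuML.

```python
import numpy as np

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=np.float64)
y = np.array([6.0, 8.0, 9.0, 11.0])

# Append a column of ones so the last fitted parameter is the intercept.
A = np.column_stack([X, np.ones(len(X))])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
coef, intercept = params[:2], params[2]

X_test = np.array([[3, 5], [2, 5]], dtype=np.float64)
pred = X_test @ coef + intercept

print(coef, intercept, pred)
```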
- fit(self, X, y, sample_weight=None, *, convert_dtype=True) -> 'LinearRegression'
Fit the model with X and y.
- Parameters:
- X : array-like (device or host) shape = (n_samples, n_features)
Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set convert_dtype to False to avoid this conversion; the method will then raise an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- y : array-like (device or host) shape = (n_samples, 1)
Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set convert_dtype to False to avoid this conversion; the method will then raise an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- sample_weight : array-like (device or host) shape = (n_samples,), default=None
The weights for each observation in X. If None, all observations are assigned equal weight. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype : bool, optional (default = True)
When set to True, the fit method will, when necessary, convert y to be the same data type as X if they differ. This will increase memory used for the method.
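One way to read sample_weight is through weighted least squares, where each squared residual is multiplied by its weight. The NumPy sketch below assumes that interpretation (it is the conventional one, but it is not taken from cuML's source) and shows that integer weights then behave like repeating the corresponding rows. The helper weighted_ols and the data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([2.0, -1.0]) + 0.5 + 0.1 * rng.normal(size=30)


def weighted_ols(X, y, w):
    """Minimize sum_i w_i * (y_i - x_i . coef - intercept)^2."""
    A = np.column_stack([X, np.ones(len(X))])   # ones column for the intercept
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns the weighted problem into a plain one.
    params, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return params[:-1], params[-1]


# Integer weights 1, 2, 3 repeated across the 30 observations.
w_int = np.tile([1, 2, 3], 10)
coef_w, intercept_w = weighted_ols(X, y, w_int)

# Same fit obtained by physically repeating each row w_int[i] times.
X_rep = np.repeat(X, w_int, axis=0)
y_rep = np.repeat(y, w_int)
coef_r, intercept_r = weighted_ols(X_rep, y_rep, np.ones(len(y_rep)))

print(coef_w, intercept_w)
print(coef_r, intercept_r)
```

Passing sample_weight=None corresponds to all weights being equal, which reduces this to the unweighted fit.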