Ridge

class cuml.linear_model.Ridge(alpha=1.0, *, fit_intercept=True, solver='auto', tol=0.0001, max_iter=None, copy_X=True, output_type=None, verbose=False)

Linear least squares with L2 regularization.

Ridge extends LinearRegression by applying L2 regularization to the coefficients when predicting the response y from a linear combination of the predictors in X. This can reduce the variance of the coefficient estimates and improves the conditioning of the problem.
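To make the conditioning claim concrete, the NumPy sketch below solves the ridge normal equations (X^T X + alpha I) w = X^T y on nearly collinear data and compares condition numbers with and without the penalty. It illustrates the underlying math only and is not cuML's implementation; the data and variable names are made up, and the intercept is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two nearly collinear predictors make X^T X ill-conditioned.
x1 = rng.normal(size=200)
x2 = x1 + 1e-4 * rng.normal(size=200)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=200)

alpha = 1.0
gram = X.T @ X
ridge_gram = gram + alpha * np.eye(X.shape[1])

cond_ols = np.linalg.cond(gram)
cond_ridge = np.linalg.cond(ridge_gram)

# Closed-form ridge solution (no intercept).
w_ridge = np.linalg.solve(ridge_gram, X.T @ y)

print(f"cond(X^T X)           = {cond_ols:.3e}")
print(f"cond(X^T X + alpha I) = {cond_ridge:.3e}")
print("ridge coefficients:", w_ridge)
```

Adding alpha to every eigenvalue of X^T X lifts the smallest ones away from zero, which is why the penalized system is easier to solve.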

Parameters:
alpha: float or array of shape (n_targets,), default=1.0

Regularization strength. Must be positive; when an array is given, it holds one value per target. Larger values specify stronger regularization.
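As a rough illustration of "stronger regularization", the sketch below fits the closed-form ridge solution in NumPy for several alpha values and reports the coefficient norm, which shrinks as alpha grows. The helper `ridge_closed_form` is hypothetical and written for this example; it is not part of cuML, and it omits the intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)


def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha I) w = X^T y; intercept omitted for brevity."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


alphas = [0.01, 1.0, 100.0, 10000.0]
norms = [float(np.linalg.norm(ridge_closed_form(X, y, a))) for a in alphas]
for a, n in zip(alphas, norms):
    print(f"alpha={a:>8}: ||w|| = {n:.4f}")
```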

fit_intercept: bool, default=True

If True, Ridge tries to correct for the global mean of y. If False, the model expects that you have centered the data.
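The relationship between an intercept and centered data can be sketched in NumPy as follows: center X and y, fit without an intercept, then recover the intercept from the means. This is the conventional way to handle an unpenalized intercept in ridge regression; whether cuML performs exactly these steps internally is an assumption, and the data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=5.0, size=(50, 2))
y = X @ np.array([1.0, 2.0]) + 3.0 + rng.normal(scale=0.05, size=50)
alpha = 0.1

# Center X and y, fit without an intercept, then recover the intercept.
X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean
coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(2), Xc.T @ yc)
intercept = y_mean - X_mean @ coef

print("coef:", coef)
print("intercept:", intercept)
```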

solver: {‘auto’, ‘eig’, ‘svd’, ‘lsmr’}, default=‘auto’

The solver to use when fitting:

  • ‘auto’: will select ‘eig’ if supported, falling back to ‘lsmr’ if X is sparse, and ‘svd’ otherwise.

  • ‘eig’: uses an eigendecomposition of the covariance matrix. It is faster than SVD, but potentially unstable. It doesn’t support multi-target y or sparse X.

  • ‘svd’: uses a singular value decomposition (SVD). It is slower, but stable. It doesn’t support sparse X.

  • ‘lsmr’: uses cupyx.scipy.sparse.linalg.lsmr, an iterative algorithm. It is typically the fastest, and supports all options.
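The difference between the eigendecomposition and SVD routes can be sketched in NumPy as below. Both compute the same ridge solution on well-conditioned data; the eigendecomposition route works on X^T X, whose condition number is the square of that of X, which is the usual reason it is considered less stable. This is an illustration of the math under those assumptions, not cuML's GPU code, and the intercept is omitted.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=80)
alpha = 0.5

# Eigendecomposition route: X^T X = V diag(lam) V^T.
lam, V = np.linalg.eigh(X.T @ X)
w_eig = V @ ((V.T @ (X.T @ y)) / (lam + alpha))

# SVD route: X = U diag(s) Vt, so w = Vt^T diag(s / (s^2 + alpha)) U^T y.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
w_svd = Vt.T @ ((s / (s**2 + alpha)) * (U.T @ y))

print("eig:", w_eig)
print("svd:", w_svd)
```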

tol: float, default=1e-4

The tolerance used by the lsmr solver. Has no impact on other solvers.

max_iter: int, default=None

Maximum number of iterations for the lsmr solver. Defaults to None for no limit. Has no impact on other solvers.
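For a feel of how a tolerance and an iteration cap interact in an iterative solver, the sketch below uses SciPy's CPU `scipy.sparse.linalg.lsmr` as a stand-in for the cupyx version. Passing `damp=sqrt(alpha)` makes LSMR minimize the ridge objective. How cuML maps `tol` and `max_iter` onto the solver's arguments is an assumption here; the `atol`, `btol`, and `maxiter` values are chosen only for illustration.

```python
import numpy as np
from scipy.sparse.linalg import lsmr

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
y = X @ np.arange(1.0, 6.0) + rng.normal(scale=0.1, size=200)
alpha = 1.0

# lsmr minimizes ||y - X w||^2 + damp^2 ||w||^2, so damp = sqrt(alpha).
result = lsmr(X, y, damp=np.sqrt(alpha), atol=1e-4, btol=1e-4, maxiter=100)
w_lsmr, n_iter = result[0], result[2]

# Direct solve of the same penalized problem for comparison.
w_direct = np.linalg.solve(X.T @ X + alpha * np.eye(5), X.T @ y)

print("iterations:", n_iter)
print("max abs difference:", np.max(np.abs(w_lsmr - w_direct)))
```

A looser tolerance or a smaller iteration cap trades accuracy for speed; the direct solve gives a reference to judge that trade-off against.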

copy_X: bool, default=True

If True, X will never be mutated. Setting to False may reduce memory usage, at the cost of potentially mutating X.

output_type: {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None

Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.

verbose: int or boolean, default=False

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

Attributes:
coef_: array, shape (n_features,)

The estimated coefficients for the linear regression model.

intercept_: float or array, shape (n_targets,)

The independent term. If fit_intercept is False, will be 0. Will be an array when fit on multi-target y, otherwise will be a float.

solver_: str

The solver that was used at fit time.

n_iter_: numpy.ndarray or None, shape (n_targets,)

The number of iterations the solver ran per-target if the 'lsmr' solver was used, or None for other solvers.

Methods

fit(self, X, y[, sample_weight, convert_dtype])

Fit the model with X and y.

Notes

For additional docs, see Scikit-learn’s Ridge Regression.

Examples

>>> import cupy as cp
>>> import cudf
>>> from cuml import Ridge
>>> X = cudf.DataFrame()
>>> X['col1'] = cp.array([1, 1, 2, 2], dtype=cp.float32)
>>> X['col2'] = cp.array([1, 2, 2, 3], dtype=cp.float32)
>>> y = cudf.Series(cp.array([6.0, 8.0, 9.0, 11.0], dtype=cp.float32))
>>> ridge = Ridge(alpha=1e-5).fit(X, y)
>>> print(ridge.coef_)
0 1.000...
1 1.999...
>>> print(ridge.intercept_)
3.0...
>>> X_new = cudf.DataFrame()
>>> X_new['col1'] = cp.array([3, 2], dtype=cp.float32)
>>> X_new['col2'] = cp.array([5, 5], dtype=cp.float32)
>>> preds = ridge.predict(X_new)
>>> print(preds)
0 15.999...
1 14.999...

fit(self, X, y, sample_weight=None, *, convert_dtype=True) -> 'Ridge'

Fit the model with X and y.

Parameters:
X: array-like (device or host) shape = (n_samples, n_features)

Dense matrix. If the data type is not float32 or float64, the data is converted to float, which increases memory use. Set convert_dtype to False to have the method raise an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

y: array-like (device or host) shape = (n_samples, 1)

Dense matrix. If the data type is not float32 or float64, the data is converted to float, which increases memory use. Set convert_dtype to False to have the method raise an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

sample_weight: array-like (device or host) shape = (n_samples,), default=None

The weights for each observation in X. If None, all observations are assigned equal weight. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
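One way to reason about per-observation weights is through the weighted normal equations, (X^T W X + alpha I) w = X^T W y, where W is the diagonal matrix of weights. The NumPy sketch below checks that this matches scaling each row of X and y by the square root of its weight and fitting without weights. This is the standard weighted least squares formulation; that cuML applies weights in exactly this way is an assumption, and the intercept is omitted.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(60, 3))
y = X @ np.array([0.5, 1.5, -1.0]) + rng.normal(scale=0.1, size=60)
sample_weight = rng.uniform(0.5, 2.0, size=60)
alpha = 0.3

# Weighted normal equations: (X^T W X + alpha I) w = X^T W y.
W = np.diag(sample_weight)
w_weighted = np.linalg.solve(X.T @ W @ X + alpha * np.eye(3), X.T @ W @ y)

# Equivalent: scale each row of X and y by sqrt(weight), then fit unweighted.
root = np.sqrt(sample_weight)
Xs, ys = X * root[:, None], y * root
w_scaled = np.linalg.solve(Xs.T @ Xs + alpha * np.eye(3), Xs.T @ ys)

print(np.allclose(w_weighted, w_scaled))
```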

convert_dtype: bool, default=True

When set to True, the fit method will convert y to the same data type as X if they differ. The conversion increases the memory used by the method.