Ridge
- class cuml.linear_model.Ridge(alpha=1.0, *, fit_intercept=True, solver='auto', tol=0.0001, max_iter=None, copy_X=True, output_type=None, verbose=False)
Linear least squares with L2 regularization.
Ridge extends LinearRegression by applying L2 regularization to the coefficients when predicting the response y as a linear combination of the predictors in X. The penalty can reduce the variance of the coefficient estimates and improves the conditioning of the problem.
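To make the effect of alpha concrete, here is a minimal pure-Python sketch of the textbook closed-form ridge solution for two features. It is not cuML's implementation. The helper name `ridge_fit_2d` is hypothetical, and the sketch assumes the usual objective ||y - Xw||^2 + alpha * ||w||^2 with an unpenalized intercept handled by centering.

```python
import math


def ridge_fit_2d(X, y, alpha):
    """Closed-form ridge fit for exactly two features (illustrative helper, not a cuML API)."""
    n = len(X)
    # Center X and y so the intercept is left out of the penalty.
    mean_x = [sum(row[j] for row in X) / n for j in range(2)]
    mean_y = sum(y) / n
    Xc = [(row[0] - mean_x[0], row[1] - mean_x[1]) for row in X]
    yc = [v - mean_y for v in y]

    # Normal equations: (Xc^T Xc + alpha * I) w = Xc^T yc
    a00 = sum(r[0] * r[0] for r in Xc) + alpha
    a01 = sum(r[0] * r[1] for r in Xc)
    a11 = sum(r[1] * r[1] for r in Xc) + alpha
    b0 = sum(r[0] * t for r, t in zip(Xc, yc))
    b1 = sum(r[1] * t for r, t in zip(Xc, yc))

    # Solve the 2x2 system with Cramer's rule.
    det = a00 * a11 - a01 * a01
    w0 = (a11 * b0 - a01 * b1) / det
    w1 = (a00 * b1 - a01 * b0) / det
    intercept = mean_y - w0 * mean_x[0] - w1 * mean_x[1]
    return (w0, w1), intercept


X = [[1, 1], [1, 2], [2, 2], [2, 3]]
y = [6.0, 8.0, 9.0, 11.0]  # generated by y = 1*x1 + 2*x2 + 3

for alpha in (1e-5, 1.0, 10.0):
    coef, intercept = ridge_fit_2d(X, y, alpha)
    # Print alpha, the coefficients, the intercept, and the coefficient norm.
    print(alpha, coef, intercept, math.hypot(*coef))
```

With a near-zero alpha the fit recovers coefficients close to (1, 2) and an intercept close to 3. Larger values of alpha add more to the diagonal of the normal equations, which shrinks the coefficient norm toward zero.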
- Parameters:
- alpha : float or array of shape (n_targets,), default=1.0
Regularization strength. Must be a positive float, or an array of positive floats with one value per target. Larger values specify stronger regularization.
- fit_intercept : bool, default=True
If True, Ridge tries to correct for the global mean of y. If False, the model expects that you have centered the data.
- solver : {‘auto’, ‘eig’, ‘svd’, ‘lsmr’}, default=‘auto’
The solver to use when fitting:
‘auto’: selects ‘eig’ if supported, falling back to ‘lsmr’ if X is sparse, and ‘svd’ otherwise.
‘eig’: uses an eigendecomposition of the covariance matrix. It is faster than SVD, but potentially unstable. It doesn’t support multi-target y or sparse X.
‘svd’: uses a singular value decomposition (SVD). It’s slower, but stable. It doesn’t support sparse X.
‘lsmr’: uses cupyx.scipy.sparse.linalg.lsmr, an iterative algorithm. It is typically the fastest, and supports all options.
- tol : float, default=1e-4
The tolerance used by the lsmr solver. Has no impact on other solvers.
- max_iter : int, default=None
Maximum number of iterations for the lsmr solver. Defaults to None for no limit. Has no impact on other solvers.
- copy_X : bool, default=True
If True, X will never be mutated. Setting to False may reduce memory usage, at the cost of potentially mutating X.
- output_type : {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
- verbose : int or boolean, default=False
Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
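The ‘auto’ rule described for the solver parameter can be sketched as a small decision function. This is only a reading of the parameter description above, not cuML's source; the helper name `resolve_auto_solver` and its arguments are hypothetical, and the real implementation may take other conditions into account.

```python
def resolve_auto_solver(x_is_sparse, n_targets):
    """Illustrative resolution of solver='auto' (hypothetical helper, not a cuML API)."""
    if x_is_sparse:
        # Neither 'eig' nor 'svd' supports sparse X, so 'lsmr' is the fallback.
        return "lsmr"
    if n_targets > 1:
        # 'eig' does not support multi-target y; the 'auto' rule
        # falls back to 'svd' for the remaining dense cases.
        return "svd"
    # Dense X with a single target: 'eig' is supported.
    return "eig"


print(resolve_auto_solver(x_is_sparse=False, n_targets=1))
print(resolve_auto_solver(x_is_sparse=True, n_targets=1))
print(resolve_auto_solver(x_is_sparse=False, n_targets=3))
```

After fitting, the solver that was actually chosen is reported by the solver_ attribute, so a sketch like this is useful only for reasoning about the choice ahead of time.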
- Attributes:
- coef_ : array, shape (n_features,)
The estimated coefficients for the linear regression model.
- intercept_ : float or array, shape (n_targets,)
The independent term. If fit_intercept is False, will be 0. Will be an array when fit on multi-target y, otherwise will be a float.
- solver_ : str
The solver that was used at fit time.
- n_iter_ : numpy.ndarray or None, shape (n_targets,)
The number of iterations the solver ran per-target if the 'lsmr' solver was used, or None for other solvers.
Methods
fit(self, X, y[, sample_weight, convert_dtype])
Fit the model with X and y.
Notes
For additional docs, see Scikit-learn’s Ridge Regression.
Examples
>>> import cupy as cp
>>> import cudf
>>> from cuml import Ridge

>>> X = cudf.DataFrame()
>>> X['col1'] = cp.array([1,1,2,2], dtype=cp.float32)
>>> X['col2'] = cp.array([1,2,2,3], dtype=cp.float32)
>>> y = cudf.Series(cp.array([6.0, 8.0, 9.0, 11.0], dtype=cp.float32))

>>> ridge = Ridge(alpha=1e-5).fit(X, y)
>>> print(ridge.coef_)
0 1.000...
1 1.999...
>>> print(ridge.intercept_)
3.0...
>>> X_new = cudf.DataFrame()
>>> X_new['col1'] = cp.array([3,2], dtype=cp.float32)
>>> X_new['col2'] = cp.array([5,5], dtype=cp.float32)
>>> preds = ridge.predict(X_new)
>>> print(preds)
0 15.999...
1 14.999...
- fit(self, X, y, sample_weight=None, *, convert_dtype=True) -> 'Ridge'
Fit the model with X and y.
- Parameters:
- X : array-like (device or host), shape = (n_samples, n_features)
Dense matrix. If the datatype is not float or double, the data will be converted to float, which increases memory utilization. Set convert_dtype to False to avoid this conversion; the method will then raise an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- y : array-like (device or host), shape = (n_samples, 1)
Dense matrix. If the datatype is not float or double, the data will be converted to float, which increases memory utilization. Set convert_dtype to False to avoid this conversion; the method will then raise an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- sample_weight : array-like (device or host), shape = (n_samples,), default=None
The weights for each observation in X. If None, all observations are assigned equal weight. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype : bool, optional (default = True)
When set to True, the fit method will, when necessary, convert y to the same data type as X if they differ. The conversion increases the memory used by the method.
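The role of sample_weight can be illustrated with a small pure-Python sketch of a single-feature weighted ridge fit. It assumes the conventional weighted least-squares interpretation, in which each squared residual is multiplied by its weight; this page does not confirm that cuML computes it this way. The helper name `weighted_ridge_1d` is hypothetical.

```python
def weighted_ridge_1d(x, y, alpha, sample_weight=None):
    """Single-feature weighted ridge fit (illustrative helper, not a cuML API)."""
    if sample_weight is None:
        # No weights given: every observation counts equally.
        sample_weight = [1.0] * len(x)
    total = sum(sample_weight)
    # Weighted means, used to center the data so the intercept is not penalized.
    mean_x = sum(w * xi for w, xi in zip(sample_weight, x)) / total
    mean_y = sum(w * yi for w, yi in zip(sample_weight, y)) / total
    # Weighted normal equation for one coefficient.
    sxy = sum(
        w * (xi - mean_x) * (yi - mean_y)
        for w, xi, yi in zip(sample_weight, x, y)
    )
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(sample_weight, x))
    coef = sxy / (sxx + alpha)
    intercept = mean_y - coef * mean_x
    return coef, intercept


x = [1.0, 2.0, 4.0]
y = [2.0, 3.0, 7.0]

# Weighting the first observation by 2 ...
weighted = weighted_ridge_1d(x, y, alpha=0.5, sample_weight=[2.0, 1.0, 1.0])
# ... matches fitting with that observation repeated twice and no weights.
duplicated = weighted_ridge_1d([1.0] + x, [2.0] + y, alpha=0.5)
print(weighted, duplicated)
```

Under this assumption, an integer weight behaves like repeating the observation that many times, and passing no weights is the same as giving every observation a weight of 1.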