Lars
- class cuml.experimental.linear_model.Lars(*, fit_intercept=True, verbose=False, output_type=None, copy_X=True, fit_path=True, n_nonzero_coefs=500, eps=None, precompute='auto')
Least Angle Regression
Least Angle Regression (LAR or LARS) is a model selection algorithm. It builds up the model in the following steps:
We start with all the coefficients equal to zero.
At each step we select the predictor that has the largest absolute correlation with the residual.
We take the largest step possible in the direction which is equiangular with all the predictors selected so far. The largest step is determined such that using this step a new predictor will have as much correlation with the residual as any of the currently active predictors.
Stop if max_iter is reached, if all the predictors are used, or if the correlation between any unused predictor and the residual is lower than a tolerance.
The solver is based on [1]. The equations referred to in the source code comments correspond to the equations in the paper.
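The steps above can be sketched in plain NumPy. This is a minimal illustration of the textbook LARS loop, not the cuML solver: the function name `lars_sketch`, its `tol` argument, and the synthetic data are assumptions made for this example, and the input is assumed to be already centered with unit-norm columns.

```python
import numpy as np


def lars_sketch(X, y, n_nonzero_coefs=500, tol=1e-12):
    """Illustrative LARS loop; X is assumed centered with unit-norm columns."""
    n_cols = X.shape[1]
    coef = np.zeros(n_cols)              # start with all coefficients at zero
    active, alphas = [], []
    for _ in range(min(n_nonzero_coefs, n_cols)):
        corr = X.T @ (y - X @ coef)      # correlation with the residual
        inactive = [j for j in range(n_cols) if j not in active]
        j_new = max(inactive, key=lambda j: abs(corr[j]))
        c_max = abs(corr[j_new])
        if c_max < tol:                  # stop: correlation below tolerance
            break
        active.append(j_new)
        alphas.append(c_max)
        inactive.remove(j_new)

        # Direction that is equiangular with the sign-adjusted active columns.
        signs = np.sign(corr[active])
        Xa = X[:, active] * signs
        g_inv_one = np.linalg.solve(Xa.T @ Xa, np.ones(len(active)))
        scale = 1.0 / np.sqrt(g_inv_one.sum())
        w = scale * g_inv_one
        a = X.T @ (Xa @ w)

        # Largest step: move until an inactive predictor becomes as correlated
        # as the active ones, or take the full least-squares step if none does.
        gamma = c_max / scale
        for j in inactive:
            for num, den in ((c_max - corr[j], scale - a[j]),
                             (c_max + corr[j], scale + a[j])):
                if den > tol and 0.0 < num / den < gamma:
                    gamma = num / den
        coef[active] += gamma * signs * w
    return coef, active, np.array(alphas)


# Synthetic, preprocessed data (an assumption for this example only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + 0.1 * rng.normal(size=50)
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)
y = y - y.mean()

coef_full, active_full, alphas_full = lars_sketch(X, y)
coef_two, active_two, _ = lars_sketch(X, y, n_nonzero_coefs=2)
```

When every predictor is allowed to enter, the last step of this sketch lands on the ordinary least-squares solution. Capping `n_nonzero_coefs` stops the path early and leaves the remaining coefficients at zero.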
Note
This algorithm assumes that the offset is removed from X and y, and that each feature is normalized:
\[\sum_i y_i = 0, \quad \sum_i x_{i,j} = 0, \quad \sum_i x_{i,j}^2 = 1 \quad \text{for } j = 0, \dots, n_{col}-1\]
- Parameters:
- fit_intercept: boolean (default = True)
If True, Lars tries to correct for the global mean of y. If False, the model expects that you have centered the data.
- copy_X: boolean (default = True)
The solver permutes the columns of X. Set copy_X to True to prevent changing the input data.
- fit_path: boolean (default = True)
Whether to return all the coefficients along the regularization path in the coef_path_ attribute.
- precompute: bool, ‘auto’, or array-like with shape = (n_features, n_features) (default = ‘auto’)
Whether to precompute the Gram matrix. The user can provide the Gram matrix as an argument.
- n_nonzero_coefs: int (default = 500)
The maximum number of coefficients to fit. This gives an upper limit of how many features we select for prediction. This number is also an upper limit of the number of iterations.
- verbose: int or boolean, default=False
Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
- output_type: {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
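The normalization described in the note above, and a Gram matrix with the shape that precompute accepts, can be sketched in NumPy as follows. The helper name `center_and_normalize` and the random data are assumptions for illustration. Whether the estimator expects the Gram matrix of the raw or of the preprocessed data is not stated here, so treat the last line as a shape illustration only.

```python
import numpy as np


def center_and_normalize(X, y):
    """Remove the offset from X and y and scale each column of X to unit norm."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    X_centered = X - X.mean(axis=0)                  # sum_i x_ij = 0
    col_norms = np.linalg.norm(X_centered, axis=0)
    X_norm = X_centered / col_norms                  # sum_i x_ij^2 = 1
    return X_norm, y - y.mean(), col_norms           # sum_i y_i = 0


rng = np.random.default_rng(42)
X_raw = rng.normal(loc=5.0, scale=2.0, size=(20, 3))
y_raw = rng.normal(size=20)

X_norm, y_centered, col_norms = center_and_normalize(X_raw, y_raw)

# Gram matrix of the preprocessed features, shape (n_features, n_features).
gram = X_norm.T @ X_norm
```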
- Attributes:
- alphas_: array of floats or doubles, shape = [n_alphas + 1]
The maximum correlation at each step.
- active_: array of ints, shape = [n_alphas]
The indices of the active variables at the end of the path.
- beta_: array of floats or doubles, shape = [n_alphas]
The active regression coefficients (same as coef_ but zeros omitted).
- coef_path_: array of floats or doubles, shape = [n_alphas, n_alphas + 1]
The coefficients along the regularization path. Stored only if fit_path is True. Note that we only store coefficients for indices in the active set (i.e. coef_path_[:,-1] == coef_[active_]).
- coef_: array, shape = (n_features)
The estimated coefficients for the regression model.
- intercept_: scalar, float or double
The independent term. If fit_intercept is False, will be 0.
- n_iter_: int
The number of iterations taken by the solver.
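The relationship between active_, beta_, coef_path_ and coef_ described above can be illustrated with small hand-written arrays. All values below are made up for the example and do not come from a fitted model. The layout of `coef_path` (one row per active predictor, in order of entry, with zeros before it enters) is an assumption consistent with the shapes listed above.

```python
import numpy as np

n_features = 6

# Hypothetical active set: indices of the selected predictors.
active = np.array([4, 1, 3])

# Hypothetical path with shape (n_alphas, n_alphas + 1): one row per active
# predictor, one column per step, starting from all zeros.
coef_path = np.array([
    [0.0, 0.8, 1.1, 1.5],
    [0.0, 0.0, -0.4, -0.9],
    [0.0, 0.0, 0.0, 0.3],
])

# The active coefficients are the last column of the path.
beta = coef_path[:, -1]

# The full coefficient vector scatters them back to their feature indices.
coef = np.zeros(n_features)
coef[active] = beta
```

Under these assumptions, `coef[active]` recovers the last column of the path, which is the identity stated for coef_path_.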
Methods
fit(self, X, y[, convert_dtype]): Fit the model with X and y.
predict(self, X[, convert_dtype]): Predicts y values for X.
Notes
For additional information, see scikit-learn’s Lars documentation.
References
- fit(self, X, y, convert_dtype=True) -> 'Lars'
Fit the model with X and y.
- Parameters:
- X: array-like (device or host), shape = (n_samples, n_features)
Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this; the method will then throw an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- y: array-like (device or host), shape = (n_samples, 1)
Dense matrix of any dtype. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype: bool, optional (default = True)
When set to True, the fit method will, when necessary, convert y to be the same data type as X if they differ. This will increase the memory used by the method.
- predict(self, X, convert_dtype=True) -> CumlArray
Predicts y values for X.
- Parameters:
- X: array-like (device or host), shape = (n_samples, n_features)
Dense matrix (floats or doubles) of shape (n_samples, n_features). Acceptable formats: cuDF DataFrame, NumPy ndarray, Numba device ndarray, CUDA array interface compliant array like CuPy.
- convert_dtype: bool, optional (default = True)
When set to True, the predict method will, when necessary, convert the input to the data type which was used to train the model. This will increase memory used for the method.
- Returns:
- y: cuDF DataFrame
Dense vector (floats or doubles) of shape (n_samples, 1)
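For a linear model with the coef_ and intercept_ attributes listed earlier, prediction amounts to a matrix-vector product plus the intercept. The NumPy sketch below shows that computation only. The helper `predict_sketch` and the numeric values are hypothetical, and the sketch does not reproduce the dtype conversion or output-type handling of the actual method.

```python
import numpy as np


def predict_sketch(X, coef, intercept):
    """Linear prediction: one value per row of X."""
    return np.asarray(X) @ np.asarray(coef) + intercept


# Made-up fitted values, for illustration only.
coef = np.array([2.0, 0.0, -1.0])
intercept = 0.5

X_new = np.array([
    [1.0, 4.0, 2.0],
    [3.0, 1.0, 1.0],
])
y_pred = predict_sketch(X_new, coef, intercept)
```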