CD
- class cuml.solvers.CD(*, loss='squared_loss', alpha=0.0001, l1_ratio=0.15, fit_intercept=True, max_iter=1000, tol=0.001, shuffle=True, output_type=None, verbose=False)
Coordinate Descent (CD) is a widely used optimization algorithm that finds the minimum of a function by minimizing along one coordinate direction at a time.
cuML’s CD algorithm accepts a numpy matrix or a cuDF DataFrame as the input dataset. The CD algorithm currently works with linear regression and ridge, lasso, and elastic-net penalties.
- Parameters:
- loss: ‘squared_loss’
Only ‘squared_loss’ is currently supported. ‘squared_loss’ uses linear regression in its predict step.
- alpha: float (default = 0.0001)
The constant that sets the degree of regularization. ‘alpha = 0’ is equivalent to ordinary least squares, as solved by the LinearRegression object.
- l1_ratio: float (default = 0.15)
The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
- fit_intercept: boolean (default = True)
If True, the model tries to correct for the global mean of y. If False, the model expects that you have centered the data.
- max_iter: int (default = 1000)
The number of times the model should iterate through the entire dataset during training.
- tol: float (default = 1e-3)
The tolerance for the optimization: if the updates are smaller than tol, the solver stops.
- shuffle: boolean (default = True)
If True, a random coefficient is updated at each iteration instead of looping over the features sequentially. Setting this to True often leads to significantly faster convergence, especially when tol is higher than 1e-4.
- verbose: int or boolean, default=False
Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
- output_type: {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
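To make the roles of these parameters concrete, here is a minimal NumPy sketch of coordinate descent for an elastic-net penalized least-squares problem. It is an illustration only, not cuML's GPU implementation. It assumes the scikit-learn style objective scaling shown in the docstring, which this page does not state, and it approximates shuffle by visiting the features in a fresh random order on each pass. The names cd_elastic_net, soft_threshold, and seed are hypothetical.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (proximal operator of the L1 norm)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def cd_elastic_net(X, y, alpha=0.0001, l1_ratio=0.15, fit_intercept=True,
                   max_iter=1000, tol=1e-3, shuffle=True, seed=0):
    """Illustrative coordinate descent; not cuML's implementation.

    Assumed objective (scikit-learn's elastic-net scaling):
        1/(2n) * ||y - Xw - b||^2
        + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n, p = X.shape
    rng = np.random.default_rng(seed)

    # fit_intercept=True: centre the data and recover the intercept at the end.
    x_mean = X.mean(axis=0) if fit_intercept else np.zeros(p)
    y_mean = y.mean() if fit_intercept else 0.0
    Xc, yc = X - x_mean, y - y_mean

    w = np.zeros(p)
    residual = yc.copy()                  # invariant: residual == yc - Xc @ w
    col_sq = (Xc ** 2).sum(axis=0) / n    # per-feature curvature
    l1 = alpha * l1_ratio                 # weight of the L1 term
    l2 = alpha * (1.0 - l1_ratio)         # weight of the L2 term

    for _ in range(max_iter):
        order = rng.permutation(p) if shuffle else np.arange(p)
        max_update = 0.0
        for j in order:
            if col_sq[j] == 0.0:
                continue                  # constant feature: leave it at zero
            w_old = w[j]
            # Correlation of feature j with the residual that excludes it.
            rho = Xc[:, j] @ residual / n + col_sq[j] * w_old
            # Exact one-dimensional minimizer along coordinate j.
            w[j] = soft_threshold(rho, l1) / (col_sq[j] + l2)
            residual -= Xc[:, j] * (w[j] - w_old)
            max_update = max(max_update, abs(w[j] - w_old))
        if max_update < tol:              # largest coefficient change this pass
            break

    intercept = y_mean - x_mean @ w
    return w, intercept


# Usage on a tiny dataset; alpha=0.0 removes both penalties.
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y = np.array([6.0, 8.0, 9.0, 11.0])
w, b = cd_elastic_net(X, y, alpha=0.0, tol=1e-10, shuffle=False)
```

In this sketch, l1_ratio = 1 leaves only the soft-thresholding step (lasso), l1_ratio = 0 leaves only the extra l2 term in the denominator (ridge), and alpha = 0 reduces each update to a plain least-squares step along one coordinate.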
- Attributes:
- coef_
Methods
- fit(self, X, y[, convert_dtype, sample_weight]): Fit the model with X and y.
- predict(self, X[, convert_dtype]): Predicts the y for X.
Examples
>>> import cupy as cp
>>> import cudf
>>> from cuml.solvers import CD
>>> cd = CD(alpha=0.0)
>>> X = cudf.DataFrame()
>>> X['col1'] = cp.array([1,1,2,2], dtype=cp.float32)
>>> X['col2'] = cp.array([1,2,2,3], dtype=cp.float32)
>>> y = cudf.Series(cp.array([6.0, 8.0, 9.0, 11.0], dtype=cp.float32))
>>> cd.fit(X,y)
CD()
>>> print(cd.coef_)
0 1.001...
1 1.998...
dtype: float32
>>> print(cd.intercept_)
3.00...
>>> X_new = cudf.DataFrame()
>>> X_new['col1'] = cp.array([3,2], dtype=cp.float32)
>>> X_new['col2'] = cp.array([5,5], dtype=cp.float32)
>>> preds = cd.predict(X_new)
>>> print(preds)
0 15.997...
1 14.995...
dtype: float32
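As a CPU-side sanity check on the example above, the same data can be solved in closed form with NumPy. Because the example uses alpha=0.0, the fit is ordinary least squares, so an exact solver should land close to the values CD prints. This is a hedged cross-check that uses only NumPy and does not call cuML.

```python
import numpy as np

# Same training data as the example above, held as host arrays.
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=np.float64)
y = np.array([6.0, 8.0, 9.0, 11.0])

# Append a column of ones so the last fitted value is the intercept.
A = np.column_stack([X, np.ones(len(X))])
solution, *_ = np.linalg.lstsq(A, y, rcond=None)
coef, intercept = solution[:2], solution[2]

# Predict for the same new rows as the example.
X_new = np.array([[3, 5], [2, 5]], dtype=np.float64)
preds = X_new @ coef + intercept
```

The data satisfy y = 1*col1 + 2*col2 + 3 exactly, so the least-squares solution has coefficients 1 and 2, intercept 3, and predictions 16 and 15. The small deviations in the CD output (for example 15.997) are consistent with an iterative solver that stops at tol and works in float32.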
- fit(self, X, y, convert_dtype=True, sample_weight=None) -> 'CD'
Fit the model with X and y.
- Parameters:
- X: array-like (device or host) shape = (n_samples, n_features)
Dense matrix. If the datatype is not float or double, the data is converted to float, which increases memory utilization. Set convert_dtype to False to avoid this conversion; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- y: array-like (device or host) shape = (n_samples, 1)
Dense matrix. If the datatype is not float or double, the data is converted to float, which increases memory utilization. Set convert_dtype to False to avoid this conversion; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype: bool, optional (default = True)
When set to True, the fit method will, when necessary, convert y to be the same data type as X if they differ. This will increase memory used for the method.
- sample_weight: array-like (device or host) shape = (n_samples,), default=None
The weights for each observation in X. If None, all observations are assigned equal weight. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
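The memory cost of the dtype conversion described above can be estimated on the host with NumPy. The sketch below assumes the conversion target is float32, which this page does not state explicitly, and uses an int8 input only to make the size difference obvious.

```python
import numpy as np

# Integer input: 1 byte per element.
X_int = np.arange(12, dtype=np.int8).reshape(4, 3)

# Converted copy (float32 assumed): 4 bytes per element, and the
# original array stays alive alongside it until it is released.
X_f32 = X_int.astype(np.float32)

bytes_before = X_int.nbytes
bytes_after = X_f32.nbytes
```

Casting the data to a floating-point type once, before calling fit, avoids paying for the conversion on every call and lets convert_dtype=False act as a guard against accidental non-float inputs.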
- predict(self, X, convert_dtype=True) -> CumlArray
Predicts the y for X.
- Parameters:
- X: array-like (device or host) shape = (n_samples, n_features)
Dense matrix. If the datatype is not float or double, the data is converted to float, which increases memory utilization. Set convert_dtype to False to avoid this conversion; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype: bool, optional (default = True)
When set to True, the predict method will, when necessary, convert the input to the data type which was used to train the model. This will increase memory used for the method.
- Returns:
- preds: cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape = (n_samples, 1)
Predicted values.
For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.