QN
- class cuml.solvers.QN(*, loss='sigmoid', fit_intercept=True, l1_strength=0.0, l2_strength=0.0, max_iter=1000, tol=0.0001, delta=None, linesearch_max_iter=50, lbfgs_memory=5, verbose=False, output_type=None, warm_start=False, penalty_normalized=True)
Quasi-Newton methods find zeros or local maxima and minima of functions. This class uses them to optimize a cost function.
Two algorithms are implemented underneath cuML’s QN class, and which one is executed depends on the following rule:
- Orthant-Wise Limited Memory Quasi-Newton (OWL-QN) if there is l1 regularization.
- Limited Memory BFGS (L-BFGS) otherwise.
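The selection rule above can be sketched in a few lines. This is only an illustration of the documented rule; `select_qn_algorithm` is a hypothetical helper name and is not part of cuML.

```python
def select_qn_algorithm(l1_strength: float) -> str:
    """Return the algorithm name the documented rule would pick.

    Hypothetical helper for illustration only; cuML makes this
    choice internally based on l1_strength.
    """
    # Any non-zero l1 penalty selects OWL-QN; otherwise L-BFGS is used.
    return "OWL-QN" if l1_strength != 0.0 else "L-BFGS"


no_l1 = select_qn_algorithm(0.0)
with_l1 = select_qn_algorithm(0.1)
print(no_l1, with_l1)
```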
- Parameters:
- loss: ‘sigmoid’, ‘softmax’, ‘l1’, ‘l2’, ‘svc_l1’, ‘svc_l2’, ‘svr_l1’, ‘svr_l2’ (default = ‘sigmoid’).
The ‘sigmoid’ loss is used for binary logistic regression, the ‘softmax’ loss for multiclass logistic regression, and the ‘l1’/‘l2’ losses for regression.
- fit_intercept: boolean (default = True)
If True, the model tries to correct for the global mean of y. If False, the model expects that you have centered the data.
- l1_strength: float (default = 0.0)
l1 regularization strength (if non-zero, will run OWL-QN, else L-BFGS). Use penalty_normalized to control whether the solver divides this by the sample size.
- l2_strength: float (default = 0.0)
l2 regularization strength. Use penalty_normalized to control whether the solver divides this by the sample size.
- max_iter: int (default = 1000)
Maximum number of iterations taken for the solvers to converge.
- tol: float (default = 1e-4)
The training process will stop if norm(current_loss_grad) <= tol * max(current_loss, tol).
This differs slightly from the gtol-controlled stopping condition in scipy.optimize.minimize(method='L-BFGS-B'): norm(current_loss_projected_grad) <= gtol.
Note, sklearn.LogisticRegression() uses the sum of softmax/logistic loss over the input data, whereas cuML uses the average. As a result, Scikit-learn’s loss is usually sample_size times larger than cuML’s. To account for the difference, you may divide tol by the sample size; this ensures that the cuML solver does not stop earlier than the Scikit-learn solver.
- delta: Optional[float] (default = None)
The training process will stop if abs(current_loss - previous_loss) <= delta * max(current_loss, tol).
When None, it’s set to tol * 0.01; when 0, the check is disabled. Given the current step k, previous_loss here is the loss at step k - p, where p is a small positive integer set internally.
Note, this parameter corresponds to ftol in scipy.optimize.minimize(method='L-BFGS-B'), which is set by default to a minuscule 2.2e-9 and is not exposed in sklearn.LogisticRegression(). This condition is meant to protect the solver against doing vanishingly small linesearch steps or zigzagging. You may choose to set delta = 0 to make sure the cuML solver does not stop earlier than the Scikit-learn solver.
- linesearch_max_iter: int (default = 50)
Max number of linesearch iterations per outer iteration of the algorithm.
- lbfgs_memory: int (default = 5)
Rank of the lbfgs inverse-Hessian approximation. Method will use O(lbfgs_memory * D) memory.
- verbose: int or boolean (default = False)
Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
- output_type: {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’} (default = None)
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
- warm_start: bool (default = False)
When set to True, reuse the solution of the previous call to fit as initialization; otherwise, just erase the previous solution.
- penalty_normalized: bool (default = True)
When set to True, l1 and l2 parameters are divided by the sample size. This flag can be used to achieve a behavior compatible with other implementations, such as sklearn’s.
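The stopping conditions described for tol and delta, and the effect of penalty_normalized, can be written out as plain Python. This is a minimal sketch of the documented formulas, not cuML's implementation: all function names are hypothetical, and the gradient norm is taken as an input because this page does not say which norm the solver uses.

```python
from typing import Optional


def effective_penalty(strength: float, n_samples: int,
                      penalty_normalized: bool = True) -> float:
    """Penalty actually applied: divided by the sample size when normalized."""
    return strength / n_samples if penalty_normalized else strength


def resolve_delta(delta: Optional[float], tol: float) -> float:
    """Documented default: delta=None means tol * 0.01."""
    return tol * 0.01 if delta is None else delta


def gradient_converged(grad_norm: float, loss: float, tol: float) -> bool:
    """norm(current_loss_grad) <= tol * max(current_loss, tol)."""
    return grad_norm <= tol * max(loss, tol)


def loss_stalled(loss: float, previous_loss: float,
                 delta: Optional[float], tol: float) -> bool:
    """abs(current_loss - previous_loss) <= delta * max(current_loss, tol)."""
    d = resolve_delta(delta, tol)
    if d == 0:
        # delta = 0 disables the check entirely.
        return False
    return abs(loss - previous_loss) <= d * max(loss, tol)


# Matching a Scikit-learn tolerance: divide by the sample size, since
# cuML averages the loss while Scikit-learn sums it.
n_samples = 1000
sklearn_tol = 1e-4
cuml_tol = sklearn_tol / n_samples

print(gradient_converged(grad_norm=4e-5, loss=0.5, tol=1e-4))
print(loss_stalled(loss=0.5, previous_loss=0.5000001, delta=None, tol=1e-4))
print(loss_stalled(loss=0.5, previous_loss=0.5000001, delta=0, tol=1e-4))
```

Passing delta = 0 in the last call shows how the loss-change check is switched off, which is the setting suggested above when the solver should not stop earlier than Scikit-learn's.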
- Attributes:
- coef_: array, shape (n_classes, n_features)
The estimated coefficients for the linear regression model.
- intercept_: array, shape (n_classes, 1)
The independent term. If fit_intercept is False, will be 0.
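To make the documented attribute shapes concrete, the NumPy sketch below combines a coefficient array of shape (n_classes, n_features) with an intercept of shape (n_classes, 1) into per-class linear scores. The values are made up, and the argmax decision rule is an assumption for illustration; this page does not describe how predict maps scores to outputs.

```python
import numpy as np

# Hypothetical fitted attributes with the documented shapes.
coef = np.array([[1.0, -1.0],
                 [-1.0, 1.0],
                 [0.5, 0.5]])                  # (n_classes=3, n_features=2)
intercept = np.array([[0.0], [0.0], [-2.0]])   # (n_classes=3, 1)

X = np.array([[2.0, 0.0],
              [0.0, 2.0],
              [4.0, 4.0]])                     # (n_samples=3, n_features=2)

# Linear scores: one column per class, one row per sample.
scores = X @ coef.T + intercept.T              # (n_samples, n_classes)

# Assumed decision rule: pick the class with the largest score.
labels = scores.argmax(axis=1)
print(scores.shape, labels)
```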
Methods
fit(self, X, y[, sample_weight, convert_dtype]): Fit the model with X and y.
predict(self, X, *[, convert_dtype]): Predicts the y for X.
score(self, X, y)
Notes
This class contains implementations of two popular Quasi-Newton methods:
- Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) [Nocedal, Wright - Numerical Optimization (1999)]
- Orthant-wise limited-memory Quasi-Newton (OWL-QN) [Andrew, Gao - ICML 2007]
Examples
>>> import cupy as cp
>>> from cuml.solvers import QN
>>> X = cp.array([[1, 1], [1, 2], [2, 2], [2, 3]])
>>> y = cp.array([0, 0, 1, 1])
>>> solver = QN(loss="sigmoid").fit(X, y)
>>> solver.predict(X)
array([0, 0, 1, 1], dtype=int32)
- fit(self, X, y, sample_weight=None, convert_dtype=True) -> 'QN'
Fit the model with X and y.
- Parameters:
- X: array-like (device or host), shape = (n_samples, n_features)
Dense or sparse matrix containing floats or doubles. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- y: array-like (device or host), shape = (n_samples, 1)
Dense matrix. If the data type is not float or double, the data will be converted to float, which increases memory utilization. Set convert_dtype to False to avoid this; the method will then raise an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- sample_weight: array-like (device or host), shape = (n_samples,) (default = None)
The weights for each observation in X. If None, all observations are assigned equal weight. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype: bool, optional (default = True)
When set to True, the fit method will, when necessary, convert y to be the same data type as X if they differ. This will increase memory used for the method.
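One way to avoid relying on the conversion described for convert_dtype is to cast the inputs to a common floating-point type before calling fit. The NumPy sketch below only prepares the arrays and does not call cuML; the array values are illustrative, and whether fit still makes internal copies for already-matching inputs is not stated on this page.

```python
import numpy as np

# Features stored as float32, one of the documented accepted types.
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=np.float32)

# Integer labels, as they often arrive from a data pipeline.
y_int = np.array([0, 0, 1, 1])

# Cast once, up front, so y already matches X's dtype.
y = y_int.astype(X.dtype)

# Explicit equal weights; per the description, passing None
# also assigns equal weight to every observation.
sample_weight = np.ones(X.shape[0], dtype=X.dtype)

print(X.dtype, y.dtype, sample_weight.shape)
```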
- predict(self, X, *, convert_dtype=True) -> CumlArray
Predicts the y for X.
- Parameters:
- X: array-like (device or host), shape = (n_samples, n_features)
Dense or sparse matrix containing floats or doubles. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype: bool, optional (default = True)
When set to True, the predict method will, when necessary, convert the input to the data type which was used to train the model. This will increase memory used for the method.