QN

class cuml.solvers.QN(*, loss='sigmoid', fit_intercept=True, l1_strength=0.0, l2_strength=0.0, max_iter=1000, tol=0.0001, delta=None, linesearch_max_iter=50, lbfgs_memory=5, verbose=False, output_type=None, warm_start=False, penalty_normalized=True)

Quasi-Newton methods find zeroes or local maxima and minima of functions. This class uses them to optimize a cost function.

The QN class implements two algorithms and selects between them by the following rule:

  • Orthant-Wise Limited Memory Quasi-Newton (OWL-QN) if there is l1 regularization

  • Limited Memory BFGS (L-BFGS) otherwise.
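The selection rule above can be sketched in plain Python. The helper name `select_qn_algorithm` is hypothetical and is not part of cuML; it only mirrors the documented rule that a non-zero `l1_strength` selects OWL-QN.

```python
def select_qn_algorithm(l1_strength: float) -> str:
    """Return the algorithm name implied by the documented rule.

    Hypothetical helper for illustration only; cuML makes this
    choice internally.
    """
    # Any non-zero l1 penalty selects OWL-QN.
    if l1_strength != 0.0:
        return "OWL-QN"
    # Without l1 regularization, plain L-BFGS is used.
    return "L-BFGS"


print(select_qn_algorithm(0.0))
print(select_qn_algorithm(0.1))
```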

Parameters:
loss: ‘sigmoid’, ‘softmax’, ‘l1’, ‘l2’, ‘svc_l1’, ‘svc_l2’, ‘svr_l1’, ‘svr_l2’ (default = ‘sigmoid’).

‘sigmoid’ loss is used for binary (single-class) logistic regression; ‘softmax’ loss is used for multiclass logistic regression; ‘l1’/‘l2’ loss is used for regression.

fit_intercept: boolean (default = True)

If True, the model tries to correct for the global mean of y. If False, the model expects that you have centered the data.

l1_strength: float (default = 0.0)

l1 regularization strength (if non-zero, will run OWL-QN, else L-BFGS). Use penalty_normalized to control whether the solver divides this by the sample size.

l2_strength: float (default = 0.0)

l2 regularization strength. Use penalty_normalized to control whether the solver divides this by the sample size.

max_iter: int (default = 1000)

Maximum number of iterations taken for the solvers to converge.

tol: float (default = 1e-4)

The training process will stop if

norm(current_loss_grad) <= tol * max(current_loss, tol).

This differs slightly from the gtol-controlled stopping condition in scipy.optimize.minimize(method='L-BFGS-B'):

norm(current_loss_projected_grad) <= gtol.

Note that sklearn.LogisticRegression() uses the sum of the softmax/logistic loss over the input data, whereas cuML uses the average. As a result, Scikit-learn’s loss is usually sample_size times larger than cuML’s. To account for the difference, you may divide tol by the sample size; this ensures that the cuML solver does not stop earlier than the Scikit-learn solver.
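As a minimal sketch, the tol-based stopping rule can be written in plain Python. The function names are hypothetical, and the Euclidean norm is an assumption, since the text above only says "norm".

```python
import math


def gradient_converged(grad, loss, tol=1e-4):
    """Check norm(grad) <= tol * max(loss, tol).

    Assumes the Euclidean norm; illustrative only, not cuML code.
    """
    grad_norm = math.sqrt(sum(g * g for g in grad))
    return grad_norm <= tol * max(loss, tol)


def tol_for_sum_loss(tol, n_samples):
    """Divide tol by the sample size, as suggested in the note above."""
    return tol / n_samples


# Gradient norm is about 5e-5; the threshold is 6e-5 for loss=0.6.
print(gradient_converged([3e-5, 4e-5], loss=0.6))  # True
# With loss=0.4 the threshold drops to 4e-5, so the check fails.
print(gradient_converged([3e-5, 4e-5], loss=0.4))  # False
```

Because the threshold scales with the current loss, an averaged loss yields a different threshold than a summed loss would, which is why the note suggests rescaling tol when comparing against Scikit-learn.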

delta: Optional[float] (default = None)

The training process will stop if

abs(current_loss - previous_loss) <= delta * max(current_loss, tol).

When None, it is set to tol * 0.01; when 0, the check is disabled. Given the current step k, previous_loss here is the loss at step k - p, where p is a small positive integer set internally.

Note that this parameter corresponds to ftol in scipy.optimize.minimize(method='L-BFGS-B'), which is set by default to a minuscule 2.2e-9 and is not exposed in sklearn.LogisticRegression(). This condition is meant to protect the solver against taking vanishingly small linesearch steps or zigzagging. You may set delta = 0 to make sure the cuML solver does not stop earlier than the Scikit-learn solver.
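The delta check, including its None and 0 special cases, might look like the following sketch. Both function names are hypothetical, and `previous_loss` is passed in directly because the internal offset p is not documented here.

```python
def resolve_delta(delta, tol):
    """Apply the documented default: None means tol * 0.01."""
    return tol * 0.01 if delta is None else delta


def loss_stalled(current_loss, previous_loss, delta=None, tol=1e-4):
    """Check abs(current - previous) <= delta * max(current, tol).

    Illustrative only; previous_loss stands for the loss at step k - p.
    """
    delta = resolve_delta(delta, tol)
    if delta == 0:
        # A delta of 0 disables the check entirely.
        return False
    change = abs(current_loss - previous_loss)
    return change <= delta * max(current_loss, tol)


print(loss_stalled(0.5, 0.5000001))         # True: change is below 5e-7
print(loss_stalled(0.5, 0.6))               # False: loss still moving
print(loss_stalled(0.5, 0.5, delta=0))      # False: check disabled
```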

linesearch_max_iter: int (default = 50)

Maximum number of linesearch iterations per outer iteration of the algorithm.

lbfgs_memory: int (default = 5)

Rank of the L-BFGS inverse-Hessian approximation. The method uses O(lbfgs_memory * D) memory.
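To illustrate why memory grows as O(lbfgs_memory * D), here is a sketch of the textbook L-BFGS two-loop recursion in plain Python. It is not cuML's CUDA implementation; the names `dot` and `two_loop_direction` are hypothetical. The history keeps at most `lbfgs_memory` pairs of D-dimensional vectors, so storage is bounded by 2 * lbfgs_memory * D values.

```python
from collections import deque


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def two_loop_direction(grad, history):
    """Return -H @ grad, where H is the implicit inverse-Hessian
    approximation built from (s, y) pairs stored oldest first."""
    q = list(grad)
    coeffs = []
    # First loop: newest pair to oldest.
    for s, y in reversed(history):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        coeffs.append((alpha, rho))
    # Scale by an initial Hessian guess taken from the newest pair.
    if history:
        s, y = history[-1]
        gamma = dot(s, y) / dot(y, y)
        q = [gamma * qi for qi in q]
    # Second loop: oldest pair to newest.
    for (s, y), (alpha, rho) in zip(history, reversed(coeffs)):
        beta = rho * dot(y, q)
        q = [qi + (alpha - beta) * si for qi, si in zip(q, s)]
    return [-qi for qi in q]


lbfgs_memory = 5
history = deque(maxlen=lbfgs_memory)  # older pairs are dropped automatically

# One curvature pair from f(x) = x0**2 + x1**2, whose Hessian is 2 * I.
history.append(([1.0, 0.0], [2.0, 0.0]))
print(two_loop_direction([2.0, 4.0], history))  # [-1.0, -2.0]
```

With an empty history the function returns the plain negative gradient, which matches the steepest-descent step that starts the iteration.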

verbose: int or boolean (default = False)

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

output_type: {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’} (default = None)

Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.

warm_start: bool (default = False)

When set to True, reuse the solution of the previous call to fit as initialization; otherwise, erase the previous solution.

penalty_normalized: bool (default = True)

When set to True, l1 and l2 parameters are divided by the sample size. This flag can be used to achieve a behavior compatible with other implementations, such as sklearn’s.
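The effect of this flag on the regularization strengths can be sketched as follows. The helper `effective_penalty` is hypothetical and only restates the division described above; it is not a cuML function.

```python
def effective_penalty(strength, n_samples, penalty_normalized=True):
    """Return the strength the solver would apply under the documented rule.

    Illustrative only: with penalty_normalized=True the l1/l2 strength
    is divided by the sample size, otherwise it is used as given.
    """
    if penalty_normalized:
        return strength / n_samples
    return strength


print(effective_penalty(1.0, 100))                            # 0.01
print(effective_penalty(1.0, 100, penalty_normalized=False))  # 1.0
```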

Attributes:
coef_: array, shape (n_classes, n_features)

The estimated coefficients for the linear regression model.

intercept_: array, shape (n_classes, 1)

The independent term. If fit_intercept is False, will be 0.

Methods

fit(self, X, y[, sample_weight, convert_dtype])

Fit the model with X and y.

predict(self, X, *[, convert_dtype])

Predicts the y for X.

score(self, X, y)

Notes

This class contains implementations of two popular Quasi-Newton methods: Limited Memory BFGS (L-BFGS) and Orthant-Wise Limited Memory Quasi-Newton (OWL-QN).

Examples

>>> import cupy as cp
>>> from cuml.solvers import QN
>>> X = cp.array([[1, 1], [1, 2], [2, 2], [2, 3]])
>>> y = cp.array([0, 0, 1, 1])
>>> solver = QN(loss="sigmoid").fit(X, y)
>>> solver.predict(X)
array([0, 0, 1, 1], dtype=int32)

fit(self, X, y, sample_weight=None, convert_dtype=True) -> 'QN'

Fit the model with X and y.

Parameters:
X: array-like (device or host) shape = (n_samples, n_features)

Dense or sparse matrix containing floats or doubles. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

y: array-like (device or host) shape = (n_samples, 1)

Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this; the method will then raise an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

sample_weight: array-like (device or host) shape = (n_samples,) (default = None)

The weights for each observation in X. If None, all observations are assigned equal weight. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

convert_dtype: bool, optional (default = True)

When set to True, the fit method will convert y to the same data type as X if they differ. This increases the memory used by the method.

predict(self, X, *, convert_dtype=True) -> CumlArray

Predicts the y for X.

Parameters:
X: array-like (device or host) shape = (n_samples, n_features)

Dense or sparse matrix containing floats or doubles. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

convert_dtype: bool, optional (default = True)

When set to True, the predict method will, when necessary, convert the input to the data type that was used to train the model. This increases the memory used by the method.

score(self, X, y)