LogisticRegression

class cuml.linear_model.LogisticRegression(*, penalty='l2', tol=0.0001, C=1.0, fit_intercept=True, class_weight=None, max_iter=1000, linesearch_max_iter=50, l1_ratio=None, solver='qn', lbfgs_memory=5, penalty_normalized=True, verbose=False, output_type=None)

Logistic Regression classifier.

LogisticRegression is a linear model that estimates the probability of an event occurring, for example whether an outcome is a success or a failure.

Parameters:
penalty: {'l1', 'l2', 'elasticnet', None}, default='l2'

Specifies the penalty term to use. 'l1' and 'l2' will use an L1 or L2 penalty, respectively. 'elasticnet' will use both an L1 and L2 penalty. None will not use a penalty.

tol: float, default=1e-4

Tolerance for stopping criteria. The exact stopping conditions depend on the chosen solver. Check the solver’s documentation for more details:

  • Quasi-Newton (L-BFGS/OWL-QN)

C: float, default=1.0

Inverse of regularization strength; must be a positive float.

fit_intercept: bool, default=True

Specifies whether a constant (a.k.a. bias or intercept) should be added to the decision function. Note that, just like in Scikit-learn, the bias is not regularized.

class_weight: dict or 'balanced', default=None

By default, all classes have a weight of one. Alternatively, a dictionary of the form {class_label: weight} can be provided to assign a weight to each class. The 'balanced' mode uses the values of y to adjust weights automatically, inversely proportional to the class frequencies in the input data: n_samples / (n_classes * np.bincount(y)). Note that these weights are multiplied by sample_weight (passed through the fit method) if sample_weight is specified.
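The 'balanced' formula and its interaction with sample_weight can be reproduced by hand. The following is a minimal pure-Python sketch, not cuML's implementation: the helper names `balanced_class_weights` and `effective_sample_weights` are hypothetical, and `collections.Counter` stands in for `np.bincount`, which gives the same counts when every class label actually occurs in y.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return {class_label: weight} using n_samples / (n_classes * count)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def effective_sample_weights(y, class_weight, sample_weight=None):
    """Multiply each sample's class weight by its sample_weight, if one is given."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    return [class_weight[label] * w for label, w in zip(y, sample_weight)]


# Three samples of class 0 and one of class 1: the rarer class gets the larger weight.
y = [0, 0, 0, 1]
weights = balanced_class_weights(y)
combined = effective_sample_weights(y, weights, sample_weight=[1.0, 1.0, 2.0, 1.0])
print(weights)
print(combined)
```

In this toy input, class 1 occurs a third as often as class 0 and therefore receives three times the weight.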

max_iter: int, default=1000

Maximum number of iterations taken for the solvers to converge.

linesearch_max_iter: int, default=50

Maximum number of line-search iterations per outer iteration in the L-BFGS and OWL-QN solvers.

l1_ratio: float or None, default=None

The Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1.
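As an illustration of what the mixing parameter controls, the sketch below evaluates an Elastic-Net penalty using the convention from Scikit-learn's documentation, where l1_ratio weights the L1 term and 1 - l1_ratio weights half the squared L2 term. The function name `elastic_net_penalty` is hypothetical, and cuML's internal scaling of the penalty (including how it combines with C) is an open assumption that this sketch does not reproduce.

```python
def elastic_net_penalty(coef, l1_ratio):
    """Mix L1 and L2 penalties: l1_ratio * |w|_1 + (1 - l1_ratio) / 2 * |w|_2^2."""
    if not 0.0 <= l1_ratio <= 1.0:
        raise ValueError("l1_ratio must satisfy 0 <= l1_ratio <= 1")
    l1_term = sum(abs(w) for w in coef)
    l2_term = 0.5 * sum(w * w for w in coef)
    return l1_ratio * l1_term + (1.0 - l1_ratio) * l2_term


coef = [0.5, -2.0, 0.0]
pure_l2 = elastic_net_penalty(coef, 0.0)  # only the L2 term contributes
pure_l1 = elastic_net_penalty(coef, 1.0)  # only the L1 term contributes
mixed = elastic_net_penalty(coef, 0.5)    # equal blend of both terms
```

Under this convention, l1_ratio=0 reduces to a pure L2 penalty and l1_ratio=1 to a pure L1 penalty.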

solver: {'qn'}, default='qn'

Algorithm to use in the optimization problem. Currently only 'qn' is supported. It automatically selects OWL-QN when L1 regularization is in effect (see penalty above) and L-BFGS otherwise.

lbfgs_memory: int, default=5

Rank of the L-BFGS inverse-Hessian approximation. The method uses O(lbfgs_memory * n_features) memory.

penalty_normalized: bool, default=True

By default, the penalty term is divided by the sample size. Set to False to disable this normalization.
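To make the effect of this flag concrete, the sketch below compares an L2 penalty term with and without division by the sample size. It assumes the regularization strength is 1 / C, following the description of C as the inverse of regularization strength. The function name `l2_penalty_term` and the 1/2 factor are illustrative assumptions, not cuML's code.

```python
def l2_penalty_term(coef, C, n_samples, penalty_normalized=True):
    """Return an L2 penalty value, optionally divided by the sample size."""
    strength = 1.0 / C  # assumption: strength is the inverse of C
    penalty = 0.5 * strength * sum(w * w for w in coef)
    if penalty_normalized:
        penalty /= n_samples  # the penalty term shrinks as the data set grows
    return penalty


coef = [1.0, -1.0]
normalized = l2_penalty_term(coef, C=1.0, n_samples=100)
unnormalized = l2_penalty_term(coef, C=1.0, n_samples=100, penalty_normalized=False)
```

With normalization enabled, the same C yields a penalty term that is n_samples times smaller, so results obtained with one setting are not directly comparable to the other at equal C.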

verbose: int or bool, default=False

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

output_type: {'input', 'array', 'dataframe', 'series', 'df_obj', 'numba', 'cupy', 'numpy', 'cudf', 'pandas'}, default=None

Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.

Attributes:
coef_: array, shape=(1, n_features) or (n_classes, n_features)

The estimated coefficients for the logistic regression model.

intercept_: array, shape=(1,) or (n_classes,)

The independent term. If fit_intercept is False, it is 0.

n_iter_: array, shape=(1,)

The number of iterations taken for the solvers to converge.

classes_: np.ndarray, shape=(n_classes,)

Array of the class labels.

Methods

fit(X, y[, sample_weight, convert_dtype])

Fit the model with X and y.

predict(X, *[, convert_dtype])

Predicts the class labels for X.

predict_log_proba(X, *[, convert_dtype])

Predicts the log class probabilities for each sample in X.

predict_proba(X, *[, convert_dtype])

Predicts the class probabilities for each sample in X.

Notes

cuML’s LogisticRegression uses a different solver than the equivalent Scikit-learn estimator, except when there is no penalty and solver='lbfgs' is used in Scikit-learn. This can cause small differences in the coefficients and predictions of the model, similar to those seen when switching solvers in Scikit-learn.

For additional information, see Scikit-learn’s LogisticRegression.

Examples

>>> import cuml
>>> import cupy as cp
>>> X = cp.array([[1, 1], [1, 2], [2, 2], [2, 3]])
>>> y = cp.array([0, 0, 1, 1])
>>> model = cuml.LogisticRegression().fit(X, y)
>>> model.predict(X)
array([0, 0, 1, 1])

fit(X, y, sample_weight=None, *, convert_dtype=True) → LogisticRegression

Fit the model with X and y.

Parameters:
X: array-like (device or host), shape=(n_samples, n_features)

Dense or sparse matrix containing floats or doubles. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

y: array-like (device or host), shape=(n_samples, 1)

Dense matrix. If the data type is not float or double, the data is converted to float, which increases memory use. Set convert_dtype to False to disable the conversion; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

sample_weight: array-like (device or host), shape=(n_samples,), default=None

The weights for each observation in X. If None, all observations are assigned equal weight. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

convert_dtype: bool, optional (default=True)

When set to True, the fit method will, when necessary, convert y to the same data type as X if they differ. This increases the memory used by the method.

predict(X, *, convert_dtype=True)

Predicts the class labels for X.

Parameters:
X: array-like (device or host), shape=(n_samples, n_features)

Dense or sparse matrix containing floats or doubles. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

convert_dtype: bool, optional (default=True)

When set to True, the predict method will, when necessary, convert the input to the data type which was used to train the model. This will increase memory used for the method.

Returns:
preds: cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape=(n_samples, 1)

Predicted values.

For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.

predict_log_proba(X, *, convert_dtype=True) → CumlArray

Predicts the log class probabilities for each sample in X.

Parameters:
X: array-like (device or host), shape=(n_samples, n_features)

Dense or sparse matrix containing floats or doubles. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

convert_dtype: bool, optional (default=True)

When set to True, the predict_log_proba method will, when necessary, convert the input to the data type which was used to train the model. This will increase memory used for the method.

Returns:
probs: cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape=(n_samples, n_classes)

Log probabilities per class for each sample.

For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
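Log probabilities are commonly computed directly from per-class scores rather than by taking the logarithm of probabilities that may already have underflowed. The sketch below shows the standard log-sum-exp formulation for a single sample. It is a generic illustration of the quantity this method returns and makes no claim about how cuML computes it; the name `log_softmax` is hypothetical.

```python
import math


def log_softmax(scores):
    """Return log-probabilities for one sample's per-class scores."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    log_norm = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return [s - log_norm for s in scores]


# Scores this large would overflow a naive exp(score) computation.
log_probs = log_softmax([1000.0, 1001.0, 999.0])
probs = [math.exp(lp) for lp in log_probs]
```

Exponentiating the result recovers probabilities that sum to one, and the class with the largest score keeps the largest log probability.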

predict_proba(X, *, convert_dtype=True) → CumlArray

Predicts the class probabilities for each sample in X.

Parameters:
X: array-like (device or host), shape=(n_samples, n_features)

Dense or sparse matrix containing floats or doubles. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

convert_dtype: bool, optional (default=True)

When set to True, the predict_proba method will, when necessary, convert the input to the data type which was used to train the model. This will increase memory used for the method.

Returns:
probs: cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape=(n_samples, n_classes)

Probabilities per class for each sample.

For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
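For the binary case, the per-class probabilities of a logistic regression model follow from the sigmoid of the linear decision function. Below is a minimal sketch for a single sample, assuming the coefficients and intercept are already known. `predict_proba_binary` is a hypothetical helper and not part of cuML, and the values used are made up for illustration.

```python
import math


def predict_proba_binary(x, coef, intercept):
    """Return [P(class 0), P(class 1)] for one sample under a logistic model."""
    z = sum(w * xi for w, xi in zip(coef, x)) + intercept
    # Branch on the sign of z so that exp never receives a large positive argument.
    if z >= 0:
        p1 = 1.0 / (1.0 + math.exp(-z))
    else:
        ez = math.exp(z)
        p1 = ez / (1.0 + ez)
    return [1.0 - p1, p1]


classes = [0, 1]
proba = predict_proba_binary([2.0, 3.0], coef=[1.0, 1.0], intercept=-4.0)
# The predicted label is the class whose probability is largest.
label = classes[proba.index(max(proba))]
```

The two entries correspond to the two columns of the (n_samples, n_classes) result described above, and picking the larger one gives the label that predict would report for the same model.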