LogisticRegression
- class cuml.linear_model.LogisticRegression(*, penalty='l2', tol=0.0001, C=1.0, fit_intercept=True, class_weight=None, max_iter=1000, linesearch_max_iter=50, l1_ratio=None, solver='qn', lbfgs_memory=5, penalty_normalized=True, verbose=False, output_type=None)
Logistic Regression classifier.
LogisticRegression is a linear model used to model the probability that a certain event occurs, for example the probability that an event succeeds or fails.
- Parameters:
- penalty : {'l1', 'l2', 'elasticnet', None}, default='l2'
Specifies the penalty term to use.
'l1' and 'l2' will use an L1 or L2 penalty, respectively. 'elasticnet' will use both an L1 and an L2 penalty. None will not use a penalty.
- tol : float, default=1e-4
Tolerance for stopping criteria. The exact stopping conditions depend on the chosen solver. Check the solver’s documentation for more details:
Quasi-Newton (L-BFGS/OWL-QN)
- C : float, default=1.0
Inverse of regularization strength; must be a positive float.
- fit_intercept : bool, default=True
Specifies whether a constant (a.k.a. bias or intercept) should be added to the decision function. As in Scikit-learn, the bias is not regularized.
- class_weight : dict or 'balanced', default=None
By default, all classes have a weight of one. Alternatively, a dictionary can be provided with weights associated with classes in the form {class_label: weight}. The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
- max_iter : int, default=1000
Maximum number of iterations taken for the solvers to converge.
- linesearch_max_iter : int, default=50
Maximum number of line-search iterations per outer iteration used in the L-BFGS and OWL-QN solvers.
- l1_ratio : float or None, default=None
The Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1.
- solver : {'qn'}, default='qn'
Algorithm to use in the optimization problem. Currently only 'qn' is supported, which automatically selects L-BFGS when no L1 penalty is applied and OWL-QN when one is.
- lbfgs_memory : int, default=5
Rank of the L-BFGS inverse-Hessian approximation. The method uses O(lbfgs_memory * n_features) memory.
- penalty_normalized : bool, default=True
By default, the penalty term is divided by the sample size. Set to False to skip this behavior.
- verbose : int or boolean, default=False
Sets the logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
- output_type : {'input', 'array', 'dataframe', 'series', 'df_obj', 'numba', 'cupy', 'numpy', 'cudf', 'pandas'}, default=None
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
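The parameters penalty, C, l1_ratio, fit_intercept and penalty_normalized together define the regularized objective that the solver minimizes. The following is a minimal pure-Python sketch of such an objective for a binary model, assuming a scikit-learn-style formulation: the data term is the mean log-loss, the penalty is scaled by 1 / C with the usual 0.5 factor on the squared L2 norm, and 'elasticnet' mixes the two terms by l1_ratio. The helper names penalty_value and objective are hypothetical and not part of cuML, and the exact scaling inside cuML's solver may differ.

```python
import math
from typing import Optional, Sequence


def penalty_value(w: Sequence[float], penalty: Optional[str] = "l2",
                  C: float = 1.0, l1_ratio: Optional[float] = None) -> float:
    """Penalty on the weights w (the intercept is never passed in here).

    Assumption: scikit-learn-style terms scaled by 1 / C, with a 0.5
    factor on the squared L2 norm.
    """
    if C <= 0:
        raise ValueError("C must be a positive float")
    l1 = sum(abs(wi) for wi in w)        # ||w||_1
    l2 = 0.5 * sum(wi * wi for wi in w)  # 0.5 * ||w||_2^2
    if penalty is None:
        return 0.0
    if penalty == "l1":
        return l1 / C
    if penalty == "l2":
        return l2 / C
    if penalty == "elasticnet":
        if l1_ratio is None or not 0.0 <= l1_ratio <= 1.0:
            raise ValueError("'elasticnet' requires 0 <= l1_ratio <= 1")
        return (l1_ratio * l1 + (1.0 - l1_ratio) * l2) / C
    raise ValueError(f"unknown penalty: {penalty!r}")


def objective(w: Sequence[float], b: float,
              X: Sequence[Sequence[float]], y: Sequence[int],
              penalty: Optional[str] = "l2", C: float = 1.0,
              l1_ratio: Optional[float] = None,
              penalty_normalized: bool = True) -> float:
    """Mean binary log-loss (labels in {0, 1}) plus the penalty term."""
    n = len(X)
    loss = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + b
        # -log(sigmoid(z)) if yi == 1, -log(1 - sigmoid(z)) if yi == 0,
        # both written as a numerically stable softplus(s)
        s = -z if yi == 1 else z
        loss += max(s, 0.0) + math.log1p(math.exp(-abs(s)))
    loss /= n
    reg = penalty_value(w, penalty, C, l1_ratio)  # the bias b is not penalized
    if penalty_normalized:
        reg /= n  # divide the penalty term by the sample size
    return loss + reg
```

For example, with penalty='l2', C=1.0 and penalty_normalized=True, a weight vector w = [1.0, 0.0] adds 0.5 / n_samples to the mean log-loss in this sketch; with penalty_normalized=False it adds 0.5.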
- Attributes:
- coef_ : array, shape=(1, n_features) or (n_classes, n_features)
The estimated coefficients for the logistic regression model.
- intercept_ : array, shape=(1,) or (n_classes,)
The independent term. If fit_intercept is False, it will be 0.
- n_iter_ : array, shape=(1,)
The number of iterations taken for the solvers to converge.
- classes_ : np.ndarray, shape=(n_classes,)
Array of the class labels.
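coef_ and intercept_ combine into one decision score per coefficient row. A minimal plain-Python sketch of that computation, assuming the scikit-learn layout in which a binary model stores a single coefficient row (shape (1, n_features)) and a multiclass model stores one row per class; decision_scores is a hypothetical helper, not a cuML method.

```python
from typing import List, Sequence


def decision_scores(X: Sequence[Sequence[float]],
                    coef: Sequence[Sequence[float]],
                    intercept: Sequence[float]) -> List[List[float]]:
    """Return X @ coef.T + intercept as a list of rows.

    coef has shape (n_rows, n_features): 1 row for a binary model,
    n_classes rows for a multiclass model (assumed scikit-learn layout).
    """
    if len(coef) != len(intercept):
        raise ValueError("coef and intercept disagree on the number of rows")
    scores = []
    for x in X:
        if any(len(row) != len(x) for row in coef):
            raise ValueError("n_features mismatch between X and coef")
        # One score per coefficient row: dot(row, x) + bias
        scores.append([sum(c * xi for c, xi in zip(row, x)) + b
                       for row, b in zip(coef, intercept)])
    return scores
```

On a fitted model whose attributes are NumPy or CuPy arrays, the same arithmetic is X @ model.coef_.T + model.intercept_, giving shape (n_samples, 1) for a binary model and (n_samples, n_classes) otherwise.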
Methods
- fit(X, y[, sample_weight, convert_dtype]): Fit the model with X and y.
- predict(X, *[, convert_dtype]): Predict the class labels for X.
- predict_log_proba(X, *[, convert_dtype]): Predict the log class probabilities for each class in X.
- predict_proba(X, *[, convert_dtype]): Predict the class probabilities for each class in X.
Notes
cuML's LogisticRegression uses a different solver than the equivalent Scikit-learn estimator, except when there is no penalty and solver='lbfgs' is used in Scikit-learn. This can cause (small) differences in the coefficients and predictions of the model, similar to using different solvers in Scikit-learn.
For additional information, see Scikit-learn's LogisticRegression.
Examples
>>> import cuml
>>> import cupy as cp
>>> X = cp.array([[1, 1], [1, 2], [2, 2], [2, 3]])
>>> y = cp.array([0, 0, 1, 1])
>>> model = cuml.LogisticRegression().fit(X, y)
>>> model.predict(X)
array([0, 0, 1, 1])
- fit(X, y, sample_weight=None, *, convert_dtype=True) → LogisticRegression
Fit the model with X and y.
- Parameters:
- X : array-like (device or host), shape = (n_samples, n_features)
Dense or sparse matrix containing floats or doubles. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- y : array-like (device or host), shape = (n_samples, 1)
Dense matrix. If the data type is not float or double, the data will be converted to float, which increases memory usage. Set convert_dtype to False to avoid this; the method will then raise an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- sample_weight : array-like (device or host), shape = (n_samples,), default=None
The weights for each observation in X. If None, all observations are assigned equal weight. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype : bool, optional (default = True)
When set to True, the fit method will, when necessary, convert y to the same data type as X if they differ. This will increase the memory used by the method.
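The class_weight constructor parameter and the sample_weight argument of fit combine multiplicatively into one weight per sample. The sketch below computes those effective weights in plain Python; balanced_class_weights and effective_weights are hypothetical helpers. It counts only labels that actually appear in y (the np.bincount(y) form in the class_weight description assumes labels encoded as 0..n_classes-1), and giving classes missing from a user-supplied dict a weight of 1.0 is an assumption borrowed from scikit-learn.

```python
from collections import Counter
from typing import Dict, Hashable, List, Mapping, Optional, Sequence, Union


def balanced_class_weights(y: Sequence[Hashable]) -> Dict[Hashable, float]:
    """n_samples / (n_classes * count(label)) for every label present in y."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


def effective_weights(y: Sequence[Hashable],
                      class_weight: Union[None, str, Mapping[Hashable, float]] = None,
                      sample_weight: Optional[Sequence[float]] = None) -> List[float]:
    """Per-sample weight = sample_weight[i] * class_weight[y[i]]."""
    if class_weight is None:
        per_class: Dict[Hashable, float] = {}
    elif class_weight == "balanced":
        per_class = balanced_class_weights(y)
    else:
        per_class = dict(class_weight)
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    if len(sample_weight) != len(y):
        raise ValueError("sample_weight must have one entry per sample")
    # Assumption: labels missing from a user-supplied dict keep weight 1.0.
    return [sw * per_class.get(label, 1.0)
            for label, sw in zip(y, sample_weight)]
```

Without sample_weight, the 'balanced' weights always sum to n_samples, so the overall scale of the loss is preserved while each minority-class sample counts for more.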
- predict(X, *, convert_dtype=True)
Predicts the class labels for X.
- Parameters:
- X : array-like (device or host), shape = (n_samples, n_features)
Dense or sparse matrix containing floats or doubles. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype : bool, optional (default = True)
When set to True, the predict method will, when necessary, convert the input to the data type which was used to train the model. This will increase memory used for the method.
- Returns:
- preds : cuDF, CuPy or NumPy object depending on cuML's output type configuration, shape = (n_samples, 1)
Predicted class labels.
For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
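predict returns labels drawn from classes_. A minimal sketch of that mapping from decision scores, assuming scikit-learn's convention: a binary model predicts classes_[1] when its single score is positive, and a multiclass model takes the argmax across classes. predict_labels is a hypothetical helper, not cuML's implementation.

```python
from typing import Hashable, List, Sequence


def predict_labels(scores: Sequence[Sequence[float]],
                   classes: Sequence[Hashable]) -> List[Hashable]:
    """Turn decision scores into labels taken from classes (like classes_)."""
    labels = []
    for row in scores:
        if len(row) == 1:
            # Binary model: one score per sample; positive means classes[1].
            if len(classes) != 2:
                raise ValueError("a single score per sample implies two classes")
            labels.append(classes[1] if row[0] > 0 else classes[0])
        else:
            # Multiclass model: pick the class with the largest score
            # (ties go to the first such class).
            if len(row) != len(classes):
                raise ValueError("expected one score per class")
            best = max(range(len(row)), key=row.__getitem__)
            labels.append(classes[best])
    return labels
```

Because the sigmoid is monotonic, a positive score corresponds to a predicted probability above 0.5, so the binary branch agrees with thresholding predict_proba at 0.5.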
- predict_log_proba(X, *, convert_dtype=True) → CumlArray
Predicts the log class probabilities for each class in X.
- Parameters:
- X : array-like (device or host), shape = (n_samples, n_features)
Dense or sparse matrix containing floats or doubles. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype : bool, optional (default = True)
When set to True, the predict_log_proba method will, when necessary, convert the input to the data type which was used to train the model. This will increase memory used for the method.
- Returns:
- probs : cuDF, CuPy or NumPy object depending on cuML's output type configuration, shape = (n_samples, n_classes)
Log probabilities per class for each sample.
For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
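Log-probabilities are best computed directly from decision scores: taking the log of already-computed probabilities can produce log(0) = -inf once a very confident probability underflows. A minimal sketch using the log-sum-exp trick, assuming a sigmoid model for two classes and a softmax (multinomial) model for more; the helper names are hypothetical and do not describe cuML's internal implementation.

```python
import math
from typing import List, Sequence


def log_softmax(row: Sequence[float]) -> List[float]:
    """log(softmax(row)), shifted by max(row) so exp() never overflows."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def predict_log_proba_from_scores(scores: Sequence[Sequence[float]]) -> List[List[float]]:
    out = []
    for row in scores:
        if len(row) == 1:
            # Binary model: sigmoid(z) == softmax([0, z])[1], so the two
            # columns are [log(1 - sigmoid(z)), log(sigmoid(z))].
            out.append(log_softmax([0.0, row[0]]))
        else:
            out.append(log_softmax(row))
    return out
```

With a score of 1000.0, for instance, this returns [-1000.0, 0.0] instead of a -inf entry.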
- predict_proba(X, *, convert_dtype=True) → CumlArray
Predicts the class probabilities for each class in X.
- Parameters:
- X : array-like (device or host), shape = (n_samples, n_features)
Dense or sparse matrix containing floats or doubles. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype : bool, optional (default = True)
When set to True, the predict_proba method will, when necessary, convert the input to the data type which was used to train the model. This will increase memory used for the method.
- Returns:
- probs : cuDF, CuPy or NumPy object depending on cuML's output type configuration, shape = (n_samples, n_classes)
Probabilities per class for each sample.
For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
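The (n_samples, n_classes) probabilities can be derived from decision scores as sketched below, assuming a sigmoid model for two classes (columns ordered as classes_, so the positive class is last) and a softmax (multinomial) model for more. The helpers are hypothetical illustrations, not cuML's implementation; both avoid overflow in exp() by never exponentiating a large positive number.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so e is in (0, 1)
    return e / (1.0 + e)


def softmax(row: Sequence[float]) -> List[float]:
    m = max(row)  # shift by the max so every exponent is <= 0
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba_from_scores(scores: Sequence[Sequence[float]]) -> List[List[float]]:
    out = []
    for row in scores:
        if len(row) == 1:
            p = sigmoid(row[0])      # probability of the second class
            out.append([1.0 - p, p])
        else:
            out.append(softmax(row))
    return out
```

Each output row sums to 1, and predicting the column with the largest probability gives the same labels as taking the argmax of the scores.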