LinearSVC
- class cuml.svm.LinearSVC(*, penalty='l2', loss='squared_hinge', C=1.0, fit_intercept=True, penalized_intercept=False, class_weight=None, probability=False, tol=0.0001, max_iter=1000, linesearch_max_iter=100, lbfgs_memory=5, n_streams=1, multi_class='ovr', verbose=False, output_type=None)
Linear Support Vector Classification.
Similar to SVC with parameter kernel=’linear’, but implemented using a linear solver. This enables flexibility in penalties and loss functions, and can scale better for larger problems.
- Parameters:
- penalty : {'l1', 'l2'}, default='l2'
The norm used in the penalization.
- loss : {'hinge', 'squared_hinge'}, default='squared_hinge'
The loss function.
- C : float, default=1.0
Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive.
- fit_intercept : bool, default=True
Whether to fit the bias term. Set to False if you expect the data to be already centered.
- penalized_intercept : bool, default=False
When True, the bias term is treated the same way as other features, i.e. it is penalized by the regularization term of the target function. Enabling this feature forces an extra copy of the input data X.
- class_weight : dict or string, default=None
Weights to modify the parameter C for class i to class_weight[i]*C. The string 'balanced' is also accepted, in which case class_weight[i] = n_samples / (n_classes * n_samples_of_class[i]).
- probability : bool, default=False
Set to True to enable probability estimate methods (predict_proba, predict_log_proba).
- tol : float, default=1e-4
Tolerance for the stopping criterion.
- max_iter : int, default=1000
Maximum number of iterations for the underlying solver.
- linesearch_max_iter : int, default=100
Maximum number of linesearch (inner loop) iterations for the underlying (QN) solver.
- lbfgs_memory : int, default=5
Number of vectors approximating the Hessian for the underlying QN solver (L-BFGS).
- n_streams : int, default=1
Number of parallel streams used for fitting.
- multi_class : {'ovr'}, default='ovr'
Multiclass classification strategy. Currently only 'ovr' is supported.
- verbose : int or boolean, default=False
Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
- output_type : {'input', 'array', 'dataframe', 'series', 'df_obj', 'numba', 'cupy', 'numpy', 'cudf', 'pandas'}, default=None
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
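The 'balanced' formula for class_weight can be checked by hand. The following is a minimal pure-Python sketch of that formula only; the helper name balanced_class_weights is hypothetical and is not part of cuML.

```python
from collections import Counter


def balanced_class_weights(y):
    """Hypothetical helper: weight[i] = n_samples / (n_classes * n_samples_of_class[i])."""
    counts = Counter(y)          # number of samples per class label
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Four samples of class 0 and two samples of class 1.
y = [0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(y)
print(weights)  # {0: 0.75, 1: 1.5}
```

With these weights, the rarer class 1 would be fit with an effective regularization parameter of 1.5*C, and class 0 with 0.75*C.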
- Attributes:
- coef_ : array, shape (1, n_features) if n_classes == 2 else (n_classes, n_features)
Weights assigned to the features (coefficients in the primal problem).
- intercept_ : array or float, shape (1,) if n_classes == 2 else (n_classes,)
The constant factor in the decision function. If fit_intercept=False this is instead a float with value 0.0.
- classes_ : np.ndarray, shape=(n_classes,)
A sorted array of the class labels.
- n_iter_ : int
The maximum number of iterations run across all classes during the fit.
- prob_scale_ : array or None, shape (n_classes_, 2)
The probability calibration constants if probability=True, otherwise None.
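To illustrate how the shapes of coef_, intercept_ and classes_ fit together, here is a pure-Python sketch of a linear one-vs-rest decision rule. It assumes the usual convention for linear classifiers (a single score row in the binary case, argmax over rows otherwise); this is an assumption for illustration and is not taken from cuML's implementation. The function names are hypothetical.

```python
def decision_scores(coef, intercept, x):
    """One score per row of coef: dot(w, x) + b."""
    if isinstance(intercept, (int, float)):
        # fit_intercept=False case: intercept is the scalar 0.0
        intercept = [intercept] * len(coef)
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
            for w, b in zip(coef, intercept)]


def predict_label(coef, intercept, classes, x):
    scores = decision_scores(coef, intercept, x)
    if len(scores) == 1:
        # Binary case (assumed): a positive score selects classes[1].
        return classes[1] if scores[0] > 0 else classes[0]
    # Multiclass one-vs-rest (assumed): pick the class with the largest score.
    best = max(range(len(scores)), key=scores.__getitem__)
    return classes[best]


# Binary: coef has shape (1, n_features), intercept has shape (1,).
binary_label = predict_label([[1.0, -1.0]], [0.5], [0, 1], [1.0, 2.0])

# Three classes: coef has shape (3, n_features); scalar intercept 0.0.
multi_label = predict_label([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], 0.0,
                            ["a", "b", "c"], [0.2, 0.9])
print(binary_label, multi_label)  # 0 b
```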
Methods
- fit(X, y[, sample_weight, convert_dtype]): Fit the model according to the given training data.
- predict(X, *[, convert_dtype]): Predict class labels for samples in X.
- predict_log_proba(X, *[, convert_dtype]): Compute log probabilities of possible outcomes for samples in X.
- predict_proba(X, *[, convert_dtype]): Compute probabilities of possible outcomes for samples in X.
Notes
The model uses the quasi-Newton (QN) solver to find the solution in the primal space. Thus, in contrast to the generic SVC model, it does not compute the support coefficients/vectors.
Check the solver’s documentation for more details: Quasi-Newton (L-BFGS/OWL-QN).
For additional docs, see scikit-learn’s LinearSVC.
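As a rough illustration of what "the primal space" means here, the sketch below evaluates the textbook L2-penalized primal SVM objective, 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b))^p, with p = 1 for 'hinge' and p = 2 for 'squared_hinge'. This is the standard formulation and an assumption for illustration; the exact scaling used inside cuML's QN solver may differ. The function name primal_objective is hypothetical, and labels are assumed to be encoded as -1/+1.

```python
def primal_objective(w, b, X, y, C=1.0, loss="squared_hinge"):
    """Textbook L2-penalized primal SVM objective (labels in {-1, +1})."""
    power = {"hinge": 1, "squared_hinge": 2}[loss]
    reg = 0.5 * sum(w_j * w_j for w_j in w)           # 0.5 * ||w||^2
    data = 0.0
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)
        data += max(0.0, 1.0 - margin) ** power       # loss of one sample
    return reg + C * data


w, b = [1.0, 0.0], 0.0
X = [[2.0, 0.0], [0.5, 0.0], [-0.5, 0.0]]
y = [1, 1, -1]

hinge_value = primal_objective(w, b, X, y, C=1.0, loss="hinge")
squared_value = primal_objective(w, b, X, y, C=1.0, loss="squared_hinge")
large_c_value = primal_objective(w, b, X, y, C=10.0, loss="hinge")
print(hinge_value, squared_value, large_c_value)  # 1.5 1.0 10.5
```

Increasing C raises the weight of the loss term relative to the penalty, which is why the regularization strength is inversely proportional to C.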
Examples
>>> import cupy as cp
>>> from cuml.svm import LinearSVC
>>> X = cp.array([[1,1], [2,1], [1,2], [2,2], [1,3], [2,3]],
...              dtype=cp.float32)
>>> y = cp.array([0, 0, 1, 0, 1, 1], dtype=cp.float32)
>>> clf = LinearSVC(penalty='l1', C=1).fit(X, y)
>>> print("Predicted labels:", clf.predict(X))
Predicted labels: [0 0 1 0 1 1]
- fit(X, y, sample_weight=None, *, convert_dtype=True) -> LinearSVC
Fit the model according to the given training data.
- Parameters:
- X : array-like (device or host) shape = (n_samples, n_features)
Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this; the method will then raise an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- y : array-like (device or host) shape = (n_samples, 1)
Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this; the method will then raise an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- sample_weight : array-like (device or host) shape = (n_samples,), default=None
The weights for each observation in X. If None, all observations are assigned equal weight. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype : bool, optional (default = True)
When set to True, the fit method will, when necessary, convert y to the same data type as X if they differ. This will increase the memory used by the method.
- predict(X, *, convert_dtype=True)
Predict class labels for samples in X.
- Parameters:
- X : array-like (device or host) shape = (n_samples, n_features)
Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this; the method will then raise an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype : bool, optional (default = True)
When set to True, the predict method will, when necessary, convert the input to the data type which was used to train the model. This will increase the memory used by the method.
- Returns:
- y_pred : cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape = (n_samples,)
Predicted class labels.
For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
- predict_log_proba(X, *, convert_dtype=True) -> CumlArray
Compute log probabilities of possible outcomes for samples in X.
The model must have been fit with probability=True for this method to be available.
- Parameters:
- X : array-like (device or host) shape = (n_samples, n_features)
Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this; the method will then raise an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype : bool, optional (default = True)
When set to True, the predict_log_proba method will, when necessary, convert the input to the data type which was used to train the model. This will increase the memory used by the method.
- Returns:
- probs : cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape = (n_samples, n_classes)
Log probabilities per class for each sample.
For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
- predict_proba(X, *, convert_dtype=True) -> CumlArray
Compute probabilities of possible outcomes for samples in X.
The model must have been fit with probability=True for this method to be available.
- Parameters:
- X : array-like (device or host) shape = (n_samples, n_features)
Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this; the method will then raise an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype : bool, optional (default = True)
When set to True, the predict_proba method will, when necessary, convert the input to the data type which was used to train the model. This will increase the memory used by the method.
- Returns:
- probs : cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape = (n_samples, n_classes)
Probabilities per class for each sample.
For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.