SVC

class cuml.svm.SVC(C-Support Vector Classification)

Construct an SVC classifier for training and predictions.

Parameters:

C: float (default = 1.0)

Penalty parameter C

kernel: string (default = ‘rbf’)

Specifies the kernel function. Possible options: ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’. When using ‘precomputed’, X is expected to be a precomputed kernel matrix of shape (n_samples, n_samples) at fit time, and (n_samples_test, n_samples_train) at predict time. A valid kernel matrix should be symmetric and positive semi-definite; cuML does not validate these properties.
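As a rough illustration of the shapes expected with kernel=‘precomputed’, the sketch below builds linear-kernel matrices in plain Python. The helper `linear_gram` and the toy data are hypothetical and not part of cuML; in practice the matrices would be computed on the GPU (for example with CuPy) and passed to fit and predict.

```python
def linear_gram(A, B):
    """Return the linear-kernel matrix K[i][j] = <A[i], B[j]> as nested lists."""
    return [[sum(a * b for a, b in zip(row_a, row_b)) for row_b in B]
            for row_a in A]

X_train = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]  # 3 training samples
X_test = [[2.0, 2.0], [1.0, 3.0]]               # 2 test samples

# Matrix passed to fit: shape (n_samples, n_samples), symmetric.
K_fit = linear_gram(X_train, X_train)

# Matrix passed to predict: shape (n_samples_test, n_samples_train).
K_predict = linear_gram(X_test, X_train)

print(len(K_fit), len(K_fit[0]))          # rows and columns of the fit matrix
print(len(K_predict), len(K_predict[0]))  # rows and columns of the predict matrix
```

Note that the columns of the predict-time matrix always index the training samples, so the training data (or at least the kernel evaluations against it) must remain available at prediction time.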

degree: int (default = 3)

Degree of polynomial kernel function.

gamma: float or string (default = ‘scale’)

Coefficient for rbf, poly, and sigmoid kernels. You can specify the numeric value, or use one of the following options:

‘auto’: gamma will be set to 1 / n_features
‘scale’: gamma will be set to 1 / (n_features * X.var())
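The two string options can be worked through by hand. The sketch below is plain Python on a hypothetical toy matrix; it assumes X.var() means the population variance over all entries of X (as NumPy's `ndarray.var()` computes by default).

```python
from statistics import pvariance

# Hypothetical toy data: 4 samples, 2 features.
X = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
n_features = len(X[0])

# 'auto': depends only on the number of features.
gamma_auto = 1.0 / n_features

# 'scale': additionally divides by the variance of X.
# Assumption: variance is taken over all entries of X, flattened.
flat = [value for row in X for value in row]
gamma_scale = 1.0 / (n_features * pvariance(flat))

print(gamma_auto, gamma_scale)
```

Because ‘scale’ depends on the variance of the data, rescaling the features changes the effective gamma, whereas ‘auto’ is unaffected.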

coef0: float (default = 0.0)

Independent term in the kernel function; only significant for the poly and sigmoid kernels.
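To show where degree, gamma, and coef0 enter, here is a minimal sketch of the kernel functions, assuming the conventional definitions used by scikit-learn and LIBSVM. The function names are hypothetical; cuML evaluates its kernels on the GPU and does not expose helpers like these.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def poly_kernel(x, y, gamma, coef0=0.0, degree=3):
    # (gamma * <x, y> + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree

def sigmoid_kernel(x, y, gamma, coef0=0.0):
    # tanh(gamma * <x, y> + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)

def rbf_kernel(x, y, gamma):
    # exp(-gamma * ||x - y||^2); coef0 and degree are not used here
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = [1.0, 2.0], [2.0, 1.0]
print(poly_kernel(x, y, gamma=0.5, coef0=1.0, degree=2))
print(sigmoid_kernel(x, y, gamma=0.5))
print(rbf_kernel(x, y, gamma=0.5))
```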

tol: float (default = 1e-3)

Tolerance for stopping criterion.

cache_size: float (default = 1024.0)

Size of the kernel cache during training in MiB. Increase it to improve the training time, at the cost of higher memory footprint. After training the kernel cache is deallocated. During prediction, we also need a temporary space to store kernel matrix elements (this can be significant if n_support is large). The cache_size variable sets an upper limit to the prediction buffer as well.

class_weight: dict or string (default = None)

Weights to modify the parameter C for class i to class_weight[i]*C. The string ‘balanced’ is also accepted, in which case class_weight[i] = n_samples / (n_classes * n_samples_of_class[i])
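The ‘balanced’ formula can be checked by hand. The sketch below uses plain Python on a hypothetical label vector and simply applies the formula above; it is not cuML's implementation.

```python
from collections import Counter

# Hypothetical, imbalanced labels: four samples of class -1, two of class 1.
y = [-1, -1, -1, -1, 1, 1]

counts = Counter(y)
n_samples = len(y)
n_classes = len(counts)

# class_weight[i] = n_samples / (n_classes * n_samples_of_class[i])
balanced = {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

# The penalty applied to class i is class_weight[i] * C.
C = 1.0
effective_C = {label: weight * C for label, weight in balanced.items()}

print(balanced)
print(effective_C)
```

The rarer class receives the larger weight, so misclassifying its samples is penalized more heavily.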

max_iter: int (default = -1)

Limit the number of total iterations in the solver. Default of -1 for “no limit”.

decision_function_shape: str (‘ovo’ or ‘ovr’, default ‘ovo’)

Multiclass classification strategy. 'ovo' uses OneVsOneClassifier while 'ovr' selects OneVsRestClassifier
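The two strategies differ in how many binary problems they create. The sketch below only enumerates those problems for a hypothetical set of four class labels; how cuML trains and combines the underlying classifiers is not shown.

```python
from itertools import combinations

classes = ["a", "b", "c", "d"]  # hypothetical class labels

# One-vs-one: one binary classifier per unordered pair of classes.
ovo_pairs = list(combinations(classes, 2))

# One-vs-rest: one binary classifier per class, against all the others.
ovr_tasks = [(c, [other for other in classes if other != c]) for c in classes]

print(len(ovo_pairs))  # n_classes * (n_classes - 1) / 2 binary problems
print(len(ovr_tasks))  # n_classes binary problems
```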

nochange_steps: int (default = 1000)

We monitor how much the stopping criterion changes during outer iterations. If it does not change (changes less than 1e-3*tol) for nochange_steps consecutive steps, then we stop training.
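A minimal sketch of this early-stopping rule, under the assumption that the monitored quantity is a single number per outer iteration. The function `first_stalled_step` and the sample values are hypothetical; the actual solver logic lives in cuML's GPU code, and the regular tol-based convergence check is not modeled here.

```python
def first_stalled_step(criterion_values, tol=1e-3, nochange_steps=1000):
    """Return the iteration index at which training would stop because the
    criterion stalled, or None if it never stalls for long enough."""
    threshold = 1e-3 * tol
    unchanged = 0
    for i in range(1, len(criterion_values)):
        change = abs(criterion_values[i] - criterion_values[i - 1])
        if change < threshold:
            unchanged += 1   # another consecutive step without progress
        else:
            unchanged = 0    # progress was made, restart the count
        if unchanged >= nochange_steps:
            return i
    return None

# Hypothetical criterion values: progress stalls after the second value.
print(first_stalled_step([1.0, 0.5, 0.5, 0.5, 0.5], nochange_steps=3))
```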

output_type: {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None

Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.

probability: bool (default = False)

Set to True to enable probability estimates (predict_proba/predict_log_proba). Note that probability=True requires the training data to have at least 5 samples per class.

random_state: int (default = None)

Seed for random number generator (used only when probability=True).

verbose: int or boolean, default=False

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

Attributes:

n_support_: int: The total number of support vectors. Note: this will change in the future to represent the number of support vectors for each class (as in scikit-learn, see rapidsai/cuml#956)
support_: int, shape = (n_support): Device array of support vector indices
support_vectors_: float, shape (n_support, n_cols): Device array of support vectors. For kernel=‘precomputed’, this attribute is empty (shape (0, 0)) since the original feature vectors are not available.
dual_coef_: float, shape = (1, n_support): Device array of coefficients for support vectors
intercept_: float: The constant in the decision function
fit_status_: int: 0 if SVM is correctly fitted
n_iter_: array: Number of outer iterations run by the solver for each model fit.
coef_: float, shape (1, n_cols): Normal vector of the separating hyperplane. Only available for linear kernels.
classes_: np.ndarray, shape = (n_classes,): A sorted array of the class labels.
class_weight_: np.ndarray of shape (n_classes,): Class weight multipliers, computed based on the class_weight parameter.
n_classes_: int: Number of classes

Methods

`decision_function`(X, *[, convert_dtype])	Calculates the decision function values for X.
`fit`(X, y[, sample_weight, convert_dtype])	Fit the model with X and y.
`predict`(X, *[, convert_dtype])	Predicts the class labels for X.
`predict_log_proba`(X)	Predicts the log probabilities for X (returns log(predict_proba(X))).
`predict_proba`(X, *[, log])	Predicts the class probabilities for X.

Notes

The solver uses the SMO method to fit the classifier. We use the Optimized Hierarchical Decomposition [1] variant of the SMO algorithm, similar to [2].

For additional docs, see scikit-learn’s SVC.

References

[1]

J. Vanek et al. A GPU-Architecture Optimized Hierarchical Decomposition Algorithm for Support Vector Machine Training, IEEE Transactions on Parallel and Distributed Systems, vol 28, no 12, 3330, (2017)

[2]

Z. Wen et al. ThunderSVM: A Fast SVM Library on GPUs and CPUs, Journal of Machine Learning Research, 19, 1-5 (2018)

Examples

>>> import cupy as cp
>>> from cuml.svm import SVC
>>> X = cp.array([[1,1], [2,1], [1,2], [2,2], [1,3], [2,3]],
...              dtype=cp.float32)
>>> y = cp.array([-1, -1, 1, -1, 1, 1], dtype=cp.float32)
>>> clf = SVC(kernel='poly', degree=2, gamma='auto', C=1)
>>> clf.fit(X, y)
SVC()
>>> print("Predicted labels:", clf.predict(X))
Predicted labels: [-1. -1.  1. -1.  1.  1.]

decision_function(X, *, convert_dtype=True) → CumlArray

Calculates the decision function values for X.

For precomputed kernels, X should be a kernel matrix of shape (n_samples_test, n_samples_train) where n_samples_train is the number of samples used during fit.

Parameters:

X: array-like (device or host), shape = (n_samples, n_features): Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this; the method will then throw an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
convert_dtype: bool, optional (default = True): When set to True, the decision_function method will, when necessary, convert the input to the data type which was used to train the model. This will increase memory used for the method.

Returns:

results: cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape = (n_samples, 1)

Decision function values

For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.

fit(X, y, sample_weight=None, *, convert_dtype=True) → SVC

Fit the model with X and y.

Parameters:

X: array-like (device or host), shape = (n_samples, n_features): Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this; the method will then throw an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
y: array-like (device or host), shape = (n_samples, 1): Dense matrix of any dtype. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
sample_weight: array-like (device or host), shape = (n_samples,), default=None: The weights for each observation in X. If None, all observations are assigned equal weight. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
convert_dtype: bool, optional (default = True): When set to True, the fit method will, when necessary, convert y to be the same data type as X if they differ. This will increase memory used for the method.

property intercept_: Python descriptor object to control getting/setting CumlArray attributes on Base objects. See the Estimator Guide for an in-depth guide.

predict(X, *, convert_dtype=True)

Predicts the class labels for X. The returned y values are the class labels associated with sign(decision_function(X)).

Parameters:

X: array-like (device or host), shape = (n_samples, n_features): Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this; the method will then throw an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
convert_dtype: bool, optional (default = True): When set to True, the predict method will, when necessary, convert the input to the data type which was used to train the model. This will increase memory used for the method.

Returns:

preds: cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape = (n_samples, 1)

Predicted values

For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
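For a binary problem, the link between predict and the sign of the decision function can be sketched as follows. The decision values are hypothetical, and the sketch assumes that positive values map to the second entry of the sorted class labels and that a value of exactly zero falls to the first class; neither detail is confirmed by this page.

```python
# Sorted class labels, in the style of the classes_ attribute.
classes = [-1.0, 1.0]

# Hypothetical decision function values for four samples.
decision_values = [-1.3, 0.2, 2.5, -0.1]

# Assumption: positive side -> classes[1], otherwise -> classes[0].
preds = [classes[1] if value > 0 else classes[0] for value in decision_values]

print(preds)
```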

predict_log_proba(X) → CumlArray

Predicts the log probabilities for X (returns log(predict_proba(X))).

The model has to be trained with probability=True to use this method.

Parameters:

X: array-like (device or host), shape = (n_samples, n_features): Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this; the method will then throw an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

Returns:

preds: cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape = (n_samples, n_classes)

Log of predicted probabilities

For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.

predict_proba(X, *, log=False) → CumlArray

Predicts the class probabilities for X.

The model has to be trained with probability=True to use this method.

Parameters:

X: array-like (device or host), shape = (n_samples, n_features): Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this; the method will then throw an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
log: boolean (default = False): Whether to return log probabilities.

Returns:

preds: cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape = (n_samples, n_classes)

Predicted probabilities

For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.

property support_: Python descriptor object to control getting/setting CumlArray attributes on Base objects. See the Estimator Guide for an in-depth guide.