SVR
- class cuml.svm.SVR
Epsilon Support Vector Regression
- Parameters:
- C: float (default = 1.0)
Penalty parameter C.
- kernel: string (default=’rbf’)
Specifies the kernel function. Possible options: ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’. When using ‘precomputed’, X is expected to be a precomputed kernel matrix of shape (n_samples, n_samples) at fit time, and (n_samples_test, n_samples_train) at predict time. A valid kernel matrix should be symmetric and positive semi-definite; cuML does not validate these properties.
- degree: int (default=3)
Degree of polynomial kernel function.
- gamma: float or string (default = ‘scale’)
Coefficient for rbf, poly, and sigmoid kernels. You can specify a numeric value, or use one of the following options:
‘auto’: gamma will be set to 1 / n_features
‘scale’: gamma will be set to 1 / (n_features * X.var())
- coef0: float (default = 0.0)
Independent term in the kernel function. Only significant for the poly and sigmoid kernels.
- tol: float (default = 1e-3)
Tolerance for stopping criterion.
- epsilon: float (default = 0.1)
Epsilon parameter of the epsilon-SVR model. There is no penalty associated with points that are predicted within the epsilon-tube around the target values.
- cache_size: float (default = 1024.0)
Size of the kernel cache during training in MiB. Increase it to improve the training time, at the cost of higher memory footprint. After training the kernel cache is deallocated. During prediction, we also need a temporary space to store kernel matrix elements (this can be significant if n_support is large). The cache_size variable sets an upper limit to the prediction buffer as well.
- max_iter: int (default = -1)
Limit the number of total iterations in the solver. Default of -1 for “no limit”.
- nochange_steps: int (default = 1000)
The solver monitors how much the stopping criterion changes during outer iterations. If it does not change (changes by less than 1e-3*tol) for nochange_steps consecutive steps, training stops.
- verbose: int or boolean, default=False
Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
- output_type: {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
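As an illustration of the gamma options and the ‘precomputed’ kernel shapes described above, here is a minimal NumPy sketch. It is not cuML's implementation: the helper `rbf_kernel` and the random data are hypothetical, and the RBF formula exp(-gamma * ||a - b||^2) is the standard definition rather than something taken from this page.

```python
import numpy as np

# Hypothetical data: 6 training rows and 4 test rows with 3 features each.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(6, 3)).astype(np.float32)
X_test = rng.normal(size=(4, 3)).astype(np.float32)

# The two string options for gamma, written out as documented above.
n_features = X_train.shape[1]
gamma_auto = 1.0 / n_features
gamma_scale = 1.0 / (n_features * X_train.var())


def rbf_kernel(A, B, gamma):
    """Hypothetical helper: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


# Shape expected at fit time: (n_samples, n_samples).
K_train = rbf_kernel(X_train, X_train, gamma_scale)
# Shape expected at predict time: (n_samples_test, n_samples_train).
K_test = rbf_kernel(X_test, X_train, gamma_scale)

print(K_train.shape, K_test.shape)
```

Since cuML does not validate a precomputed kernel matrix, a quick check such as `np.allclose(K_train, K_train.T)` before fitting can catch an asymmetric matrix early.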
- Attributes:
- n_support_: int
The total number of support vectors. Note: this will change in the future to represent the number of support vectors for each class (as in Scikit-learn, see Issue #956).
- support_: int, shape = [n_support]
Device array of support vector indices.
- support_vectors_: float, shape [n_support, n_cols]
Device array of support vectors. For kernel=’precomputed’, this attribute is empty (shape [0, 0]) since the original feature vectors are not available.
- dual_coef_: float, shape = [1, n_support]
Device array of coefficients for support vectors.
- intercept_: int
The constant in the decision function.
- fit_status_: int
0 if SVM is correctly fitted.
- n_iter_: int
Number of outer iterations run by the solver.
- coef_: float, shape [1, n_cols]
SVMBase.coef_(self)
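To show how the fitted attributes fit together, the sketch below evaluates a kernel SVR decision function in plain NumPy. It assumes cuML follows the usual scikit-learn convention, f(x) = sum_i dual_coef_[0, i] * K(support_vectors_[i], x) + intercept_, which this page does not state explicitly. All numeric values and the helpers `rbf_kernel` and `predict` are hypothetical.

```python
import numpy as np

# Hypothetical fitted values, laid out with the shapes listed above.
support_vectors = np.array([[1.0], [3.0], [5.0]])  # [n_support, n_cols]
dual_coef = np.array([[-0.5, 1.0, -0.5]])          # [1, n_support]
intercept = 2.0
gamma = 0.5  # assumed RBF coefficient


def rbf_kernel(A, B, gamma):
    """Hypothetical helper: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


def predict(X_new):
    """Weighted sum of kernel values against the support vectors, plus the intercept."""
    K = rbf_kernel(X_new, support_vectors, gamma)  # [n_samples, n_support]
    return (K @ dual_coef.T).ravel() + intercept


X_new = np.array([[3.0], [100.0]])
preds = predict(X_new)
print(preds)
```

A point far from every support vector (the second row here) gets a prediction close to the intercept, because all RBF kernel values decay toward zero.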
Methods
fit(X, y[, sample_weight, convert_dtype]): Fit the model with X and y.
predict(X, *[, convert_dtype]): Predicts the values for X.
Notes
For additional docs, see Scikit-learn’s SVR.
The solver uses the SMO method to fit the regressor. It uses the Optimized Hierarchical Decomposition [1] variant of the SMO algorithm, similar to [2].
References
[1] J. Vanek et al. A GPU-Architecture Optimized Hierarchical Decomposition Algorithm for Support Vector Machine Training, IEEE Transactions on Parallel and Distributed Systems, vol 28, no 12, 3330, (2017)
Examples
>>> import cupy as cp
>>> from cuml.svm import SVR
>>> X = cp.array([[1], [2], [3], [4], [5]], dtype=cp.float32)
>>> y = cp.array([1.1, 4, 5, 3.9, 1.], dtype=cp.float32)
>>> reg = SVR(kernel='rbf', gamma='scale', C=10, epsilon=0.1)
>>> reg.fit(X, y)
SVR()
>>> print("Predicted values:", reg.predict(X))
Predicted values: [1.200474 3.8999617 5.100488 3.7995374 1.0995375]
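The role of epsilon can be checked against the values printed in the example above. The following NumPy sketch copies the targets and predictions from that example and applies the standard epsilon-insensitive loss, max(0, |prediction - target| - epsilon). The loss formula is the textbook definition and is an assumption here, since this page only describes the epsilon-tube in words.

```python
import numpy as np

# Targets and predictions copied from the example above.
y = np.array([1.1, 4.0, 5.0, 3.9, 1.0])
preds = np.array([1.200474, 3.8999617, 5.100488, 3.7995374, 1.0995375])
epsilon = 0.1

# Absolute residuals, and the part of each residual that falls outside the tube.
residuals = np.abs(preds - y)
loss = np.maximum(0.0, residuals - epsilon)

print("residuals:", residuals)
print("epsilon-insensitive loss:", loss)
```

Every residual in this example lies within about 1e-3 of epsilon, so the fitted curve sits almost exactly on the edge of the tube and the penalized part of each error is close to zero.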
- fit(X, y, sample_weight=None, *, convert_dtype=True) -> SVR
Fit the model with X and y.
- Parameters:
- X: array-like (device or host) shape = (n_samples, n_features)
Dense matrix. If datatype is other than floats or doubles, then the data will be converted to float which increases memory utilization. Set the parameter convert_dtype to False to avoid this, then the method will throw an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- y: array-like (device or host) shape = (n_samples, 1)
Dense matrix. If datatype is other than floats or doubles, then the data will be converted to float which increases memory utilization. Set the parameter convert_dtype to False to avoid this, then the method will throw an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- sample_weight: array-like (device or host) shape = (n_samples,), default=None
The weights for each observation in X. If None, all observations are assigned equal weight. Acceptable dense formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype: bool, optional (default = True)
When set to True, the fit method will, when necessary, convert y to be the same data type as X if they differ. This will increase the memory used by the method.
- predict(X, *, convert_dtype=True) -> CumlArray
Predicts the values for X.
For precomputed kernels, X should be a kernel matrix of shape (n_samples_test, n_samples_train) where n_samples_train is the number of samples used during fit.
- Parameters:
- X: array-like (device or host) shape = (n_samples, n_features)
Dense matrix. If datatype is other than floats or doubles, then the data will be converted to float which increases memory utilization. Set the parameter convert_dtype to False to avoid this, then the method will throw an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype: bool, optional (default = True)
When set to True, the predict method will, when necessary, convert the input to the data type which was used to train the model. This will increase memory used for the method.
- Returns:
- preds: cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape = (n_samples, 1)
Predicted values.
For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.