ARIMA
- class cuml.tsa.ARIMA(endog, *, order: Tuple[int, int, int] = (1, 1, 1), seasonal_order: Tuple[int, int, int, int] = (0, 0, 0, 0), exog=None, fit_intercept=True, simple_differencing=True, verbose=False, output_type=None, convert_dtype=True)
Implements a batched ARIMA model for in-sample and out-of-sample time-series prediction, with support for seasonality (SARIMA).
ARIMA stands for Auto-Regressive Integrated Moving Average. See https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average
This class can fit an ARIMA(p,d,q) or ARIMA(p,d,q)(P,D,Q)_s model to a batch of time series of the same length (or various lengths, using missing values at the start for padding). The implementation is designed to give the best performance when using large batches of time series.
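The padding convention mentioned above (series of different lengths, with missing values at the start) can be sketched with NumPy. This is only an illustration of the data layout, not part of cuML: the helper name `pad_to_batch` and the sample values are hypothetical.

```python
import numpy as np

def pad_to_batch(series_list):
    """Stack 1-D series of different lengths into one (n_obs, batch_size) array.

    Shorter series are padded with NaN at the start, so that every series
    ends on the last row.
    """
    n_obs = max(len(s) for s in series_list)
    batch = np.full((n_obs, len(series_list)), np.nan, dtype=np.float64)
    for j, s in enumerate(series_list):
        s = np.asarray(s, dtype=np.float64)
        batch[n_obs - len(s):, j] = s  # right-align: the padding comes first
    return batch

# Hypothetical data: three series of lengths 5, 3 and 4
y = pad_to_batch([[1, 2, 3, 4, 5], [10, 20, 30], [7, 8, 9, 10]])
print(y)
```

The resulting array has one series per column, which matches the layout described for `endog`.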
- Parameters:
- endog: dataframe or array-like (device or host)
Endogenous variable, assumed to have each time series in columns. Acceptable formats: cuDF DataFrame, cuDF Series, NumPy ndarray, Numba device ndarray, or any CUDA array interface compliant array such as CuPy. Missing values are accepted, represented by NaN.
- order: Tuple[int, int, int] (default=(1,1,1))
The ARIMA order (p, d, q) of the model.
- seasonal_order: Tuple[int, int, int, int] (default=(0,0,0,0))
The seasonal ARIMA order (P, D, Q, s) of the model.
- exog: dataframe or array-like (device or host) (default=None)
Exogenous variables, assumed to have each time series in columns, such that variables associated with the same batch member are adjacent (number of columns: n_exog * batch_size). Acceptable formats: cuDF DataFrame, cuDF Series, NumPy ndarray, Numba device ndarray, or any CUDA array interface compliant array such as CuPy. Missing values are not supported.
- fit_intercept: bool or int (default=True)
Whether to include a constant trend mu in the model.
- simple_differencing: bool or int (default=True)
If True, the data is differenced before being passed to the Kalman filter. If False, differencing is part of the state-space model. In some cases this setting is ignored: computing forecasts with confidence intervals forces it to False; fitting with the CSS method forces it to True. Note that forecasts are always for the original series, whereas statsmodels computes forecasts for the differenced series when simple_differencing is True.
- verbose: int or boolean, default=False
Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
- output_type: {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
- convert_dtype: boolean
When set to True, the model will automatically convert the inputs to np.float64.
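The differencing that `simple_differencing` refers to can be illustrated with a small NumPy sketch: `d` regular differences followed by `D` seasonal differences of period `s`. This is a generic illustration of the operation, assuming the usual definition of differencing. It is not cuML's implementation, and the function name `difference` is hypothetical.

```python
import numpy as np

def difference(y, d=0, D=0, s=0):
    """Apply d regular and D seasonal (period s) differences along axis 0."""
    y = np.asarray(y, dtype=np.float64)
    if D > 0 and s < 1:
        raise ValueError("a seasonal period s >= 1 is required when D > 0")
    for _ in range(d):
        y = y[1:] - y[:-1]   # (1 - B) y_t
    for _ in range(D):
        y = y[s:] - y[:-s]   # (1 - B^s) y_t
    return y

# Hypothetical batch of two series: linear trend plus a period-4 pattern
t = np.arange(12.0)
pattern = np.tile([1.0, -1.0, 0.5, -0.5], 3)
y = np.column_stack((2.0 * t + pattern, -t + pattern))

# One regular and one seasonal difference remove both trend and pattern
dy = difference(y, d=1, D=1, s=4)
print(dy.shape)
```

Each difference shortens the series, so the differenced data has `n_obs - d - s * D` rows.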
- Attributes:
- order: ARIMAOrder
The ARIMA order of the model (p, d, q, P, D, Q, s, k, n_exog)
- d_y: device array
Time series data on device
- n_obs: int
Number of observations
- batch_size: int
Number of time series in the batch
- dtype: numpy.dtype
Floating-point type of the data and parameters
- niter: numpy.ndarray
After fitting, contains the number of iterations before convergence for each time series.
Methods
fit(self[, start_params, ...]): Fit the ARIMA model to each time series.
forecast(self, nsteps[, level, exog]): Forecast the given model nsteps into the future.
get_fit_params(self): Get all the fit parameters.
get_params(self[, deep]): ARIMA is unable to be cloned at this time.
pack(self): Pack parameters of the model into a linearized vector x.
predict(self[, start, end, level, exog, ...]): Compute in-sample and/or out-of-sample prediction for each series.
set_fit_params(self, params[, convert_dtype]): Set all the fit parameters.
set_params(self, **params): ARIMA is unable to be cloned at this time.
unpack(self, x[, convert_dtype]): Unpack linearized parameter vector x into the separate parameter arrays of the model.
Notes
Performance: Let \(r=\max(p+s*P, q+s*Q+1)\). The device memory used for most operations is \(O(\mathtt{batch\_size}*\mathtt{n\_obs} + \mathtt{batch\_size}*r^2)\). The execution time is a linear function of n_obs and batch_size (if batch_size is large), but grows very fast with r.
The performance is optimized for very large batch sizes (e.g. thousands of series).
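To get a feel for the quantities in the performance note, the following sketch evaluates r and the expression inside the O(.) bound for a given model order. The function names are hypothetical, and the result is only the scaling term: the actual constant factors and byte counts are not stated in this documentation.

```python
def state_dimension(order, seasonal_order=(0, 0, 0, 0)):
    """Return r = max(p + s*P, q + s*Q + 1) for orders (p, d, q) and (P, D, Q, s)."""
    p, _, q = order
    P, _, Q, s = seasonal_order
    return max(p + s * P, q + s * Q + 1)

def memory_scaling(batch_size, n_obs, r):
    """Term that the O(.) bound is proportional to (constant factors unknown)."""
    return batch_size * n_obs + batch_size * r ** 2

r_small = state_dimension((1, 1, 1))                        # max(1, 2)
r_seasonal = state_dimension((1, 1, 1), (1, 1, 1, 12))      # max(13, 14)
print(r_small, r_seasonal)
print(memory_scaling(batch_size=1000, n_obs=100, r=r_seasonal))
```

A long seasonal period with nonzero P or Q increases r quickly, which is where both the r^2 memory term and the execution time grow.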
References
This class is heavily influenced by the Python library statsmodels, particularly statsmodels.tsa.statespace.sarimax.SARIMAX. See https://www.statsmodels.org/stable/statespace.html.
Additionally, the following book is a useful reference: “Time Series Analysis by State Space Methods”, J. Durbin, S.J. Koopman, 2nd Edition (2012).
Examples
>>> import cupy as cp
>>> from cuml.tsa.arima import ARIMA
>>> # Create seasonal data with a trend, a seasonal pattern and noise
>>> n_obs = 100
>>> cp.random.seed(12)
>>> x = cp.linspace(0, 1, n_obs)
>>> pattern = cp.array([[0.05, 0.0], [0.07, 0.03],
...                     [-0.03, 0.05], [0.02, 0.025]])
>>> noise = cp.random.normal(scale=0.01, size=(n_obs, 2))
>>> y = (cp.column_stack((0.5*x, -0.25*x)) + noise
...      + cp.tile(pattern, (25, 1)))
>>> # Fit a seasonal ARIMA model
>>> model = ARIMA(y,
...               order=(0,1,1),
...               seasonal_order=(0,1,1,4),
...               fit_intercept=False)
>>> model.fit()
ARIMA(...)
>>> # Forecast
>>> fc = model.forecast(10)
>>> print(fc)
[[ 0.55204599 -0.25681163]
 [ 0.57430705 -0.2262438 ]
 [ 0.48120315 -0.20583011]
 [ 0.535594   -0.24060046]
 [ 0.57207541 -0.26695497]
 [ 0.59433647 -0.23638713]
 [ 0.50123257 -0.21597344]
 [ 0.55562342 -0.25074379]
 [ 0.59210483 -0.27709831]
 [ 0.61436589 -0.24653047]]
- ARIMA.aic -> CumlArray
Akaike Information Criterion
- ARIMA.aicc -> CumlArray
Corrected Akaike Information Criterion
- ARIMA.bic -> CumlArray
Bayesian Information Criterion
- property complexity
Model complexity (number of parameters)
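For reference, the textbook definitions of the three information criteria can be written in terms of a per-series log-likelihood and the number of parameters. The sketch below uses those standard formulas. Which sample count cuML uses internally (for example, whether differenced observations are excluded) is not stated here, so `n_samples` is an assumption left to the caller, and the function name is hypothetical.

```python
import numpy as np

def information_criteria(llf, n_params, n_samples):
    """Textbook AIC, AICc and BIC for a batch of log-likelihoods (shape (batch_size,))."""
    llf = np.asarray(llf, dtype=np.float64)
    aic = 2.0 * n_params - 2.0 * llf
    # Small-sample correction added to the AIC
    aicc = aic + (2.0 * n_params * (n_params + 1)) / (n_samples - n_params - 1)
    bic = n_params * np.log(n_samples) - 2.0 * llf
    return aic, aicc, bic

# Hypothetical values: two series, 3 parameters each, 100 samples
aic, aicc, bic = information_criteria([-100.0, -120.0], n_params=3, n_samples=100)
print(aic, aicc, bic)
```

Lower values indicate a better trade-off between fit and complexity under each criterion.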
- fit(self, start_params: Optional[Mapping[str, object]] = None, opt_disp: int = -1, h: float = 1e-8, maxiter: int = 1000, method='ml', truncate: int = 0, convert_dtype: bool = True) -> 'ARIMA'
Fit the ARIMA model to each time series.
- Parameters:
- start_params: Mapping[str, array-like] (optional)
A mapping (e.g. a dictionary) of parameter names and associated arrays. The key names are in {“mu”, “ar”, “ma”, “sar”, “sma”, “sigma2”}. The shape of the arrays is (batch_size,) for the mu and sigma2 parameters and (n, batch_size) for any other type, where n is the corresponding number of parameters of this type. Pass None for automatic estimation (recommended).
- opt_disp: int
Fit diagnostic level (for L-BFGS solver):
-1 for no output (default)
0<n<100 for output every n steps
n>100 for more detailed output
- h: float (default=1e-8)
Finite-differencing step size. The gradient is computed using forward finite differencing: \(g = \frac{f(x + \mathtt{h}) - f(x)}{\mathtt{h}} + O(\mathtt{h})\)
- maxiter: int (default=1000)
Maximum number of iterations of L-BFGS-B
- method: str (default=“ml”)
Estimation method: “css”, “css-ml” or “ml”. CSS uses a sum-of-squares approximation. ML estimates the log-likelihood with statespace methods. CSS-ML starts with CSS and refines with ML.
- truncate: int (default=0)
When using CSS, start the sum of squares after a given number of observations
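The forward finite-difference formula given for `h` can be sketched as follows. This is a generic NumPy illustration of the formula, applied one coordinate at a time to a toy objective. It is not cuML's batched implementation, and the function names are hypothetical.

```python
import numpy as np

def forward_difference_gradient(f, x, h=1e-8):
    """Approximate the gradient of f at the 1-D point x: g_i = (f(x + h*e_i) - f(x)) / h."""
    x = np.asarray(x, dtype=np.float64)
    fx = f(x)
    grad = np.empty_like(x)
    for i in range(x.size):
        x_step = x.copy()
        x_step[i] += h            # perturb one coordinate at a time
        grad[i] = (f(x_step) - fx) / h
    return grad

def objective(x):
    # Toy function with exact gradient (2 * x[0], 3)
    return x[0] ** 2 + 3.0 * x[1]

g = forward_difference_gradient(objective, [1.0, 2.0])
print(g)
```

The truncation error shrinks with `h` while floating-point cancellation grows as `h` gets smaller, so the step size is a trade-off between the two.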
- forecast(self, nsteps: int, level=None, exog=None) -> CumlArray | Tuple[CumlArray, CumlArray, CumlArray]
Forecast the given model nsteps into the future.
- Parameters:
- nsteps: int
The number of steps to forecast beyond the end of the given series
- level: float or None (default = None)
Confidence level for prediction intervals, or None to return only the point forecasts. 0 < level < 1
- exog: dataframe or array-like (device or host) (default=None)
Future values for exogenous variables. Assumed to have each time series in columns, such that variables associated with the same batch member are adjacent. Shape = (nsteps, n_exog * batch_size)
- Returns:
- y_fc: array-like
Forecasts. Shape = (nsteps, batch_size)
- lower: array-like (device) (optional)
Lower limit of the prediction interval if level != None. Shape = (nsteps, batch_size)
- upper: array-like (device) (optional)
Upper limit of the prediction interval if level != None. Shape = (nsteps, batch_size)
Examples
from cuml.tsa.arima import ARIMA
...
model = ARIMA(ys, order=(1,1,1))
model.fit()
y_fc = model.forecast(10)
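The column layout expected for future `exog` values (variables of the same batch member in adjacent columns) can be built by concatenating one block per series. The sketch below uses NumPy only. The helper name `interleave_exog` and the sample values are hypothetical.

```python
import numpy as np

def interleave_exog(per_series_exog):
    """Concatenate one (n_rows, n_exog) block per batch member, column-wise.

    Column j * n_exog + k of the result holds variable k of batch member j.
    """
    blocks = [np.asarray(b, dtype=np.float64) for b in per_series_exog]
    return np.hstack(blocks)

# Hypothetical future values: nsteps = 3, n_exog = 2, batch_size = 2
exog_a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # batch member 0
exog_b = 10.0 * exog_a                                     # batch member 1
exog_future = interleave_exog([exog_a, exog_b])
print(exog_future.shape)   # (nsteps, n_exog * batch_size)
```

An array built this way would have the shape documented for the `exog` argument of `forecast`.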
- get_fit_params(self) -> Dict[str, CumlArray]
Get all the fit parameters. Not to be confused with get_params. Note: pack() can be used to get a compact vector of the parameters.
- Returns:
- params: Dict[str, array-like]
A dictionary of parameter names and associated arrays. The key names are in {“mu”, “ar”, “ma”, “sar”, “sma”, “sigma2”}. The shape of the arrays is (batch_size,) for mu and sigma2 and (n, batch_size) for any other type, where n is the corresponding number of parameters of this type.
- get_params(self, deep=True)
ARIMA is unable to be cloned at this time. The methods _get_param_names(), get_params and set_params will raise NotImplementedError.
- property llf
Log-likelihood of a fitted model. Shape: (batch_size,)
- pack(self) -> np.ndarray
Pack parameters of the model into a linearized vector x
- Returns:
- x: numpy ndarray
Packed parameter array, grouped by series. Shape: (n_params * batch_size,)
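The "grouped by series" layout can be illustrated with a NumPy sketch that flattens a parameter dictionary so that all parameters of series 0 come first, then those of series 1, and so on. The order of parameter types within each series (`PARAM_ORDER` below) is an assumption made for illustration. This documentation does not specify it, and `pack_sketch` is a hypothetical helper, not the library's `pack`.

```python
import numpy as np

# Assumed within-series ordering, for illustration only
PARAM_ORDER = ("mu", "ar", "ma", "sar", "sma", "sigma2")

def pack_sketch(params, batch_size):
    """Flatten a dict of per-series parameters into a series-major vector."""
    rows = []
    for name in PARAM_ORDER:
        if name in params:
            arr = np.asarray(params[name], dtype=np.float64)
            # (batch_size,) becomes (1, batch_size); (n, batch_size) is unchanged
            rows.append(arr.reshape(-1, batch_size))
    stacked = np.vstack(rows)       # shape (n_params, batch_size)
    return stacked.T.reshape(-1)    # all parameters of series 0, then series 1, ...

# Hypothetical parameters for an ARIMA(1,d,1) batch of two series
params = {"ar": [[0.5, 0.6]], "ma": [[-0.1, -0.2]], "sigma2": [1.0, 2.0]}
x = pack_sketch(params, batch_size=2)
print(x)
```

With three parameters per series and two series, the packed vector has n_params * batch_size = 6 entries.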
- predict(self, start=0, end=None, level=None, exog=None, convert_dtype=True) -> CumlArray | Tuple[CumlArray, CumlArray, CumlArray]
Compute in-sample and/or out-of-sample prediction for each series
- Parameters:
- start: int (default = 0)
Index where to start the predictions (0 <= start <= num_samples)
- end: int (default = None)
Index where to end the predictions, excluded (end > start), or None to predict until the last observation
- level: float or None (default = None)
Confidence level for prediction intervals, or None to return only the point forecasts. 0 < level < 1
- exog: dataframe or array-like (device or host)
Future values for exogenous variables. Assumed to have each time series in columns, such that variables associated with the same batch member are adjacent. Shape = (end - n_obs, n_exog * batch_size)
- Returns:
- y_p: array-like (device)
Predictions. Shape = (end - start, batch_size)
- lower: array-like (device) (optional)
Lower limit of the prediction interval if level != None. Shape = (end - start, batch_size)
- upper: array-like (device) (optional)
Upper limit of the prediction interval if level != None. Shape = (end - start, batch_size)
Examples
from cuml.tsa.arima import ARIMA
model = ARIMA(ys, order=(1,1,1))
model.fit()
y_pred = model.predict()
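The index arithmetic behind `start`, `end` and the documented shapes can be checked with a small pure-Python helper. It only restates the rules listed above (exclusive `end`, `end=None` meaning the last observation). The function name `prediction_shapes` is hypothetical and nothing here calls cuML.

```python
def prediction_shapes(n_obs, batch_size, start=0, end=None, n_exog=0):
    """Return the output and exog shapes implied by the documented conventions."""
    if end is None:
        end = n_obs                      # predict until the last observation
    if not 0 <= start <= n_obs:
        raise ValueError("start must satisfy 0 <= start <= n_obs")
    if end <= start:
        raise ValueError("end is exclusive and must be greater than start")
    n_future = max(0, end - n_obs)       # number of out-of-sample steps
    needs_exog = n_exog > 0 and n_future > 0
    return {
        "predictions": (end - start, batch_size),
        "n_out_of_sample": n_future,
        "exog": (n_future, n_exog * batch_size) if needs_exog else None,
    }

# Hypothetical call: 100 observations, last 10 in-sample plus 10 future steps
shapes = prediction_shapes(n_obs=100, batch_size=2, start=90, end=110, n_exog=3)
print(shapes)
```

In this example the prediction covers indices 90 to 109, so 10 rows are in-sample and 10 are out-of-sample.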
- set_fit_params(self, params: Mapping[str, object], convert_dtype=True)
Set all the fit parameters. Not to be confused with set_params. Note: unpack() can be used to load a compact vector of the parameters.
- Parameters:
- params: Mapping[str, array-like]
A dictionary of parameter names and associated arrays. The key names are in {“mu”, “ar”, “ma”, “sar”, “sma”, “sigma2”}. The shape of the arrays is (batch_size,) for mu and sigma2 and (n, batch_size) for any other type, where n is the corresponding number of parameters of this type.
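The shape rules for the parameter dictionary can be made concrete with a sketch that derives the expected shapes from the model order and checks a candidate dictionary against them. Two assumptions are made for illustration: "mu" is present only when `fit_intercept` is true, and parameter types with zero entries are omitted. Both helper names are hypothetical.

```python
import numpy as np

def expected_shapes(order, seasonal_order, fit_intercept, batch_size):
    """Expected array shapes per parameter name, following the rules above."""
    p, _, q = order
    P, _, Q, _ = seasonal_order
    shapes = {"sigma2": (batch_size,)}
    if fit_intercept:                       # assumption: mu only with an intercept
        shapes["mu"] = (batch_size,)
    for name, n in (("ar", p), ("ma", q), ("sar", P), ("sma", Q)):
        if n > 0:                           # assumption: empty types are omitted
            shapes[name] = (n, batch_size)
    return shapes

def check_params(params, shapes):
    """Raise if a parameter name is unexpected or an array has the wrong shape."""
    for name, arr in params.items():
        if name not in shapes:
            raise KeyError(f"unexpected parameter: {name}")
        if np.shape(arr) != shapes[name]:
            raise ValueError(f"{name}: expected {shapes[name]}, got {np.shape(arr)}")

# Hypothetical ARIMA(2,1,1) with intercept, batch of three series
shapes = expected_shapes((2, 1, 1), (0, 0, 0, 0), fit_intercept=True, batch_size=3)
params = {
    "mu": np.zeros(3),
    "ar": np.zeros((2, 3)),
    "ma": np.zeros((1, 3)),
    "sigma2": np.ones(3),
}
check_params(params, shapes)  # passes silently when all shapes match
print(shapes)
```

A dictionary validated this way follows the layout described for both `start_params` and `set_fit_params`.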
- set_params(self, **params)
ARIMA is unable to be cloned at this time. The methods _get_param_names(), get_params and set_params will raise NotImplementedError.