ARIMA

class cuml.tsa.ARIMA(endog, *, order: Tuple[int, int, int] = (1, 1, 1), seasonal_order: Tuple[int, int, int, int] = (0, 0, 0, 0), exog=None, fit_intercept=True, simple_differencing=True, verbose=False, output_type=None, convert_dtype=True)

Implements a batched ARIMA model for in-sample and out-of-sample time-series prediction, with support for seasonality (SARIMA).

ARIMA stands for Auto-Regressive Integrated Moving Average. See https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average

This class can fit an ARIMA(p,d,q) or ARIMA(p,d,q)(P,D,Q)_s model to a batch of time series of the same length (or of different lengths, padded with missing values at the start). The implementation is designed to give the best performance when using large batches of time series.
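The padding convention for series of different lengths can be sketched as follows. This is a minimal illustration in plain Python, so that it runs without a GPU; the helper name `pad_to_batch` is hypothetical and not part of cuML. Shorter series are left-padded with NaN and the result is arranged with one series per column.

```python
import math


def pad_to_batch(series_list):
    """Left-pad series with NaN to a common length, one series per column.

    Hypothetical helper for illustration only; not part of cuML.
    """
    n_obs = max(len(s) for s in series_list)
    # Missing values go at the START of the shorter series
    padded = [[math.nan] * (n_obs - len(s)) + list(s) for s in series_list]
    # Transpose: rows become time steps, columns become series
    return [list(row) for row in zip(*padded)]


batch = pad_to_batch([[1.0, 2.0, 3.0, 4.0], [10.0, 20.0]])
# batch has 4 rows and 2 columns; the second column starts with two NaN values
```

The nested list would then be converted to one of the accepted formats (for example with `numpy.asarray`) before being passed as `endog`.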

Parameters:

endog: dataframe or array-like (device or host): Endogenous variable, assumed to have each time series in columns. Acceptable formats: cuDF DataFrame, cuDF Series, NumPy ndarray, Numba device ndarray, CUDA array interface compliant array such as CuPy. Missing values are accepted, represented by NaN.
order: Tuple[int, int, int] (default=(1,1,1)): The ARIMA order (p, d, q) of the model.
seasonal_order: Tuple[int, int, int, int] (default=(0,0,0,0)): The seasonal ARIMA order (P, D, Q, s) of the model.
exog: dataframe or array-like (device or host) (default=None): Exogenous variables, assumed to have each time series in columns, such that variables associated with the same batch member are adjacent (number of columns: n_exog * batch_size). Acceptable formats: cuDF DataFrame, cuDF Series, NumPy ndarray, Numba device ndarray, CUDA array interface compliant array such as CuPy. Missing values are not supported.
fit_intercept: bool or int (default = True): Whether to include a constant trend mu in the model.
simple_differencing: bool or int (default = True): If True, the data is differenced before being passed to the Kalman filter. If False, differencing is part of the state-space model. In some cases this setting is ignored: computing forecasts with confidence intervals forces it to False; fitting with the CSS method forces it to True. Note that forecasts are always for the original series, whereas statsmodels computes forecasts for the differenced series when simple_differencing is True.
verbose: int or boolean, default=False: Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
output_type: {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None: Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
convert_dtype: boolean (default=True): When set to True, the model will automatically convert the inputs to np.float64.
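The column layout required for `exog` (variables of the same batch member in adjacent columns) can be illustrated with a small index calculation. This is only a sketch: the helper `exog_column` is hypothetical, and it assumes zero-based indices and that batch members appear in the same order as the columns of `endog`.

```python
def exog_column(member, variable, n_exog):
    """Column index of exogenous `variable` for batch `member`.

    Hypothetical helper: assumes the n_exog variables of one batch member
    occupy adjacent columns, and that indices are zero-based.
    """
    return member * n_exog + variable


n_exog, batch_size = 2, 3
# Enumerate (member, variable) pairs in the order the columns are laid out
layout = [(i, j) for i in range(batch_size) for j in range(n_exog)]
columns = [exog_column(i, j, n_exog) for i, j in layout]
# The total number of columns is n_exog * batch_size
```

Under these assumptions, the two variables of the third series (member 2) sit in columns 4 and 5.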

Attributes:

order: ARIMAOrder: The ARIMA order of the model (p, d, q, P, D, Q, s, k, n_exog)
d_y: device array: Time series data on device
n_obs: int: Number of observations
batch_size: int: Number of time series in the batch
dtype: numpy.dtype: Floating-point type of the data and parameters
niter: numpy.ndarray: After fitting, contains the number of iterations before convergence for each time series.

Methods

`fit`(self[, start_params, opt_disp, h, ...])	Fit the ARIMA model to each time series.
`forecast`(self, nsteps[, level, exog])	Forecast the given model `nsteps` into the future.
`get_fit_params`(self)	Get all the fit parameters.
`get_params`(self[, deep])	ARIMA is unable to be cloned at this time.
`pack`(self)	Pack parameters of the model into a linearized vector `x`
`predict`(self[, start, end, level, exog, ...])	Compute in-sample and/or out-of-sample prediction for each series
`set_fit_params`(self, params[, convert_dtype])	Set all the fit parameters.
`set_params`(self, **params)	ARIMA is unable to be cloned at this time.
`unpack`(self, x[, convert_dtype])	Unpack linearized parameter vector `x` into the separate parameter arrays of the model

Notes

Performance: Let \(r = \max(p + s \cdot P,\; q + s \cdot Q + 1)\). The device memory used for most operations is \(O(\mathtt{batch\_size} \cdot \mathtt{n\_obs} + \mathtt{batch\_size} \cdot r^2)\). The execution time is a linear function of n_obs and batch_size (if batch_size is large), but grows very fast with r.

The performance is optimized for very large batch sizes (e.g. thousands of series).
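As a worked example of the quantities in the performance note, the sketch below evaluates r and the two leading memory terms for given orders. The function names are hypothetical, and the result counts array elements up to unknown constant factors; it is not a byte count reported by cuML.

```python
def compute_r(order, seasonal_order):
    """r = max(p + s*P, q + s*Q + 1), as defined in the performance note."""
    p, _, q = order
    P, _, Q, s = seasonal_order
    return max(p + s * P, q + s * Q + 1)


def memory_scaling(batch_size, n_obs, r):
    """Leading terms of the O(.) expression, in elements (not bytes)."""
    return batch_size * n_obs + batch_size * r ** 2


# Orders used in the example of this page: (0,1,1)(0,1,1)_4
r = compute_r((0, 1, 1), (0, 1, 1, 4))
elements = memory_scaling(batch_size=2, n_obs=100, r=r)
```

Here r = max(0, 1 + 4 + 1) = 6. Because r enters the memory term squared, a long seasonal period s combined with a seasonal MA or AR term raises the cost quickly.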

References

This class is heavily influenced by the Python library statsmodels, particularly statsmodels.tsa.statespace.sarimax.SARIMAX. See https://www.statsmodels.org/stable/statespace.html.

Additionally the following book is a useful reference: “Time Series Analysis by State Space Methods”, J. Durbin, S.J. Koopman, 2nd Edition (2012).

Examples

>>> import cupy as cp
>>> from cuml.tsa.arima import ARIMA

>>> # Create seasonal data with a trend, a seasonal pattern and noise
>>> n_obs = 100
>>> cp.random.seed(12)
>>> x = cp.linspace(0, 1, n_obs)
>>> pattern = cp.array([[0.05, 0.0], [0.07, 0.03],
...                     [-0.03, 0.05], [0.02, 0.025]])
>>> noise = cp.random.normal(scale=0.01, size=(n_obs, 2))
>>> y = (cp.column_stack((0.5*x, -0.25*x)) + noise
...     + cp.tile(pattern, (25, 1)))

>>> # Fit a seasonal ARIMA model
>>> model = ARIMA(y,
...               order=(0,1,1),
...               seasonal_order=(0,1,1,4),
...               fit_intercept=False)
>>> model.fit()
ARIMA(...)
>>> # Forecast
>>> fc = model.forecast(10)
>>> print(fc)
[[ 0.55204599 -0.25681163]
[ 0.57430705 -0.2262438 ]
[ 0.48120315 -0.20583011]
[ 0.535594   -0.24060046]
[ 0.57207541 -0.26695497]
[ 0.59433647 -0.23638713]
[ 0.50123257 -0.21597344]
[ 0.55562342 -0.25074379]
[ 0.59210483 -0.27709831]
[ 0.61436589 -0.24653047]]

ARIMA.aic -> CumlArray: Akaike Information Criterion

ARIMA.aicc -> CumlArray: Corrected Akaike Information Criterion

ARIMA.bic -> CumlArray: Bayesian Information Criterion

property complexity: Model complexity (number of parameters)
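For orientation, the three information criteria can be written in terms of the log-likelihood (`llf`) and the number of parameters (`complexity`). The sketch below uses the textbook definitions; the function name is hypothetical, and which sample size cuML uses for `n` (for instance, whether differencing reduces it) is not stated on this page, so `n` is an assumption.

```python
import math


def information_criteria(llf, k, n):
    """Textbook AIC, AICc and BIC for one series.

    llf: log-likelihood, k: number of parameters,
    n: number of observations (assumed sample size, see lead-in).
    """
    aic = 2 * k - 2 * llf
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction
    bic = k * math.log(n) - 2 * llf
    return aic, aicc, bic


aic, aicc, bic = information_criteria(llf=-120.0, k=3, n=100)
```

Lower values indicate a better trade-off between fit and complexity when comparing models fitted to the same series.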

fit(self, start_params: Optional[Mapping[str, object]] = None, opt_disp: int = -1, h: float = 1e-8, maxiter: int = 1000, method='ml', truncate: int = 0, convert_dtype: bool = True) → 'ARIMA'

Fit the ARIMA model to each time series.

Parameters:

start_params: Mapping[str, array-like] (optional)

A mapping (e.g. a dictionary) of parameter names to associated arrays. The key names are in {“mu”, “ar”, “ma”, “sar”, “sma”, “sigma2”}. The shape of the arrays is (batch_size,) for the mu and sigma2 parameters and (n, batch_size) for any other type, where n is the corresponding number of parameters of this type. Pass None for automatic estimation (recommended).

opt_disp: int

Fit diagnostic level (for L-BFGS solver):

-1 for no output (default)
0<n<100 for output every n steps
n>100 for more detailed output

h: float (default=1e-8)

Finite-differencing step size. The gradient is computed using forward finite differencing: \(g = \frac{f(x + \mathtt{h}) - f(x)}{\mathtt{h}} + O(\mathtt{h})\)

maxiter: int (default=1000)

Maximum number of iterations of L-BFGS-B

method: str (default=“ml”)

Estimation method - “css”, “css-ml” or “ml”. CSS uses a sum-of-squares approximation. ML estimates the log-likelihood with statespace methods. CSS-ML starts with CSS and refines with ML.

truncate: int (default=0)

When using CSS, start the sum of squares after a given number of observations
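The forward finite-differencing formula given for `h` can be sketched in plain Python. This is a generic illustration of the technique on a toy objective, not cuML's internal gradient code; `forward_gradient` and `objective` are hypothetical names.

```python
def forward_gradient(f, x, h=1e-8):
    """Approximate the gradient of f at x (a list of floats).

    Uses g_i = (f(x + h * e_i) - f(x)) / h, which has O(h) truncation error.
    """
    fx = f(x)
    grad = []
    for i in range(len(x)):
        x_step = list(x)
        x_step[i] += h  # perturb one coordinate at a time
        grad.append((f(x_step) - fx) / h)
    return grad


def objective(x):
    """Toy objective with known gradient [2 * x0, 3]."""
    return x[0] ** 2 + 3.0 * x[1]


g = forward_gradient(objective, [3.0, 1.0])
```

A smaller `h` reduces the truncation error but amplifies floating-point rounding error, which is why the step is not made arbitrarily small.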

forecast(self, nsteps: int, level=None, exog=None) → CumlArray | Tuple[CumlArray, CumlArray, CumlArray]

Forecast the given model nsteps into the future.

Parameters:

nsteps: int: The number of steps to forecast beyond the end of the given series
level: float or None (default = None): Confidence level for prediction intervals, or None to return only the point forecasts. 0 < level < 1
exog: dataframe or array-like (device or host) (default=None): Future values for exogenous variables. Assumed to have each time series in columns, such that variables associated with the same batch member are adjacent. Shape = (nsteps, n_exog * batch_size)

Returns:

y_fc: array-like: Forecasts. Shape = (nsteps, batch_size)
lower: array-like (device) (optional): Lower limit of the prediction interval if level != None. Shape = (nsteps, batch_size)
upper: array-like (device) (optional): Upper limit of the prediction interval if level != None. Shape = (nsteps, batch_size)
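To illustrate how a confidence `level` between 0 and 1 maps to lower and upper limits, the sketch below builds a symmetric Gaussian interval around a point forecast. This is an assumption made for illustration: this page does not state how cuML computes its intervals, and `normal_interval` and the standard error input are hypothetical.

```python
from statistics import NormalDist


def normal_interval(point, std_err, level):
    """Symmetric two-sided interval under a Gaussian assumption.

    Hypothetical helper; std_err is an assumed forecast standard error.
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must satisfy 0 < level < 1")
    # Quantile that leaves (1 - level) / 2 probability in each tail
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return point - z * std_err, point + z * std_err


lower, upper = normal_interval(point=0.55, std_err=0.01, level=0.95)
```

A higher `level` widens the interval, since a larger share of the distribution must fall between the two limits.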

Examples

from cuml.tsa.arima import ARIMA
...
model = ARIMA(ys, order=(1,1,1))
model.fit()
y_fc = model.forecast(10)

get_fit_params(self) → Dict[str, CumlArray]

Get all the fit parameters. Not to be confused with get_params. Note: pack() can be used to get a compact vector of the parameters.

Returns:

params: Dict[str, array-like]: A dictionary of parameter names and associated arrays. The key names are in {“mu”, “ar”, “ma”, “sar”, “sma”, “sigma2”}. The shape of the arrays is (batch_size,) for mu and sigma2 and (n, batch_size) for any other type, where n is the corresponding number of parameters of this type.

get_params(self, deep=True): ARIMA cannot be cloned at this time. The methods _get_param_names(), get_params and set_params raise NotImplementedError.

property llf: Log-likelihood of a fit model. Shape: (batch_size,)

pack(self) → np.ndarray

Pack parameters of the model into a linearized vector x

Returns:

x: numpy ndarray: Packed parameter array, grouped by series. Shape: (n_params * batch_size,)
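The "grouped by series" layout of the packed vector can be sketched as follows. This is not cuML's implementation: `pack_by_series` is a hypothetical helper, and the order of parameter types within each series is an assumption chosen for illustration. What the sketch shows is that all parameters of series 0 come first, then all parameters of series 1, and so on.

```python
def pack_by_series(params, batch_size,
                   order=("mu", "ar", "ma", "sar", "sma", "sigma2")):
    """Flatten a parameter dict into one vector, grouped by series.

    Hypothetical helper; the within-series ordering is an assumption.
    """
    x = []
    for b in range(batch_size):
        for name in order:
            if name not in params:
                continue
            values = params[name]
            if name in ("mu", "sigma2"):
                x.append(values[b])  # shape (batch_size,)
            else:
                x.extend(row[b] for row in values)  # shape (n, batch_size)
    return x


# ARIMA(1, d, 1) without intercept, batch of two series
params = {"ar": [[0.5, 0.4]], "ma": [[-0.2, -0.1]], "sigma2": [1.0, 2.0]}
x = pack_by_series(params, batch_size=2)
```

With three parameters per series and two series, the vector has n_params * batch_size = 6 entries.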

predict(self, start=0, end=None, level=None, exog=None, convert_dtype=True) → CumlArray | Tuple[CumlArray, CumlArray, CumlArray]

Compute in-sample and/or out-of-sample prediction for each series

Parameters:

start: int (default = 0): Index where to start the predictions (0 <= start <= n_obs)
end: int (default = None): Index where to end the predictions, excluded (end > start), or None to predict until the last observation
level: float or None (default = None): Confidence level for prediction intervals, or None to return only the point forecasts. 0 < level < 1
exog: dataframe or array-like (device or host) (default=None): Future values for exogenous variables. Assumed to have each time series in columns, such that variables associated with the same batch member are adjacent. Shape = (end - n_obs, n_exog * batch_size)

Returns:

y_p: array-like (device): Predictions. Shape = (end - start, batch_size)
lower: array-like (device) (optional): Lower limit of the prediction interval if level != None. Shape = (end - start, batch_size)
upper: array-like (device) (optional): Upper limit of the prediction interval if level != None. Shape = (end - start, batch_size)
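The index arithmetic behind `start` and `end` can be made explicit with a small helper. It is a hypothetical sketch (not a cuML function) that splits the half-open range [start, end) into in-sample and out-of-sample steps relative to `n_obs`.

```python
def prediction_split(start, end, n_obs):
    """Return (n_in_sample, n_out_of_sample) for predictions on [start, end).

    Hypothetical helper; end=None means "until the last observation".
    """
    if end is None:
        end = n_obs
    if not 0 <= start <= n_obs:
        raise ValueError("start must satisfy 0 <= start <= n_obs")
    if end <= start:
        raise ValueError("end must be greater than start")
    n_in = max(0, min(end, n_obs) - start)
    n_out = max(0, end - n_obs)
    return n_in, n_out


n_in, n_out = prediction_split(start=90, end=110, n_obs=100)
# n_in + n_out equals end - start, the number of rows in the output
```

The out-of-sample count is also the number of rows of future `exog` values required, which matches the documented shape (end - n_obs, n_exog * batch_size).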

Examples

from cuml.tsa.arima import ARIMA

model = ARIMA(ys, order=(1,1,1))
model.fit()
y_pred = model.predict()

set_fit_params(self, params: Mapping[str, object], convert_dtype=True)

Set all the fit parameters. Not to be confused with set_params. Note: unpack() can be used to load a compact vector of the parameters.

Parameters:

params: Mapping[str, array-like]: A dictionary of parameter names and associated arrays. The key names are in {“mu”, “ar”, “ma”, “sar”, “sma”, “sigma2”}. The shape of the arrays is (batch_size,) for mu and sigma2 and (n, batch_size) for any other type, where n is the corresponding number of parameters of this type.
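The shape rules for the parameter dictionary can be summarized with a small helper that lists the expected shape per key for given orders. This is a sketch under assumptions: `expected_shapes` is hypothetical, and it assumes that parameter types with zero coefficients are left out of the dictionary.

```python
def expected_shapes(p, q, P, Q, batch_size, fit_intercept=True):
    """Expected array shape for each key of a fit-parameter dictionary.

    Hypothetical helper; assumes keys with no coefficients are omitted.
    """
    shapes = {"sigma2": (batch_size,)}
    if fit_intercept:
        shapes["mu"] = (batch_size,)
    for name, n in (("ar", p), ("ma", q), ("sar", P), ("sma", Q)):
        if n > 0:
            shapes[name] = (n, batch_size)
    return shapes


# Orders (0,1,1)(0,1,1)_4 without intercept, for a batch of two series
shapes = expected_shapes(p=0, q=1, P=0, Q=1, batch_size=2,
                         fit_intercept=False)
```

Checking candidate arrays against these shapes before passing them in can catch a transposed (batch_size, n) array early.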

set_params(self, **params): ARIMA cannot be cloned at this time. The methods _get_param_names(), get_params and set_params raise NotImplementedError.

unpack(self, x: list | np.ndarray, convert_dtype=True)

Unpack linearized parameter vector x into the separate parameter arrays of the model

Parameters:

x: array-like: Packed parameter array, grouped by series. Shape: (n_params * batch_size,)