cudf.DataFrame.rolling

DataFrame.rolling(window, min_periods=None, center=False, axis=0, win_type=None)

Rolling window calculations.

Parameters:
window : int, offset or a BaseIndexer subclass

Size of the window, i.e., the number of observations used to calculate the statistic. For datetime indexes, an offset can be provided instead of an int. The offset must be convertible to a timedelta. Unlike a fixed window size, an offset makes each window span the observations that fall within the specified time period. If a BaseIndexer subclass is passed, the window boundaries are calculated by its get_window_bounds method.

min_periods : int, optional

The minimum number of non-null observations a window must contain for the result to be non-null. If not provided or None, min_periods is equal to the window size.

center : bool, optional

If True, the result is set at the center of the window. If False (default), the result is set at the right edge of the window.

Returns:
Rolling object.

Examples

>>> import cudf
>>> a = cudf.Series([1, 2, 3, None, 4])

Rolling sum with window size 2.

>>> print(a.rolling(2).sum())
0
1    3
2    5
3
4
dtype: int64

Rolling sum with window size 2 and min_periods 1.

>>> print(a.rolling(2, min_periods=1).sum())
0    1
1    3
2    5
3    3
4    4
dtype: int64

Rolling count with window size 3.

>>> print(a.rolling(3).count())
0    1
1    2
2    3
3    2
4    2
dtype: int64

Rolling count with window size 3, but with the result set at the center of the window.

>>> print(a.rolling(3, center=True).count())
0    2
1    3
2    2
3    2
4    1
dtype: int64
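
The fixed-window results above follow from a few simple rules, which can be sketched in plain Python. This is a reference sketch for reading the outputs, not cuDF's GPU implementation. The helper names `fixed_windows`, `rolling_sum`, and `rolling_count` are hypothetical, and the centering shift of `(window - 1) // 2` is an assumption borrowed from the pandas convention.

```python
def fixed_windows(values, window, center=False):
    """Yield the observations that fall in each fixed-size window."""
    n = len(values)
    # Assumption (pandas convention): a centered window is shifted right
    # by (window - 1) // 2 positions relative to a right-aligned one.
    shift = (window - 1) // 2 if center else 0
    for i in range(n):
        stop = i + 1 + shift
        start = stop - window
        # Positions outside the series contribute no observations.
        yield values[max(start, 0):min(stop, n)]


def rolling_sum(values, window, min_periods=None, center=False):
    """Sum non-null observations; null unless min_periods are present."""
    if min_periods is None:
        min_periods = window
    out = []
    for obs in fixed_windows(values, window, center):
        valid = [v for v in obs if v is not None]
        out.append(sum(valid) if len(valid) >= min_periods else None)
    return out


def rolling_count(values, window, center=False):
    """Count the non-null observations in each window."""
    return [
        sum(v is not None for v in obs)
        for obs in fixed_windows(values, window, center)
    ]


a = [1, 2, 3, None, 4]  # None stands in for a null value
print(rolling_sum(a, 2))
print(rolling_sum(a, 2, min_periods=1))
print(rolling_count(a, 3))
print(rolling_count(a, 3, center=True))
```

Printed as lists, these give the same values as the four examples above, with None in place of each null entry.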

Rolling max with a variable window size specified by an offset; this is only valid for a datetime index.

>>> import numpy as np
>>> import pandas as pd
>>> a = cudf.Series(
...     [1, 9, 5, 4, np.nan, 1],
...     index=[
...         pd.Timestamp('20190101 09:00:00'),
...         pd.Timestamp('20190101 09:00:01'),
...         pd.Timestamp('20190101 09:00:02'),
...         pd.Timestamp('20190101 09:00:04'),
...         pd.Timestamp('20190101 09:00:07'),
...         pd.Timestamp('20190101 09:00:08')
...     ]
... )
>>> print(a.rolling('2s').max())
2019-01-01T09:00:00.000    1
2019-01-01T09:00:01.000    9
2019-01-01T09:00:02.000    9
2019-01-01T09:00:04.000    4
2019-01-01T09:00:07.000
2019-01-01T09:00:08.000    1
dtype: int64
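
To see why each timestamp gets the value it does, the offset window can be sketched in plain Python. This is an illustrative sketch, not cuDF's implementation. The helper name `offset_rolling_max` is hypothetical, and two behaviors are assumptions taken from the pandas convention and from the output above: the window covers the interval `(t - offset, t]`, and `min_periods` defaults to 1 for offset windows.

```python
from datetime import datetime, timedelta


def offset_rolling_max(index, values, offset, min_periods=1):
    """Max of the non-null observations within `offset` of each timestamp."""
    out = []
    for t in index:
        # Assumption: the window is the half-open interval (t - offset, t].
        obs = [v for s, v in zip(index, values) if t - offset < s <= t]
        valid = [v for v in obs if v is not None]
        if valid and len(valid) >= min_periods:
            out.append(max(valid))
        else:
            out.append(None)  # not enough non-null observations
    return out


base = datetime(2019, 1, 1, 9, 0, 0)
index = [base + timedelta(seconds=s) for s in (0, 1, 2, 4, 7, 8)]
values = [1, 9, 5, 4, None, 1]  # None stands in for the null entry
print(offset_rolling_max(index, values, timedelta(seconds=2)))
```

The window at 09:00:02 excludes 09:00:00, so its maximum comes from the values 9 and 5. The window at 09:00:07 holds only the null entry, so the result there is null.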

Apply a custom function to each window with the apply method.

>>> import numpy as np
>>> import math
>>> b = cudf.Series([16, 25, 36, 49, 64, 81], dtype=np.float64)
>>> def some_func(A):
...     b = 0
...     for a in A:
...         b = b + math.sqrt(a)
...     return b
...
>>> print(b.rolling(3, min_periods=1).apply(some_func))
0     4.0
1     9.0
2    15.0
3    18.0
4    21.0
5    24.0
dtype: float64
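
The arithmetic behind this output can be checked with a plain-Python sketch that calls the function on each window in turn. It only mirrors the window contents and is not how cuDF executes the function on the GPU. The helper name `rolling_apply` is hypothetical.

```python
import math


def some_func(window):
    """Sum of the square roots of the observations in one window."""
    total = 0.0
    for x in window:
        total += math.sqrt(x)
    return total


def rolling_apply(values, window, func, min_periods=None):
    """Call func on each right-aligned window of at most `window` items."""
    if min_periods is None:
        min_periods = window
    out = []
    for i in range(len(values)):
        obs = values[max(i - window + 1, 0):i + 1]
        # Windows with too few observations produce a null (None) result.
        out.append(func(obs) if len(obs) >= min_periods else None)
    return out


b = [16.0, 25.0, 36.0, 49.0, 64.0, 81.0]
print(rolling_apply(b, 3, some_func, min_periods=1))
```

With `min_periods=1`, the first two windows hold one and two values, which is why the first results are 4.0 and 9.0 rather than null.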

This also works for a rolling window specified by an offset.

>>> import pandas as pd
>>> c = cudf.Series(
...     [16, 25, 36, 49, 64, 81],
...     index=[
...          pd.Timestamp('20190101 09:00:00'),
...          pd.Timestamp('20190101 09:00:01'),
...          pd.Timestamp('20190101 09:00:02'),
...          pd.Timestamp('20190101 09:00:04'),
...          pd.Timestamp('20190101 09:00:07'),
...          pd.Timestamp('20190101 09:00:08')
...      ],
...     dtype=np.float64
... )
>>> print(c.rolling('2s').apply(some_func))
2019-01-01T09:00:00.000     4.0
2019-01-01T09:00:01.000     9.0
2019-01-01T09:00:02.000    11.0
2019-01-01T09:00:04.000     7.0
2019-01-01T09:00:07.000     8.0
2019-01-01T09:00:08.000    17.0
dtype: float64