Rolling Window

Files

file  rolling.hpp
 

Classes

struct  cudf::window_bounds
 Abstraction for window boundary sizes.
 

Functions

std::unique_ptr< column >  cudf::rolling_window (column_view const &input, size_type preceding_window, size_type following_window, size_type min_periods, std::unique_ptr< aggregation > const &agg, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Applies a fixed-size rolling window function to the values in a column.
 
std::unique_ptr< column >  cudf::rolling_window (column_view const &input, column_view const &default_outputs, size_type preceding_window, size_type following_window, size_type min_periods, std::unique_ptr< aggregation > const &agg, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Overload of rolling_window() that returns per-row default_outputs values instead of nulls.
 
std::unique_ptr< column >  cudf::grouped_rolling_window (table_view const &group_keys, column_view const &input, size_type preceding_window, size_type following_window, size_type min_periods, std::unique_ptr< aggregation > const &aggr, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Applies a grouping-aware, fixed-size rolling window function to the values in a column.
 
std::unique_ptr< column >  cudf::grouped_rolling_window (table_view const &group_keys, column_view const &input, window_bounds preceding_window, window_bounds following_window, size_type min_periods, std::unique_ptr< aggregation > const &aggr, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Overload of grouped_rolling_window() that takes the window sizes as window_bounds.
 
std::unique_ptr< column >  cudf::grouped_rolling_window (table_view const &group_keys, column_view const &input, column_view const &default_outputs, size_type preceding_window, size_type following_window, size_type min_periods, std::unique_ptr< aggregation > const &aggr, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Overload of grouped_rolling_window() that returns per-row default_outputs values instead of nulls.
 
std::unique_ptr< column >  cudf::grouped_rolling_window (table_view const &group_keys, column_view const &input, column_view const &default_outputs, window_bounds preceding_window, window_bounds following_window, size_type min_periods, std::unique_ptr< aggregation > const &aggr, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Overload of grouped_rolling_window() that takes default_outputs and window sizes as window_bounds.
 
std::unique_ptr< column >  cudf::grouped_time_range_rolling_window (table_view const &group_keys, column_view const &timestamp_column, cudf::order const &timestamp_order, column_view const &input, size_type preceding_window_in_days, size_type following_window_in_days, size_type min_periods, std::unique_ptr< aggregation > const &aggr, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Applies a grouping-aware, timestamp-based rolling window function to the values in a column.
 
std::unique_ptr< column >  cudf::grouped_time_range_rolling_window (table_view const &group_keys, column_view const &timestamp_column, cudf::order const &timestamp_order, column_view const &input, window_bounds preceding_window_in_days, window_bounds following_window_in_days, size_type min_periods, std::unique_ptr< aggregation > const &aggr, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Overload of grouped_time_range_rolling_window() that takes the window sizes as window_bounds.
 
std::unique_ptr< column >  cudf::rolling_window (column_view const &input, column_view const &preceding_window, column_view const &following_window, size_type min_periods, std::unique_ptr< aggregation > const &agg, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Applies a variable-size rolling window function to the values in a column.
 

Detailed Description

Function Documentation

◆ grouped_rolling_window() [1/4]

std::unique_ptr<column> cudf::grouped_rolling_window ( table_view const &  group_keys,
column_view const &  input,
column_view const &  default_outputs,
size_type  preceding_window,
size_type  following_window,
size_type  min_periods,
std::unique_ptr< aggregation > const &  aggr,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Overload of grouped_rolling_window() that returns per-row default values instead of nulls. Apart from default_outputs, the parameters and behavior are as documented for:

grouped_rolling_window( table_view const& group_keys, column_view const& input, size_type preceding_window, size_type following_window, size_type min_periods, std::unique_ptr<aggregation> const& aggr, rmm::mr::device_memory_resource* mr)

Parameters
default_outputs  A column of per-row default values to be returned instead of nulls. Used for LEAD()/LAG(), if the row offset crosses the boundaries of the column or group.
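
The role of default_outputs for LAG() can be illustrated with a small CPU-side sketch. This is not the libcudf implementation: the function name `grouped_lag_with_defaults`, the use of `std::vector` and `std::string` keys, and the `row_offset` parameter are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical CPU reference: LAG(row_offset) within presorted groups.
// When the lagged row would fall outside the column or the current group,
// the per-row default value is returned instead.
std::vector<int> grouped_lag_with_defaults(std::vector<std::string> const& group_keys,
                                           std::vector<int> const& input,
                                           std::vector<int> const& default_outputs,
                                           int row_offset)
{
  assert(group_keys.size() == input.size());
  assert(default_outputs.size() == input.size());
  assert(row_offset >= 0);

  int const n = static_cast<int>(input.size());
  std::vector<int> out(n);
  for (int i = 0; i < n; ++i) {
    int const target = i - row_offset;
    // Keys are presorted, so an equal key at `target` means the same group.
    bool const in_group = target >= 0 && group_keys[target] == group_keys[i];
    out[i] = in_group ? input[target] : default_outputs[i];
  }
  return out;
}
```

With keys { a, a, a, b, b }, input { 1, 2, 3, 4, 5 }, all defaults 0 and an offset of 1, the sketch returns { 0, 1, 2, 0, 4 }: the first row of each group takes its default.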

◆ grouped_rolling_window() [2/4]

std::unique_ptr<column> cudf::grouped_rolling_window ( table_view const &  group_keys,
column_view const &  input,
column_view const &  default_outputs,
window_bounds  preceding_window,
window_bounds  following_window,
size_type  min_periods,
std::unique_ptr< aggregation > const &  aggr,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

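Overload of grouped_rolling_window() that takes preceding_window and following_window as window_bounds instead of size_type. All other parameters and behavior are as documented for:

grouped_rolling_window( table_view const& group_keys, column_view const& input, column_view const& default_outputs, size_type preceding_window, size_type following_window, size_type min_periods, std::unique_ptr<aggregation> const& aggr, rmm::mr::device_memory_resource* mr)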

◆ grouped_rolling_window() [3/4]

std::unique_ptr<column> cudf::grouped_rolling_window ( table_view const &  group_keys,
column_view const &  input,
size_type  preceding_window,
size_type  following_window,
size_type  min_periods,
std::unique_ptr< aggregation > const &  aggr,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Applies a grouping-aware, fixed-size rolling window function to the values in a column.

Like rolling_window(), this function aggregates values in a window around each element of a specified input column. It differs from rolling_window() in that elements of the input column are grouped into distinct groups (e.g. the result of a groupby). The window aggregation cannot cross the group boundaries. For a row i of input, the group is determined from the corresponding (i.e. i-th) values of the columns under group_keys.

Note: This method requires that the rows are presorted by the group_keys values.

Example: Consider a user-sales dataset, where the rows look as follows:
{ "user_id", sales_amt, day }
The `grouped_rolling_window()` method enables windowing queries such as grouping a dataset by
`user_id`, and summing up the `sales_amt` column over a window of 3 rows (2 preceding rows,
counting the current row, and 1 following row).
In this example,
1. `group_keys == [ user_id ]`
2. `input == sales_amt`
The data are grouped by `user_id`, and ordered by `day`. The aggregation
(SUM) is then calculated for a window of 3 values around (and including) each row.
For the following input:
[ // user, sales_amt
{ "user1", 10 },
{ "user2", 20 },
{ "user1", 20 },
{ "user1", 10 },
{ "user2", 30 },
{ "user2", 80 },
{ "user1", 50 },
{ "user1", 60 },
{ "user2", 40 }
]
Partitioning (grouping) by `user_id` yields the following `sales_amt` vector
(with 2 groups, one for each distinct `user_id`):
[ 10, 20, 10, 50, 60, 20, 30, 80, 40 ]
<-------user1-------->|<------user2------->
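The SUM aggregation is applied with 1 preceding and 1 following
row (i.e. `preceding_window == 2`, because the preceding count includes the current row,
and `following_window == 1`), with a minimum of 1 period. The aggregation window is thus
3 rows wide, yielding the following column: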
[ 30, 40, 80, 120, 110, 50, 130, 150, 120 ]
Note: The SUMs calculated at the group boundaries (i.e. indices 0, 4, 5, and 8)
consider only 2 values each, in spite of the window-size being 3.
Each aggregation operation cannot cross group boundaries.
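
The worked example above can be reproduced with a CPU-side sketch. It is only a reference for the window semantics, not the libcudf implementation: the name `grouped_rolling_sum_reference`, the `std::vector` containers, and the use of `std::optional` to stand in for a nullable column are assumptions, and the input is assumed to contain no nulls.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical CPU reference for a grouped, fixed-size rolling SUM.
// preceding_window counts the current row; std::nullopt models a null output.
std::vector<std::optional<int>> grouped_rolling_sum_reference(
  std::vector<std::string> const& group_keys,  // presorted: equal keys are adjacent
  std::vector<int> const& input,
  int preceding_window,
  int following_window,
  int min_periods)
{
  assert(group_keys.size() == input.size());
  int const n = static_cast<int>(input.size());
  std::vector<std::optional<int>> out(n);

  int group_begin = 0;
  while (group_begin < n) {
    // Find the exclusive end of the run of rows that share this key.
    int group_end = group_begin;
    while (group_end < n && group_keys[group_end] == group_keys[group_begin]) { ++group_end; }

    for (int i = group_begin; i < group_end; ++i) {
      // Clamp the window to the group, so it never crosses a group boundary.
      int const begin = std::max(group_begin, i - preceding_window + 1);
      int const end   = std::min(group_end - 1, i + following_window);
      if (end - begin + 1 < min_periods) { continue; }  // too few observations: null
      int sum = 0;
      for (int j = begin; j <= end; ++j) { sum += input[j]; }
      out[i] = sum;
    }
    group_begin = group_end;
  }
  return out;
}
```

Called with the grouped `sales_amt` vector, `preceding_window == 2`, `following_window == 1` and `min_periods == 1`, the sketch yields [ 30, 40, 80, 120, 110, 50, 130, 150, 120 ].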

The returned column for op == COUNT always has INT32 type. All other operators return a column of the same type as the input. Therefore it is suggested to convert integer column types (especially low-precision integers) to FLOAT32 or FLOAT64 before doing a rolling MEAN.

Parameters
[in]  group_keys  The (pre-sorted) grouping columns
[in]  input  The input column (to be aggregated)
[in]  preceding_window  The static rolling window size in the backward direction.
[in]  following_window  The static rolling window size in the forward direction.
[in]  min_periods  Minimum number of observations in window required to have a value, otherwise element i is null.
[in]  aggr  The rolling window aggregation type (SUM, MAX, MIN, etc.)
Returns
A nullable output column containing the rolling window results

◆ grouped_rolling_window() [4/4]

std::unique_ptr<column> cudf::grouped_rolling_window ( table_view const &  group_keys,
column_view const &  input,
window_bounds  preceding_window,
window_bounds  following_window,
size_type  min_periods,
std::unique_ptr< aggregation > const &  aggr,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

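Overload of grouped_rolling_window() that takes preceding_window and following_window as window_bounds instead of size_type. All other parameters and behavior are as documented for:

grouped_rolling_window( table_view const& group_keys, column_view const& input, size_type preceding_window, size_type following_window, size_type min_periods, std::unique_ptr<aggregation> const& aggr, rmm::mr::device_memory_resource* mr)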

◆ grouped_time_range_rolling_window() [1/2]

std::unique_ptr<column> cudf::grouped_time_range_rolling_window ( table_view const &  group_keys,
column_view const &  timestamp_column,
cudf::order const &  timestamp_order,
column_view const &  input,
size_type  preceding_window_in_days,
size_type  following_window_in_days,
size_type  min_periods,
std::unique_ptr< aggregation > const &  aggr,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Applies a grouping-aware, timestamp-based rolling window function to the values in a column.

Like rolling_window(), this function aggregates values in a window around each element of a specified input column. It differs from rolling_window() in two respects:

  1. The elements of the input column are grouped into distinct groups (e.g. the result of a groupby), determined by the corresponding values of the columns under group_keys. The window-aggregation cannot cross the group boundaries.
  2. Within a group, the aggregation window is calculated based on a time interval (e.g. number of days preceding/following the current row). The timestamps for the input data are specified by the timestamp_column argument.

Note: This method requires that the rows are presorted by the group keys and timestamp values.

Example: Consider a user-sales dataset, where the rows look as follows:
{ "user_id", sales_amt, date }
This method enables windowing queries such as grouping a dataset by `user_id`, sorting by
increasing `date`, and summing up the `sales_amt` column over a window of 3 days (1 preceding
day, the current day, and 1 following day).
In this example,
1. `group_keys == [ user_id ]`
2. `timestamp_column == date`
3. `input == sales_amt`
The data are grouped by `user_id`, and ordered by `date`. The aggregation
(SUM) is then calculated for a window of 3 days around (and including) each row.
For the following input:
[ // user, sales_amt, YYYYMMDD (date)
{ "user1", 10, 20200101 },
{ "user2", 20, 20200101 },
{ "user1", 20, 20200102 },
{ "user1", 10, 20200103 },
{ "user2", 30, 20200101 },
{ "user2", 80, 20200102 },
{ "user1", 50, 20200107 },
{ "user1", 60, 20200107 },
{ "user2", 40, 20200104 }
]
Partitioning (grouping) by `user_id`, and ordering by `date` yields the following `sales_amt`
vector (with 2 groups, one for each distinct `user_id`):
Date :(202001-) [ 01, 02, 03, 07, 07, 01, 01, 02, 04 ]
Input: [ 10, 20, 10, 50, 60, 20, 30, 80, 40 ]
<-------user1-------->|<---------user2--------->
The SUM aggregation is applied, with 1 day preceding, and 1 day following, with a minimum of 1
period. The aggregation window is thus 3 *days* wide, yielding the following output column:
Results: [ 30, 40, 30, 110, 110, 130, 130, 130, 40 ]

Note: The number of rows participating in each window might vary, based on the index within the group, datestamp, and min_periods. For example:

  1. results[0] considers 2 values, because it is at the beginning of its group, and has no preceding values.
  2. results[5] considers 3 values, despite being at the beginning of its group. It must include 2 following values, based on its datestamp.

Each aggregation operation cannot cross group boundaries.
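
The time-based example above can be checked with a CPU-side sketch. It is not the libcudf implementation: the name `grouped_time_range_sum_reference`, the representation of timestamps as integer day numbers, and `std::optional` as a stand-in for a nullable column are assumptions. Rows are assumed to be presorted by group key and then by ascending day, with no null inputs.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical CPU reference for a grouped, day-range rolling SUM.
// A row j contributes to row i when both are in the same group and
// day[i] - preceding_days <= day[j] <= day[i] + following_days.
std::vector<std::optional<int>> grouped_time_range_sum_reference(
  std::vector<std::string> const& group_keys,
  std::vector<int> const& day,
  std::vector<int> const& input,
  int preceding_days,
  int following_days,
  int min_periods)
{
  assert(group_keys.size() == input.size() && day.size() == input.size());
  int const n = static_cast<int>(input.size());
  std::vector<std::optional<int>> out(n);

  int group_begin = 0;
  while (group_begin < n) {
    int group_end = group_begin;
    while (group_end < n && group_keys[group_end] == group_keys[group_begin]) { ++group_end; }

    for (int i = group_begin; i < group_end; ++i) {
      int sum   = 0;
      int count = 0;
      // Quadratic scan of the group, kept simple for readability.
      for (int j = group_begin; j < group_end; ++j) {
        if (day[j] >= day[i] - preceding_days && day[j] <= day[i] + following_days) {
          sum += input[j];
          ++count;
        }
      }
      if (count >= min_periods) { out[i] = sum; }  // otherwise the row stays null
    }
    group_begin = group_end;
  }
  return out;
}
```

With days [ 1, 2, 3, 7, 7, 1, 1, 2, 4 ], the grouped `sales_amt` vector, 1 day preceding, 1 day following and `min_periods == 1`, the sketch yields [ 30, 40, 30, 110, 110, 130, 130, 130, 40 ].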

The returned column for op == COUNT always has INT32 type. All other operators return a column of the same type as the input. Therefore it is suggested to convert integer column types (especially low-precision integers) to FLOAT32 or FLOAT64 before doing a rolling MEAN.

Parameters
[in]  group_keys  The (pre-sorted) grouping columns
[in]  timestamp_column  The (pre-sorted) timestamps for each row
[in]  timestamp_order  The order (ASCENDING/DESCENDING) in which the timestamps are sorted
[in]  input  The input column (to be aggregated)
[in]  preceding_window_in_days  The rolling window time-interval in the backward direction.
[in]  following_window_in_days  The rolling window time-interval in the forward direction.
[in]  min_periods  Minimum number of observations in window required to have a value, otherwise element i is null.
[in]  aggr  The rolling window aggregation type (SUM, MAX, MIN, etc.)
Returns
A nullable output column containing the rolling window results

◆ grouped_time_range_rolling_window() [2/2]

std::unique_ptr<column> cudf::grouped_time_range_rolling_window ( table_view const &  group_keys,
column_view const &  timestamp_column,
cudf::order const &  timestamp_order,
column_view const &  input,
window_bounds  preceding_window_in_days,
window_bounds  following_window_in_days,
size_type  min_periods,
std::unique_ptr< aggregation > const &  aggr,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Overload of grouped_time_range_rolling_window() that takes preceding_window_in_days and following_window_in_days as window_bounds instead of size_type. All other parameters and behavior are as documented for:

grouped_time_range_rolling_window( table_view const& group_keys, column_view const& timestamp_column, cudf::order const& timestamp_order, column_view const& input, size_type preceding_window_in_days, size_type following_window_in_days, size_type min_periods, std::unique_ptr<aggregation> const& aggr, rmm::mr::device_memory_resource* mr)

◆ rolling_window() [1/3]

std::unique_ptr<column> cudf::rolling_window ( column_view const &  input,
column_view const &  default_outputs,
size_type  preceding_window,
size_type  following_window,
size_type  min_periods,
std::unique_ptr< aggregation > const &  agg,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Overload of rolling_window() that returns per-row default values instead of nulls. Apart from default_outputs, the parameters and behavior are as documented for:

rolling_window( column_view const& input, size_type preceding_window, size_type following_window, size_type min_periods, std::unique_ptr<aggregation> const& agg, rmm::mr::device_memory_resource* mr)

Parameters
default_outputs  A column of per-row default values to be returned instead of nulls. Used for LEAD()/LAG(), if the row offset crosses the boundaries of the column.
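
A small CPU-side sketch may help show how default_outputs is used for LEAD(). It is not the libcudf implementation: the name `lead_with_defaults`, the `std::vector` containers, and the `row_offset` parameter are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: LEAD(row_offset) over a single column.
// When the leading row would fall past the end of the column,
// the per-row default value is returned instead of a null.
std::vector<int> lead_with_defaults(std::vector<int> const& input,
                                    std::vector<int> const& default_outputs,
                                    std::size_t row_offset)
{
  assert(default_outputs.size() == input.size());
  std::vector<int> out(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    std::size_t const target = i + row_offset;
    out[i] = target < input.size() ? input[target] : default_outputs[i];
  }
  return out;
}
```

For input { 10, 20, 30, 40 }, defaults { -1, -2, -3, -4 } and an offset of 1, the sketch returns { 20, 30, 40, -4 }.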

◆ rolling_window() [2/3]

std::unique_ptr<column> cudf::rolling_window ( column_view const &  input,
column_view const &  preceding_window,
column_view const &  following_window,
size_type  min_periods,
std::unique_ptr< aggregation > const &  agg,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Applies a variable-size rolling window function to the values in a column.

This function aggregates values in a window around each element i of the input column, and invalidates the bit mask for element i if there are not enough observations. The window size is dynamic (varying for each element). This matches Pandas' API for DataFrame.rolling with a few notable differences:

  • instead of the center flag it uses a two-part window to allow for more flexible windows. The total window size = preceding_window + following_window. Element i uses elements [i-preceding_window+1, i+following_window] to do the window computation.
  • instead of storing NA/NaN for output rows that do not meet the minimum number of observations this function updates the valid bitmask of the column to indicate which elements are valid.
  • it supports dynamic rolling windows, i.e. the window size can be specified for each element using additional columns (preceding_window and following_window).
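
The per-element window described above can be sketched on the CPU as follows. This is not the libcudf implementation: the name `variable_rolling_sum_reference`, the `std::vector` containers, and `std::optional` as a stand-in for the valid bitmask are assumptions, and the input is assumed to contain no nulls.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical CPU reference for a variable-size rolling SUM.
// Element i uses [i - preceding_window[i] + 1, i + following_window[i]],
// clamped to the column; std::nullopt models a cleared validity bit.
std::vector<std::optional<int>> variable_rolling_sum_reference(
  std::vector<int> const& input,
  std::vector<int> const& preceding_window,
  std::vector<int> const& following_window,
  int min_periods)
{
  assert(preceding_window.size() == input.size());
  assert(following_window.size() == input.size());

  int const n = static_cast<int>(input.size());
  std::vector<std::optional<int>> out(n);
  for (int i = 0; i < n; ++i) {
    int const begin = std::max(0, i - preceding_window[i] + 1);
    int const end   = std::min(n - 1, i + following_window[i]);
    if (end - begin + 1 < min_periods) { continue; }  // not enough observations
    int sum = 0;
    for (int j = begin; j <= end; ++j) { sum += input[j]; }
    out[i] = sum;
  }
  return out;
}
```

For input { 1, 2, 3, 4 }, preceding sizes { 1, 2, 3, 1 }, following sizes { 0, 0, 0, 2 } and `min_periods == 2`, elements 0 and 3 see only one observation and are null, while elements 1 and 2 evaluate to 3 and 6.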

The returned column for count aggregation always has INT32 type. All other operators return a column of the same type as the input. Therefore it is suggested to convert integer column types (especially low-precision integers) to FLOAT32 or FLOAT64 before doing a rolling MEAN.

Exceptions
cudf::logic_error  if window column type is not INT32
Parameters
[in]  input  The input column
[in]  preceding_window  A non-nullable column of INT32 window sizes in the backward direction. preceding_window[i] specifies preceding window size for element i.
[in]  following_window  A non-nullable column of INT32 window sizes in the forward direction. following_window[i] specifies following window size for element i.
[in]  min_periods  Minimum number of observations in window required to have a value, otherwise element i is null.
[in]  agg  The rolling window aggregation type (sum, max, min, etc.)
Returns
A nullable output column containing the rolling window results

◆ rolling_window() [3/3]

std::unique_ptr<column> cudf::rolling_window ( column_view const &  input,
size_type  preceding_window,
size_type  following_window,
size_type  min_periods,
std::unique_ptr< aggregation > const &  agg,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Applies a fixed-size rolling window function to the values in a column.

This function aggregates values in a window around each element i of the input column, and invalidates the bit mask for element i if there are not enough observations. The window size is static (the same for each element). This matches Pandas' API for DataFrame.rolling with a few notable differences:

  • instead of the center flag it uses a two-part window to allow for more flexible windows. The total window size = preceding_window + following_window. Element i uses elements [i-preceding_window+1, i+following_window] to do the window computation.
  • instead of storing NA/NaN for output rows that do not meet the minimum number of observations this function updates the valid bitmask of the column to indicate which elements are valid.
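
The two-part window and the min_periods rule can be sketched on the CPU as follows. This is not the libcudf implementation: the name `rolling_sum_reference`, the `std::vector` container, and `std::optional` as a stand-in for the valid bitmask are assumptions. The sketch also assumes a non-null input, `preceding_window >= 1` and `following_window >= 0`.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical CPU reference for a fixed-size rolling SUM.
// Element i uses [i - preceding_window + 1, i + following_window],
// clamped to the column; std::nullopt models a cleared validity bit.
std::vector<std::optional<int>> rolling_sum_reference(std::vector<int> const& input,
                                                      int preceding_window,
                                                      int following_window,
                                                      int min_periods)
{
  assert(preceding_window >= 1 && following_window >= 0);  // assumption of this sketch

  int const n = static_cast<int>(input.size());
  std::vector<std::optional<int>> out(n);
  for (int i = 0; i < n; ++i) {
    int const begin = std::max(0, i - preceding_window + 1);
    int const end   = std::min(n - 1, i + following_window);
    if (end - begin + 1 < min_periods) { continue; }  // not enough observations
    int sum = 0;
    for (int j = begin; j <= end; ++j) { sum += input[j]; }
    out[i] = sum;
  }
  return out;
}
```

For input { 1, 2, 3, 4, 5 } with `preceding_window == 2`, `following_window == 1` and `min_periods == 3`, the first and last elements see only two observations and are null, while the middle elements evaluate to 6, 9 and 12.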

The returned column for count aggregation always has INT32 type. All other operators return a column of the same type as the input. Therefore it is suggested to convert integer column types (especially low-precision integers) to FLOAT32 or FLOAT64 before doing a rolling MEAN.
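
The suggestion to convert integer columns before a rolling MEAN can be illustrated with plain C++ arithmetic. This sketch only assumes, as stated above, that the result has the same type as the input; `window_mean` is a hypothetical helper and says nothing about how libcudf computes the mean internally.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: the mean of one window, computed and returned in type T.
// With an integer T the division truncates, so the fractional part is lost.
template <typename T>
T window_mean(std::vector<T> const& window)
{
  assert(!window.empty());
  T const sum = std::accumulate(window.begin(), window.end(), T{0});
  return sum / static_cast<T>(window.size());
}
```

For the window { 1, 2 }, `window_mean<int>` gives 1, whereas `window_mean<double>` on { 1.0, 2.0 } gives 1.5.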

Parameters
[in]  input  The input column
[in]  preceding_window  The static rolling window size in the backward direction.
[in]  following_window  The static rolling window size in the forward direction.
[in]  min_periods  Minimum number of observations in window required to have a value, otherwise element i is null.
[in]  agg  The rolling window aggregation type (SUM, MAX, MIN, etc.)
Returns
A nullable output column containing the rolling window results