Package ai.rapids.cudf
Class Table.GroupByOperation
java.lang.Object
ai.rapids.cudf.Table.GroupByOperation
- Enclosing class:
- Table
Class representing groupby operations
-
Method Summary
Method and Description
aggregate(GroupByAggregationOnColumn... aggregates)
    Aggregates the group of columns represented by indices.
aggregateWindows(AggregationOverWindow... windowAggregates)
    Computes row-based window aggregation functions on the Table/projection, based on windows specified in the argument.
aggregateWindowsOverRanges(AggregationOverWindow... windowAggregates)
    Computes range-based window aggregation functions on the Table/projection, based on windows specified in the argument.
contiguousSplitGroups()
    Splits the groups in a single table into separate tables according to the grouping keys.
contiguousSplitGroupsAndGenUniqKeys()
    Similar to contiguousSplitGroups(), but also returns an extra unique-keys table in which each row corresponds to one group split.
replaceNulls(ReplacePolicyWithColumn... replacements)
scan(GroupByScanAggregationOnColumn... aggregates)
-
Method Details
-
aggregate
Aggregates the group of columns represented by indices.

Usage:
    aggregate(count(), max(2), ...);

Example:
    input:  1, 1, 1
            1, 2, 1
            2, 4, 5

    table.groupBy(0, 2).count()

            col0, col1
    output:    1,    1
               1,    2
               2,    1   ==> aggregated count
-
aggregateWindows
Computes row-based window aggregation functions on the Table/projection, based on windows specified in the argument. This method enables queries such as the following SQL:

    SELECT user_id,
           MAX(sales_amt) OVER(PARTITION BY user_id ORDER BY date
                               ROWS BETWEEN 1 PRECEDING and 1 FOLLOWING)
    FROM my_sales_table WHERE ...

Each window-aggregation is represented by a different AggregationOverWindow argument, indicating:
    1. the Aggregation.Kind,
    2. the number of rows preceding and following the current row, within a window,
    3. the minimum number of observations within the defined window

This method returns a Table instance, with one result column for each specified window aggregation.

In this example, for the following input:

    [ // user_id, sales_amt
      { "user1", 10 },
      { "user2", 20 },
      { "user1", 20 },
      { "user1", 10 },
      { "user2", 30 },
      { "user2", 80 },
      { "user1", 50 },
      { "user1", 60 },
      { "user2", 40 }
    ]

Partitioning (grouping) by `user_id` yields the following `sales_amt` vector (with 2 groups, one for each distinct `user_id`):

    [ 10, 20, 10, 50, 60, 20, 30, 80, 40 ]
      <-------user1-------->|<------user2------->

The SUM aggregation is applied with 1 preceding and 1 following row, with a minimum of 1 period. The aggregation window is thus 3 rows wide, yielding the following column:

    [ 30, 40, 80, 120, 110, 50, 130, 150, 120 ]

- Parameters:
windowAggregates
- the window-aggregations to be performed
- Returns:
- Table instance, with each column containing the result of each aggregation.
- Throws:
IllegalArgumentException
- if the window arguments are not of type WindowOptions.FrameType.ROWS, i.e. a timestamp column is specified for a window-aggregation.
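The row-window arithmetic of the example above can be reproduced with a small plain-Java sketch. It does not call cuDF and is not the library's implementation. The class name RowWindowSumSketch, its sum method, and the convention that a window holding fewer than minPeriods rows yields null are all assumptions made for illustration. Rows are assumed to be already arranged so that each group is contiguous.

```java
import java.util.Arrays;

/** Plain-Java illustration of a per-group, row-based window SUM. Hypothetical helper, not cuDF code. */
final class RowWindowSumSketch {

    /**
     * @param keys       group key per row; rows of one group must be contiguous
     * @param values     column to aggregate
     * @param preceding  number of rows before the current row to include
     * @param following  number of rows after the current row to include
     * @param minPeriods minimum rows in the window; smaller windows yield null (assumed convention)
     */
    static Long[] sum(String[] keys, long[] values, int preceding, int following, int minPeriods) {
        int n = values.length;
        Long[] out = new Long[n];
        int groupStart = 0;
        while (groupStart < n) {
            // Find the end (exclusive) of the current contiguous group.
            int groupEnd = groupStart;
            while (groupEnd < n && keys[groupEnd].equals(keys[groupStart])) {
                groupEnd++;
            }
            for (int i = groupStart; i < groupEnd; i++) {
                // Clamp the window to the group boundaries.
                int lo = Math.max(groupStart, i - preceding);
                int hi = Math.min(groupEnd - 1, i + following);
                if (hi - lo + 1 < minPeriods) {
                    out[i] = null;
                    continue;
                }
                long total = 0;
                for (int j = lo; j <= hi; j++) {
                    total += values[j];
                }
                out[i] = total;
            }
            groupStart = groupEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        String[] keys = {"user1", "user1", "user1", "user1", "user1",
                         "user2", "user2", "user2", "user2"};
        long[] salesAmt = {10, 20, 10, 50, 60, 20, 30, 80, 40};
        // Prints [30, 40, 80, 120, 110, 50, 130, 150, 120]
        System.out.println(Arrays.toString(sum(keys, salesAmt, 1, 1, 1)));
    }
}
```

The window is clamped at group boundaries, which is why the first and last row of each group sum only two values.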
-
aggregateWindowsOverRanges
Computes range-based window aggregation functions on the Table/projection, based on windows specified in the argument. This method enables queries such as the following SQL:

    SELECT user_id,
           MAX(sales_amt) OVER(PARTITION BY user_id ORDER BY date
                               RANGE BETWEEN INTERVAL 1 DAY PRECEDING and CURRENT ROW)
    FROM my_sales_table WHERE ...

Each window-aggregation is represented by a different AggregationOverWindow argument, indicating:
    1. the Aggregation.Kind,
    2. the index for the timestamp column to base the window definitions on,
    3. the number of DAYS preceding and following the current row's date, to consider in the window,
    4. the minimum number of observations within the defined window

This method returns a Table instance, with one result column for each specified window aggregation.

In this example, for the following input:

    [ // user, sales_amt, YYYYMMDD (date)
      { "user1", 10, 20200101 },
      { "user2", 20, 20200101 },
      { "user1", 20, 20200102 },
      { "user1", 10, 20200103 },
      { "user2", 30, 20200101 },
      { "user2", 80, 20200102 },
      { "user1", 50, 20200107 },
      { "user1", 60, 20200107 },
      { "user2", 40, 20200104 }
    ]

Partitioning (grouping) by `user_id`, and ordering by `date` yields the following `sales_amt` vector (with 2 groups, one for each distinct `user_id`):

    Date :(202001-)  [ 01, 02, 03, 07, 07,   01, 01, 02, 04 ]
    Input:           [ 10, 20, 10, 50, 60,   20, 30, 80, 40 ]
                       <-------user1------->|<-----user2----->

The SUM aggregation is applied, with 1 day preceding, and 1 day following, with a minimum of 1 period. The aggregation window is thus 3 *days* wide, yielding the following output column:

    Results:         [ 30, 40, 30, 110, 110, 130, 130, 130, 40 ]

- Parameters:
windowAggregates
- the window-aggregations to be performed
- Returns:
- Table instance, with each column containing the result of each aggregation.
- Throws:
IllegalArgumentException
- if the window arguments are not of type WindowOptions.FrameType.RANGE, or the orderBys are not of (Boolean-exclusive) integral type, i.e. the timestamp-column was not specified for the aggregation.
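The difference from a row-based window is that the window bounds are computed from the ordering column's values, so rows sharing a date always fall into each other's windows. A plain-Java sketch of that arithmetic follows. It does not call cuDF and is not the library's implementation. The class RangeWindowSumSketch, its sum and day helpers, the use of epoch days for the ordering column, and the null result for windows below minPeriods are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.util.Arrays;

/** Plain-Java illustration of a per-group, range-based window SUM. Hypothetical helper, not cuDF code. */
final class RangeWindowSumSketch {

    /**
     * @param keys          group key per row; rows of one group must be contiguous
     * @param days          ordering column, expressed as days since the epoch
     * @param values        column to aggregate
     * @param precedingDays days before the current row's date to include
     * @param followingDays days after the current row's date to include
     * @param minPeriods    minimum rows in the window; smaller windows yield null (assumed convention)
     */
    static Long[] sum(String[] keys, long[] days, long[] values,
                      long precedingDays, long followingDays, int minPeriods) {
        int n = values.length;
        Long[] out = new Long[n];
        int groupStart = 0;
        while (groupStart < n) {
            // Find the end (exclusive) of the current contiguous group.
            int groupEnd = groupStart;
            while (groupEnd < n && keys[groupEnd].equals(keys[groupStart])) {
                groupEnd++;
            }
            for (int i = groupStart; i < groupEnd; i++) {
                long lo = days[i] - precedingDays;
                long hi = days[i] + followingDays;
                long total = 0;
                int count = 0;
                // Include every row of the group whose date lies in [lo, hi].
                for (int j = groupStart; j < groupEnd; j++) {
                    if (days[j] >= lo && days[j] <= hi) {
                        total += values[j];
                        count++;
                    }
                }
                if (count >= minPeriods) {
                    out[i] = total;
                } else {
                    out[i] = null;
                }
            }
            groupStart = groupEnd;
        }
        return out;
    }

    /** Converts a day of January 2020 to days since the epoch. */
    static long day(int dayOfMonth) {
        return LocalDate.of(2020, 1, dayOfMonth).toEpochDay();
    }

    public static void main(String[] args) {
        String[] keys = {"user1", "user1", "user1", "user1", "user1",
                         "user2", "user2", "user2", "user2"};
        long[] days = {day(1), day(2), day(3), day(7), day(7),
                       day(1), day(1), day(2), day(4)};
        long[] salesAmt = {10, 20, 10, 50, 60, 20, 30, 80, 40};
        // Prints [30, 40, 30, 110, 110, 130, 130, 130, 40]
        System.out.println(Arrays.toString(sum(keys, days, salesAmt, 1, 1, 1)));
    }
}
```

The inner loop scans the whole group for clarity; with sorted dates, a two-pointer scan would avoid the quadratic cost.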
-
scan
-
replaceNulls
-
contiguousSplitGroups
Splits the groups in a single table into separate tables according to the grouping keys. Each split table represents a single group. This API is intended for grouping-related operators that process the data group by group.

Example:
    Grouping column index: 0
    Input: a table of 3 rows (two groups)
        a 1
        b 2
        b 3
    Result: two tables, one table per group.
        Result[0]:
            a 1
        Result[1]:
            b 2
            b 3

Note: the order of the returned groups is NOT always the same as their order in the input table. The split is done in native code to avoid copying the offset array to the JVM.
- Returns:
- The tables split according to the groups in the table. NOTE: It is the responsibility of the caller to close the result. Each table and column holds a reference to the original buffer, so both the buffer and the table must be closed for the memory to be released.
-
contiguousSplitGroupsAndGenUniqKeys
Similar to contiguousSplitGroups(), but also returns an extra unique-keys table in which each row corresponds to one group split. Splits the groups in a single table into separate tables according to the grouping keys. Each split table represents a single group.

Example: see the example in contiguousSplitGroups(). The `uniqKeysTable` in ContigSplitGroupByResult is:
    a
    b
Note: it has only 2 rows because there are only 2 split groups.
- Returns:
- The split groups and uniq key table.
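The relationship between the split groups and the unique-keys table can be sketched in plain Java using the 3-row input from the contiguousSplitGroups() example. This does not call cuDF and is not the library's implementation. The class SplitGroupsSketch, its nested Result holder, and the split method are hypothetical names. The sketch returns groups in first-appearance order, which is an assumption of the sketch only; the real API makes no such ordering guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of splitting rows into groups plus a unique-keys list. Not cuDF code. */
final class SplitGroupsSketch {

    /** Hypothetical holder: groups.get(i) contains the rows whose key is uniqKeys.get(i). */
    static final class Result {
        final List<List<String[]>> groups = new ArrayList<>();
        final List<String> uniqKeys = new ArrayList<>();
    }

    static Result split(List<String[]> rows, int keyColumn) {
        // LinkedHashMap keeps first-appearance order (a choice of this sketch only).
        Map<String, List<String[]>> byKey = new LinkedHashMap<>();
        for (String[] row : rows) {
            byKey.computeIfAbsent(row[keyColumn], k -> new ArrayList<>()).add(row);
        }
        Result result = new Result();
        for (Map.Entry<String, List<String[]>> entry : byKey.entrySet()) {
            result.uniqKeys.add(entry.getKey());   // one key row per group
            result.groups.add(entry.getValue());   // the rows of that group
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"a", "1"},
                new String[] {"b", "2"},
                new String[] {"b", "3"});
        Result result = split(rows, 0);
        for (int i = 0; i < result.groups.size(); i++) {
            System.out.println("key " + result.uniqKeys.get(i)
                    + " has " + result.groups.get(i).size() + " row(s)");
        }
    }
}
```

Row i of the unique-keys list always describes group i, which is the property that lets a caller pair each split table with its key.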
-