Class Table.GroupByOperation

java.lang.Object
ai.rapids.cudf.Table.GroupByOperation
Enclosing class:
Table

public static final class Table.GroupByOperation extends Object
Class representing groupby operations
  • Method Details

    • aggregate

      public Table aggregate(GroupByAggregationOnColumn... aggregates)
      Aggregates the group of columns represented by indices.
      Usage: aggregate(count(), max(2), ...);
      Example:
        input:  1, 1, 1
                1, 2, 1
                2, 4, 5

        table.groupBy(0, 2).count()

                col0, col1
        output:   1,   1
                  1,   2
                  2,   1 ==> aggregated count
    • aggregateWindows

      public Table aggregateWindows(AggregationOverWindow... windowAggregates)
      Computes row-based window aggregation functions on the Table/projection, based on windows specified in the argument. This method enables queries such as the following SQL:

        SELECT user_id, MAX(sales_amt) OVER(PARTITION BY user_id ORDER BY date
                                            ROWS BETWEEN 1 PRECEDING and 1 FOLLOWING)
        FROM my_sales_table WHERE ...

      Each window-aggregation is represented by a different AggregationOverWindow argument, indicating:
        1. the Aggregation.Kind,
        2. the number of rows preceding and following the current row, within a window,
        3. the minimum number of observations within the defined window

      This method returns a Table instance, with one result column for each specified window aggregation. In this example, for the following input:

        [ // user_id, sales_amt
          { "user1", 10 }, { "user2", 20 }, { "user1", 20 },
          { "user1", 10 }, { "user2", 30 }, { "user2", 80 },
          { "user1", 50 }, { "user1", 60 }, { "user2", 40 }
        ]

      Partitioning (grouping) by `user_id` yields the following `sales_amt` vector (with 2 groups, one for each distinct `user_id`):

        [ 10, 20, 10, 50, 60, 20, 30, 80, 40 ]
          <------user1------>|<----user2---->

      The SUM aggregation is applied with 1 preceding and 1 following row, with a minimum of 1 period. The aggregation window is thus 3 rows wide, yielding the following column:

        [ 30, 40, 80, 120, 110, 50, 130, 150, 120 ]
      Parameters:
      windowAggregates - the window-aggregations to be performed
      Returns:
      Table instance, with each column containing the result of each aggregation.
      Throws:
      IllegalArgumentException - if the window arguments are not of type WindowOptions.FrameType.ROWS, i.e. a timestamp column is specified for a window-aggregation.
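      The worked example above can be reproduced in plain Java. This is a minimal sketch of the row-based window semantics as described in this entry, not the cudf implementation: the class and method names (`RowWindowSumSketch`, `windowSum`) are hypothetical, and returning null when a window holds fewer than the minimum number of rows is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; it does not use or mirror the cudf API.
class RowWindowSumSketch {

    /**
     * Partitions values by key (keeping input order inside each partition),
     * then sums each row's window [i - preceding, i + following], clipped to
     * the partition. Assumption: a window with fewer than minPeriods rows
     * yields null.
     */
    static List<Long> windowSum(String[] keys, long[] values,
                                int preceding, int following, int minPeriods) {
        Map<String, List<Long>> partitions = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            partitions.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(values[i]);
        }

        List<Long> out = new ArrayList<>();
        for (List<Long> part : partitions.values()) {
            for (int i = 0; i < part.size(); i++) {
                int lo = Math.max(0, i - preceding);
                int hi = Math.min(part.size() - 1, i + following);
                if (hi - lo + 1 < minPeriods) {
                    out.add(null);
                    continue;
                }
                long sum = 0;
                for (int j = lo; j <= hi; j++) {
                    sum += part.get(j);
                }
                out.add(sum);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] users = {"user1", "user2", "user1", "user1", "user2",
                          "user2", "user1", "user1", "user2"};
        long[] sales = {10, 20, 20, 10, 30, 80, 50, 60, 40};
        // prints [30, 40, 80, 120, 110, 50, 130, 150, 120]
        System.out.println(windowSum(users, sales, 1, 1, 1));
    }
}
```

      The sketch lists the partitions in order of first appearance, which happens to match the layout used in the example.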
    • aggregateWindowsOverRanges

      public Table aggregateWindowsOverRanges(AggregationOverWindow... windowAggregates)
      Computes range-based window aggregation functions on the Table/projection, based on windows specified in the argument. This method enables queries such as the following SQL:

        SELECT user_id, MAX(sales_amt) OVER(PARTITION BY user_id ORDER BY date
                                            RANGE BETWEEN INTERVAL 1 DAY PRECEDING and CURRENT ROW)
        FROM my_sales_table WHERE ...

      Each window-aggregation is represented by a different AggregationOverWindow argument, indicating:
        1. the Aggregation.Kind,
        2. the index for the timestamp column to base the window definitions on,
        3. the number of DAYS preceding and following the current row's date, to consider in the window,
        4. the minimum number of observations within the defined window

      This method returns a Table instance, with one result column for each specified window aggregation. In this example, for the following input:

        [ // user, sales_amt, YYYYMMDD (date)
          { "user1", 10, 20200101 }, { "user2", 20, 20200101 }, { "user1", 20, 20200102 },
          { "user1", 10, 20200103 }, { "user2", 30, 20200101 }, { "user2", 80, 20200102 },
          { "user1", 50, 20200107 }, { "user1", 60, 20200107 }, { "user2", 40, 20200104 }
        ]

      Partitioning (grouping) by `user_id`, and ordering by `date` yields the following `sales_amt` vector (with 2 groups, one for each distinct `user_id`):

        Date :(202001-)  [ 01, 02, 03, 07, 07, 01, 01, 02, 04 ]
        Input:           [ 10, 20, 10, 50, 60, 20, 30, 80, 40 ]
                           <------user1------>|<----user2---->

      The SUM aggregation is applied, with 1 day preceding, and 1 day following, with a minimum of 1 period. The aggregation window is thus 3 *days* wide, yielding the following output column:

        Results:         [ 30, 40, 30, 110, 110, 130, 130, 130, 40 ]
      Parameters:
      windowAggregates - the window-aggregations to be performed
      Returns:
      Table instance, with each column containing the result of each aggregation.
      Throws:
      IllegalArgumentException - if the window arguments are not of type WindowOptions.FrameType.RANGE, or the orderBy columns are not of an integral type (Boolean excluded), i.e. the timestamp column was not specified for the aggregation.
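      The range-based example above differs from the row-based one in that the window is defined by date distance, so rows sharing a date always land in the same window. The following plain-Java sketch reproduces the example under that reading. It is not the cudf implementation: `RangeWindowSumSketch` and `rangeSum` are hypothetical names, dates are passed as day counts, and the null result for windows below the minimum number of rows is an assumption.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; it does not use or mirror the cudf API.
class RangeWindowSumSketch {

    /**
     * Partitions rows by key, orders each partition by day, then sums every
     * value whose day lies in [day - precedingDays, day + followingDays].
     * Assumption: a window with fewer than minPeriods rows yields null.
     */
    static List<Long> rangeSum(String[] keys, long[] days, long[] values,
                               long precedingDays, long followingDays, int minPeriods) {
        Map<String, List<Integer>> partitions = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            partitions.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(i);
        }

        List<Long> out = new ArrayList<>();
        for (List<Integer> rows : partitions.values()) {
            // List.sort is stable, so rows with equal days keep input order.
            rows.sort(Comparator.comparingLong((Integer r) -> days[r]));
            for (int current : rows) {
                long sum = 0;
                int count = 0;
                for (int other : rows) {
                    long d = days[other];
                    if (d >= days[current] - precedingDays && d <= days[current] + followingDays) {
                        sum += values[other];
                        count++;
                    }
                }
                if (count >= minPeriods) {
                    out.add(sum);
                } else {
                    out.add(null);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] users = {"user1", "user2", "user1", "user1", "user2",
                          "user2", "user1", "user1", "user2"};
        long[] sales = {10, 20, 20, 10, 30, 80, 50, 60, 40};
        int[] dayOfJan2020 = {1, 1, 2, 3, 1, 2, 7, 7, 4};
        long[] days = new long[dayOfJan2020.length];
        for (int i = 0; i < days.length; i++) {
            days[i] = LocalDate.of(2020, 1, dayOfJan2020[i]).toEpochDay();
        }
        // prints [30, 40, 30, 110, 110, 130, 130, 130, 40]
        System.out.println(rangeSum(users, days, sales, 1, 1, 1));
    }
}
```

      The inner loop is quadratic per partition, which keeps the sketch short; it is meant to show the semantics, not an efficient algorithm.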
    • scan

      public Table scan(GroupByScanAggregationOnColumn... aggregates)
    • replaceNulls

      public Table replaceNulls(ReplacePolicyWithColumn... replacements)
    • contiguousSplitGroups

      public ContiguousTable[] contiguousSplitGroups()
      Splits the groups in a single table into separate tables according to the grouping keys. Each split table represents a single group. This API will be used by some grouping-related operators to process the data group by group.

      Example:
        Grouping column index: 0
        Input: A table of 3 rows (two groups)
          a 1
          b 2
          b 3
        Result: Two tables, one table per group.
          Result[0]:
            a 1
          Result[1]:
            b 2
            b 3

      Note: the order of the groups returned is NOT always the same as that in the input table. The split is done in native code to avoid copying the offset array to the JVM.
      Returns:
      The tables split according to the groups in the table. NOTE: It is the responsibility of the caller to close the result. Each table and column holds a reference to the original buffer, but both the buffer and the table must be closed for the memory to be released.
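      The grouping behavior in the example above can be sketched in plain Java. This is a hedged illustration of the semantics only: `SplitGroupsSketch` and `splitGroups` are hypothetical names, rows are modeled as string arrays, and nothing here touches native memory or needs closing. The sketch returns groups in order of first appearance, which the real method does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; it does not use or mirror the cudf API.
class SplitGroupsSketch {

    /** Returns one list of rows per distinct value in the key column. */
    static List<List<String[]>> splitGroups(List<String[]> rows, int keyIndex) {
        Map<String, List<String[]>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[keyIndex], k -> new ArrayList<>()).add(row);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        List<String[]> table = Arrays.asList(
            new String[]{"a", "1"},
            new String[]{"b", "2"},
            new String[]{"b", "3"});

        List<List<String[]>> result = splitGroups(table, 0);
        for (int i = 0; i < result.size(); i++) {
            System.out.println("Result[" + i + "]:");
            for (String[] row : result.get(i)) {
                System.out.println("  " + String.join(" ", row));
            }
        }
    }
}
```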
    • contiguousSplitGroupsAndGenUniqKeys

      public ContigSplitGroupByResult contiguousSplitGroupsAndGenUniqKeys()
      Similar to contiguousSplitGroups(), but also returns a unique-key table in which each row corresponds to one group split. Splits the groups in a single table into separate tables according to the grouping keys. Each split table represents a single group.

      Example: for the input shown in the contiguousSplitGroups() example, the `uniqKeysTable` in ContigSplitGroupByResult is:
        a
        b
      Note: it has only 2 rows because there are only 2 split groups.
      Returns:
      The split groups and uniq key table.
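      As a rough plain-Java illustration of the unique-key table, the sketch below collects the distinct values of the grouping column, one per group. `UniqKeysSketch` and `uniqKeys` are hypothetical names, and the first-appearance ordering is an artifact of the sketch rather than a documented property of the API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; it does not use or mirror the cudf API.
class UniqKeysSketch {

    /** Returns each distinct key-column value once, so one entry per group. */
    static List<String> uniqKeys(List<String[]> rows, int keyIndex) {
        Set<String> keys = new LinkedHashSet<>();
        for (String[] row : rows) {
            keys.add(row[keyIndex]);
        }
        return new ArrayList<>(keys);
    }

    public static void main(String[] args) {
        List<String[]> table = Arrays.asList(
            new String[]{"a", "1"},
            new String[]{"b", "2"},
            new String[]{"b", "3"});
        // prints [a, b]: two keys for the two split groups
        System.out.println(uniqKeys(table, 0));
    }
}
```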