Table.GroupByOperation (cudfjni 24.12.0 API)

java.lang.Object
- ai.rapids.cudf.Table.GroupByOperation

Enclosing class: Table

public static final class Table.GroupByOperation
extends Object

Class representing groupby operations

Method Summary

Modifier and Type	Method and Description
`Table`	`aggregate(GroupByAggregationOnColumn... aggregates)` Aggregates the group of columns represented by indices. Usage: `aggregate(count(), max(2), ...)`.
`Table`	`aggregateWindows(AggregationOverWindow... windowAggregates)` Computes row-based window aggregation functions on the Table/projection, based on windows specified in the argument.
`Table`	`aggregateWindowsOverRanges(AggregationOverWindow... windowAggregates)` Computes range-based window aggregation functions on the Table/projection, based on windows specified in the argument.
`ContiguousTable[]`	`contiguousSplitGroups()` Splits the groups in a single table into separate tables according to the grouping keys.
`ContigSplitGroupByResult`	`contiguousSplitGroupsAndGenUniqKeys()` Similar to `contiguousSplitGroups()`, but also returns a table of unique keys in which each row corresponds to one group split.
`Table`	`replaceNulls(ReplacePolicyWithColumn... replacements)`
`Table`	`scan(GroupByScanAggregationOnColumn... aggregates)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Method Detail
  - aggregate
```
public Table aggregate(GroupByAggregationOnColumn... aggregates)
```
    Aggregates the group of columns represented by indices.
    
    Usage: `aggregate(count(), max(2), ...)`
    
    Example, for `table.groupBy(0, 2).count()`:
    
    Input rows: `1, 1, 1`; `1, 2, 1`; `2, 4, 5`
    
    Output rows (`col0, col1`): `1, 1`; `1, 2`; `2, 1` ==> aggregated count
  - aggregateWindows
```
public Table aggregateWindows(AggregationOverWindow... windowAggregates)
```
    Computes row-based window aggregation functions on the Table/projection, based on windows specified in the argument.
    
    This method enables queries such as the following SQL: `SELECT user_id, MAX(sales_amt) OVER(PARTITION BY user_id ORDER BY date ROWS BETWEEN 1 PRECEDING and 1 FOLLOWING) FROM my_sales_table WHERE ...`
    
    Each window-aggregation is represented by a different AggregationOverWindow argument, indicating:
    
    1. the Aggregation.Kind,
    2. the number of rows preceding and following the current row, within a window,
    3. the minimum number of observations within the defined window.
    
    This method returns a Table instance, with one result column for each specified window aggregation.
    
    In this example, for the following input (`user_id`, `sales_amt`): `[ { "user1", 10 }, { "user2", 20 }, { "user1", 20 }, { "user1", 10 }, { "user2", 30 }, { "user2", 80 }, { "user1", 50 }, { "user1", 60 }, { "user2", 40 } ]`
    
    Partitioning (grouping) by `user_id` yields the following `sales_amt` vector (with 2 groups, one for each distinct `user_id`; the first five values belong to `user1` and the last four to `user2`): `[ 10, 20, 10, 50, 60, 20, 30, 80, 40 ]`
    
    The SUM aggregation is applied with 1 preceding and 1 following row, with a minimum of 1 period. The aggregation window is thus 3 rows wide, yielding the following column: `[ 30, 40, 80, 120, 110, 50, 130, 150, 120 ]`
    
    Parameters:
    
    windowAggregates - the window-aggregations to be performed
    
    Returns:
    
    Table instance, with each column containing the result of each aggregation.
    
    Throws:
    
    IllegalArgumentException - if the window arguments are not of type WindowOptions.FrameType.ROWS, i.e. a timestamp column is specified for a window-aggregation.
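    
    The arithmetic of the row-based example can be checked with a small plain-Java model. This is only a sketch of the windowing rule described above, not the cudf implementation: the class `RowWindowSketch`, the method `windowSum`, and the `groupOffsets` array are hypothetical names introduced for illustration, and no GPU or cudf classes are involved. The minimum of 1 period is always met here because every window contains the current row, so it is not modelled.

```java
import java.util.Arrays;

// Plain-Java model of a row-based window SUM within groups (illustrative only).
final class RowWindowSketch {

    // For each row i, sums values[i - preceding .. i + following], clipped to
    // the bounds of the group that contains row i. groupOffsets holds the start
    // index of each group, followed by the total row count.
    static int[] windowSum(int[] values, int[] groupOffsets, int preceding, int following) {
        int[] out = new int[values.length];
        for (int g = 0; g + 1 < groupOffsets.length; g++) {
            int start = groupOffsets[g];
            int end = groupOffsets[g + 1]; // exclusive
            for (int i = start; i < end; i++) {
                int lo = Math.max(start, i - preceding);
                int hi = Math.min(end - 1, i + following);
                int sum = 0;
                for (int j = lo; j <= hi; j++) {
                    sum += values[j];
                }
                out[i] = sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // sales_amt after grouping by user_id: user1 is rows 0-4, user2 is rows 5-8.
        int[] salesAmt = {10, 20, 10, 50, 60, 20, 30, 80, 40};
        int[] groupOffsets = {0, 5, 9};
        int[] result = windowSum(salesAmt, groupOffsets, 1, 1);
        // prints [30, 40, 80, 120, 110, 50, 130, 150, 120]
        System.out.println(Arrays.toString(result));
    }
}
```

    Note that windows never cross a group boundary: the last `user1` row sums only 50 and 60, and the first `user2` row sums only 20 and 30.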
  - aggregateWindowsOverRanges
```
public Table aggregateWindowsOverRanges(AggregationOverWindow... windowAggregates)
```
    Computes range-based window aggregation functions on the Table/projection, based on windows specified in the argument.
    
    This method enables queries such as the following SQL: `SELECT user_id, MAX(sales_amt) OVER(PARTITION BY user_id ORDER BY date RANGE BETWEEN INTERVAL 1 DAY PRECEDING and CURRENT ROW) FROM my_sales_table WHERE ...`
    
    Each window-aggregation is represented by a different AggregationOverWindow argument, indicating:
    
    1. the Aggregation.Kind,
    2. the index of the timestamp column to base the window definitions on,
    3. the number of DAYS preceding and following the current row's date, to consider in the window,
    4. the minimum number of observations within the defined window.
    
    This method returns a Table instance, with one result column for each specified window aggregation.
    
    In this example, for the following input (`user`, `sales_amt`, date as YYYYMMDD): `[ { "user1", 10, 20200101 }, { "user2", 20, 20200101 }, { "user1", 20, 20200102 }, { "user1", 10, 20200103 }, { "user2", 30, 20200101 }, { "user2", 80, 20200102 }, { "user1", 50, 20200107 }, { "user1", 60, 20200107 }, { "user2", 40, 20200104 } ]`
    
    Partitioning (grouping) by `user_id`, and ordering by `date`, yields the following `sales_amt` vector (with 2 groups, one for each distinct `user_id`; the first five values belong to `user1` and the last four to `user2`):
    
    - Date (202001-): `[ 01, 02, 03, 07, 07, 01, 01, 02, 04 ]`
    - Input: `[ 10, 20, 10, 50, 60, 20, 30, 80, 40 ]`
    
    The SUM aggregation is applied, with 1 day preceding and 1 day following, with a minimum of 1 period. The aggregation window is thus 3 *days* wide, yielding the following output column:
    
    - Results: `[ 30, 40, 30, 110, 110, 130, 130, 130, 40 ]`
    
    Parameters:
    
    windowAggregates - the window-aggregations to be performed
    
    Returns:
    
    Table instance, with each column containing the result of each aggregation.
    
    Throws:
    
    IllegalArgumentException - if the window arguments are not of type WindowOptions.FrameType.RANGE or the orderBys are not of (Boolean-exclusive) integral type i.e. the timestamp-column was not specified for the aggregation.
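    
    The range-based example differs from the row-based one in that the window is defined by the value of the ordering column rather than by row position. A minimal plain-Java sketch of that rule follows. It is not the cudf implementation; `RangeWindowSketch`, `rangeWindowSum`, and `groupOffsets` are hypothetical names used only for illustration, and dates are reduced to the day of month for brevity.

```java
import java.util.Arrays;

// Plain-Java model of a range-based window SUM within groups (illustrative only).
final class RangeWindowSketch {

    // For each row i, sums the values of all rows in the same group whose day
    // lies in [days[i] - daysPreceding, days[i] + daysFollowing].
    static int[] rangeWindowSum(int[] days, int[] values, int[] groupOffsets,
                                int daysPreceding, int daysFollowing) {
        int[] out = new int[values.length];
        for (int g = 0; g + 1 < groupOffsets.length; g++) {
            int start = groupOffsets[g];
            int end = groupOffsets[g + 1]; // exclusive
            for (int i = start; i < end; i++) {
                int lo = days[i] - daysPreceding;
                int hi = days[i] + daysFollowing;
                int sum = 0;
                for (int j = start; j < end; j++) {
                    if (days[j] >= lo && days[j] <= hi) {
                        sum += values[j];
                    }
                }
                out[i] = sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Day of month (202001-DD) and sales_amt, grouped by user and ordered by date.
        int[] days     = { 1,  2,  3,  7,  7,  1,  1,  2,  4};
        int[] salesAmt = {10, 20, 10, 50, 60, 20, 30, 80, 40};
        int[] groupOffsets = {0, 5, 9}; // user1 is rows 0-4, user2 is rows 5-8
        int[] result = rangeWindowSum(days, salesAmt, groupOffsets, 1, 1);
        // prints [30, 40, 30, 110, 110, 130, 130, 130, 40]
        System.out.println(Arrays.toString(result));
    }
}
```

    Rows that share a date fall into each other's windows, which is why both `user1` rows dated the 7th produce 110 and the first three `user2` rows all produce 130.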
  - scan
```
public Table scan(GroupByScanAggregationOnColumn... aggregates)
```
  - replaceNulls
```
public Table replaceNulls(ReplacePolicyWithColumn... replacements)
```
  - contiguousSplitGroups
```
public ContiguousTable[] contiguousSplitGroups()
```
    Splits the groups in a single table into separate tables according to the grouping keys. Each split table represents a single group. This API will be used by some grouping-related operators to process the data group by group.
    
    Example, with grouping column index 0:
    
    Input: a table of 3 rows (two groups): `a 1`; `b 2`; `b 3`
    
    Result: two tables, one table per group.
    
    - Result[0]: `a 1`
    - Result[1]: `b 2`; `b 3`
    
    Note: the order of the groups returned is NOT always the same as in the input table. The split is done in native code to avoid copying the offset array to the JVM.
    
    Returns:
    
    The tables split according to the groups in the table. NOTE: It is the responsibility of the caller to close the result. Each table and column holds a reference to the original buffer. But both the buffer and the table must be closed for the memory to be released.
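    
    The grouping semantics of the example can be modelled in plain Java as below. This is a sketch under stated assumptions, not how cudf performs the split: `SplitGroupsSketch` and `splitGroups` are hypothetical names, rows are modelled as `String[]`, and a `LinkedHashMap` is used so that groups come out in first-seen order, whereas the real method makes no such ordering guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of splitting a table into one sub-table per group (illustrative only).
final class SplitGroupsSketch {

    // Returns one list of rows per distinct value in column keyIndex.
    static List<List<String[]>> splitGroups(List<String[]> rows, int keyIndex) {
        Map<String, List<String[]>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[keyIndex], k -> new ArrayList<>()).add(row);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"a", "1"},
                new String[] {"b", "2"},
                new String[] {"b", "3"});
        List<List<String[]>> result = splitGroups(rows, 0);
        for (int i = 0; i < result.size(); i++) {
            StringBuilder sb = new StringBuilder("Result[" + i + "]:");
            for (String[] row : result.get(i)) {
                sb.append(' ').append(String.join(" ", row)).append(';');
            }
            System.out.println(sb);
        }
    }
}
```

    Unlike this model, the tables returned by the real method hold references to a native buffer, so the caller has to close them as noted above.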
  - contiguousSplitGroupsAndGenUniqKeys
```
public ContigSplitGroupByResult contiguousSplitGroupsAndGenUniqKeys()
```
    Similar to contiguousSplitGroups(), but also returns a table of unique keys in which each row corresponds to one group split. Splits the groups in a single table into separate tables according to the grouping keys. Each split table represents a single group.
    
    Example: for the input shown in the contiguousSplitGroups() example, the `uniqKeysTable` in ContigSplitGroupByResult is: `a`; `b`
    
    Note: it has only 2 rows because there are only 2 split groups.
    
    Returns:
    
    The split groups and the unique-key table.
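    
    The relationship between the unique-key table and the split groups (row i of the keys corresponds to split i) can be sketched in plain Java as follows. The names `SplitWithKeysSketch`, `Result`, and `splitWithKeys` are hypothetical and do not mirror the fields of ContigSplitGroupByResult; the first-seen ordering is an assumption of this model only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of split groups plus an aligned list of unique keys (illustrative only).
final class SplitWithKeysSketch {

    // Holds the unique keys and the groups; uniqKeys.get(i) is the key of groups.get(i).
    static final class Result {
        final List<String> uniqKeys;
        final List<List<String[]>> groups;

        Result(List<String> uniqKeys, List<List<String[]>> groups) {
            this.uniqKeys = uniqKeys;
            this.groups = groups;
        }
    }

    static Result splitWithKeys(List<String[]> rows, int keyIndex) {
        Map<String, List<String[]>> byKey = new LinkedHashMap<>();
        for (String[] row : rows) {
            byKey.computeIfAbsent(row[keyIndex], k -> new ArrayList<>()).add(row);
        }
        // keySet() and values() iterate in the same order, so the two lists stay aligned.
        return new Result(new ArrayList<>(byKey.keySet()), new ArrayList<>(byKey.values()));
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"a", "1"},
                new String[] {"b", "2"},
                new String[] {"b", "3"});
        Result r = splitWithKeys(rows, 0);
        for (int i = 0; i < r.uniqKeys.size(); i++) {
            System.out.println(r.uniqKeys.get(i) + " -> " + r.groups.get(i).size() + " row(s)");
        }
    }
}
```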
