Modifier and Type | Method and Description |
---|---|
Table |
aggregate(GroupByAggregationOnColumn... aggregates)
Aggregates the group of columns represented by indices
Usage:
    aggregate(count(), max(2),...);
Example:
    input : 1, 1, 1
            1, 2, 1
            2, 4, 5
    table.groupBy(0, 2).count()
            col0, col1
    output:   1,   1
              1,   2
              2,   1 ==> aggregated count
|
Table |
aggregateWindows(AggregationOverWindow... windowAggregates)
Computes row-based window aggregation functions on the Table/projection,
based on windows specified in the argument.
|
Table |
aggregateWindowsOverRanges(AggregationOverWindow... windowAggregates)
Computes range-based window aggregation functions on the Table/projection,
based on windows specified in the argument.
|
ContiguousTable[] |
contiguousSplitGroups()
Splits the groups in a single table into separate tables according to the grouping keys.
|
ContigSplitGroupByResult |
contiguousSplitGroupsAndGenUniqKeys()
Similar to contiguousSplitGroups(), but also returns an extra table of unique keys, in which
each row corresponds to one group split. |
Table |
replaceNulls(ReplacePolicyWithColumn... replacements) |
Table |
scan(GroupByScanAggregationOnColumn... aggregates) |
public Table aggregate(GroupByAggregationOnColumn... aggregates)
public Table aggregateWindows(AggregationOverWindow... windowAggregates)
Each window aggregation is described by a separate AggregationOverWindow argument, indicating:
1. the Aggregation.Kind,
2. the number of rows preceding and following the current row, within a window,
3. the minimum number of observations within the defined window.
This method returns a Table instance, with one result column for each specified window aggregation.
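The three window parameters listed above can be illustrated with a plain-Java sketch. It does not call the cudf API: `RowWindowSketch`, `windowSum`, and `groupOffsets` are hypothetical names, and returning null when a window holds fewer rows than the minimum number of observations is an assumption about the intended behaviour. The data is the partitioned `sales_amt` vector used by the example in this section.

```java
import java.util.Arrays;

final class RowWindowSketch {
    /**
     * Row-based window SUM computed independently inside each group.
     * groupOffsets holds the start index of every group plus the total length.
     * A result is null when the window has fewer than minPeriods rows (assumed behaviour).
     */
    static Integer[] windowSum(int[] values, int[] groupOffsets,
                               int preceding, int following, int minPeriods) {
        Integer[] out = new Integer[values.length];
        for (int g = 0; g + 1 < groupOffsets.length; g++) {
            int start = groupOffsets[g];
            int end = groupOffsets[g + 1];
            for (int i = start; i < end; i++) {
                // Clip the window so it never crosses a group boundary.
                int lo = Math.max(start, i - preceding);
                int hi = Math.min(end - 1, i + following);
                if (hi - lo + 1 < minPeriods) {
                    out[i] = null;
                    continue;
                }
                int sum = 0;
                for (int j = lo; j <= hi; j++) {
                    sum += values[j];
                }
                out[i] = sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] salesAmt = {10, 20, 10, 50, 60, 20, 30, 80, 40};
        int[] groupOffsets = {0, 5, 9}; // user1 occupies rows 0..4, user2 rows 5..8
        System.out.println(Arrays.toString(windowSum(salesAmt, groupOffsets, 1, 1, 1)));
    }
}
```

With 1 preceding row, 1 following row, and a minimum of 1 observation, this prints `[30, 40, 80, 120, 110, 50, 130, 150, 120]`, matching the column documented for the example.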
For example, given the following input:
[ // user_id, sales_amt
{ "user1", 10 },
{ "user2", 20 },
{ "user1", 20 },
{ "user1", 10 },
{ "user2", 30 },
{ "user2", 80 },
{ "user1", 50 },
{ "user1", 60 },
{ "user2", 40 }
]
Partitioning (grouping) by `user_id` yields the following `sales_amt` vector
(with 2 groups, one for each distinct `user_id`):
[ 10, 20, 10, 50, 60, 20, 30, 80, 40 ]
<-------user1-------->|<------user2------->
The SUM aggregation is applied with 1 preceding and 1 following
row, with a minimum of 1 period. The aggregation window is thus 3 rows wide,
yielding the following column:
[ 30, 40, 80, 120, 110, 50, 130, 150, 120 ]
windowAggregates - the window-aggregations to be performed
IllegalArgumentException - if the window arguments are not of type WindowOptions.FrameType.ROWS, i.e. a timestamp column is specified for a window-aggregation.
public Table aggregateWindowsOverRanges(AggregationOverWindow... windowAggregates)
Each window aggregation is described by a separate AggregationOverWindow argument, indicating:
1. the Aggregation.Kind,
2. the index of the timestamp column on which to base the window definitions,
3. the number of DAYS preceding and following the current row's date, to consider in the window,
4. the minimum number of observations within the defined window.
This method returns a Table instance, with one result column for each specified window aggregation.
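The difference from a row-based window is that membership depends on the date value, not on row position. A plain-Java sketch of that rule follows. It does not call the cudf API: `RangeWindowSketch`, `rangeWindowSum`, and `groupOffsets` are hypothetical names, dates are simplified to day-of-month integers, and returning null below the minimum number of observations is an assumption. The data is the partitioned, date-ordered input used by the example in this section.

```java
import java.util.Arrays;

final class RangeWindowSketch {
    /**
     * Range-based window SUM: a row j belongs to the window of row i when both rows
     * are in the same group and days[j] lies in [days[i] - precedingDays, days[i] + followingDays].
     */
    static Integer[] rangeWindowSum(int[] days, int[] values, int[] groupOffsets,
                                    int precedingDays, int followingDays, int minPeriods) {
        Integer[] out = new Integer[values.length];
        for (int g = 0; g + 1 < groupOffsets.length; g++) {
            int start = groupOffsets[g];
            int end = groupOffsets[g + 1];
            for (int i = start; i < end; i++) {
                int sum = 0;
                int observations = 0;
                // Quadratic scan per group: simple rather than fast, for clarity only.
                for (int j = start; j < end; j++) {
                    if (days[j] >= days[i] - precedingDays && days[j] <= days[i] + followingDays) {
                        sum += values[j];
                        observations++;
                    }
                }
                out[i] = observations < minPeriods ? null : Integer.valueOf(sum);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] days = {1, 2, 3, 7, 7, 1, 1, 2, 4}; // day of month in January 2020
        int[] salesAmt = {10, 20, 10, 50, 60, 20, 30, 80, 40};
        int[] groupOffsets = {0, 5, 9}; // user1 occupies rows 0..4, user2 rows 5..8
        System.out.println(Arrays.toString(
                rangeWindowSum(days, salesAmt, groupOffsets, 1, 1, 1)));
    }
}
```

With 1 day preceding, 1 day following, and a minimum of 1 observation, this prints `[30, 40, 30, 110, 110, 130, 130, 130, 40]`, matching the documented results. Rows that share a date, such as the two user1 rows on the 7th, receive the same value because they see the same window.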
For example, given the following input:
[ // user, sales_amt, YYYYMMDD (date)
{ "user1", 10, 20200101 },
{ "user2", 20, 20200101 },
{ "user1", 20, 20200102 },
{ "user1", 10, 20200103 },
{ "user2", 30, 20200101 },
{ "user2", 80, 20200102 },
{ "user1", 50, 20200107 },
{ "user1", 60, 20200107 },
{ "user2", 40, 20200104 }
]
Partitioning (grouping) by `user_id`, and ordering by `date` yields the following `sales_amt` vector
(with 2 groups, one for each distinct `user_id`):
Date :(202001-) [ 01, 02, 03, 07, 07, 01, 01, 02, 04 ]
Input: [ 10, 20, 10, 50, 60, 20, 30, 80, 40 ]
<-------user1-------->|<---------user2--------->
The SUM aggregation is applied, with 1 day preceding, and 1 day following, with a minimum of 1 period.
The aggregation window is thus 3 *days* wide, yielding the following output column:
Results: [ 30, 40, 30, 110, 110, 130, 130, 130, 40 ]
windowAggregates - the window-aggregations to be performed
IllegalArgumentException - if the window arguments are not of type WindowOptions.FrameType.RANGE, or the orderBys are not of (Boolean-exclusive) integral type, i.e. the timestamp-column was not specified for the aggregation.
public Table scan(GroupByScanAggregationOnColumn... aggregates)
public Table replaceNulls(ReplacePolicyWithColumn... replacements)
public ContiguousTable[] contiguousSplitGroups()
public ContigSplitGroupByResult contiguousSplitGroupsAndGenUniqKeys()
Similar to contiguousSplitGroups(), but also returns an extra table of unique keys, in which
each row corresponds to one group split.
Splits the groups in a single table into separate tables according to the grouping keys.
Each split table represents a single group.
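The relationship between the split tables and the unique-key table can be sketched in plain Java. This is not the cudf implementation: `SplitGroupsSketch`, `Result`, and `splitGroups` are hypothetical stand-ins, the value column is made up, and sorting the keys is an assumption made only to keep the output deterministic. The keys `a` and `b` mirror the `uniqKeysTable` rows shown in this section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SplitGroupsSketch {
    /** Hypothetical stand-in for the result: one key and one value list per group, in matching order. */
    static final class Result {
        final List<String> uniqKeys = new ArrayList<>();
        final List<List<Integer>> groups = new ArrayList<>();
    }

    static Result splitGroups(String[] keys, int[] values) {
        // TreeMap keeps the keys sorted, so group order is deterministic in this sketch.
        Map<String, List<Integer>> byKey = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            byKey.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(values[i]);
        }
        Result result = new Result();
        for (Map.Entry<String, List<Integer>> entry : byKey.entrySet()) {
            result.uniqKeys.add(entry.getKey());   // one key row per group
            result.groups.add(entry.getValue());   // the rows belonging to that group
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = splitGroups(new String[]{"a", "b", "a", "b", "a"}, new int[]{1, 2, 3, 4, 5});
        System.out.println(r.uniqKeys); // [a, b]
        System.out.println(r.groups);   // [[1, 3, 5], [2, 4]]
    }
}
```

The key list always has exactly as many entries as there are split groups, which is why two groups produce a two-row key table.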
For an example, see contiguousSplitGroups().
The `uniqKeysTable` in ContigSplitGroupByResult is:
a
b
Note: there are only 2 rows because there are only 2 split groups.
Copyright © 2024. All rights reserved.