Package ai.rapids.cudf
Class GroupByAggregation
java.lang.Object
ai.rapids.cudf.GroupByAggregation
An aggregation that can be used in a group-by operation.
-
Method Summary
Modifier and Type | Method | Description
static GroupByAggregation | argMax() | Index of max element.
static GroupByAggregation | argMin() | Index of min element.
static GroupByAggregation | bitAnd() | Bitwise AND aggregation, computing the bitwise AND of all non-null values in a group.
static GroupByAggregation | bitOr() | Bitwise OR aggregation, computing the bitwise OR of all non-null values in a group.
static GroupByAggregation | bitXor() | Bitwise XOR aggregation, computing the bitwise XOR of all non-null values in a group.
static GroupByAggregation | collectList() | Collect the values into a list.
static GroupByAggregation | collectList(NullPolicy nullPolicy) | Collect the values into a list.
static GroupByAggregation | collectSet() | Collect the values into a set.
static GroupByAggregation | collectSet(NullPolicy nullPolicy, NullEquality nullEquality, NaNEquality nanEquality) | Collect the values into a set.
static GroupByAggregation | count() | Count number of valid, a.k.a. non-null, elements.
static GroupByAggregation | count(NullPolicy nullPolicy) | Count number of elements.
static GroupByAggregation | createTDigest(int delta) | Compute a t-digest from a fixed-width numeric input column.
boolean | equals |
int | hashCode() |
static GroupByAggregation | histogram() | Histogram aggregation, computing the frequencies for each unique row.
static GroupByAggregation | hostUDF(HostUDFWrapper wrapper) | Execute an aggregation using a host-side user-defined function (UDF).
static GroupByAggregation | M2() | Sum of squares of differences from the mean.
static GroupByAggregation | max() | Max aggregation.
static GroupByAggregation | mean() | Arithmetic mean reduction.
static GroupByAggregation | median() | Median reduction.
static GroupByAggregation | mergeHistogram() | MergeHistogram aggregation, to merge multiple histograms.
static GroupByAggregation | mergeLists() | Merge the partial lists produced by multiple CollectListAggregations.
static GroupByAggregation | mergeM2() | Merge the partial M2 values produced by multiple instances of M2Aggregation.
static GroupByAggregation | mergeSets() | Merge the partial sets produced by multiple CollectSetAggregations.
static GroupByAggregation | mergeSets(NullEquality nullEquality, NaNEquality nanEquality) | Merge the partial sets produced by multiple CollectSetAggregations.
static GroupByAggregation | mergeTDigest(int delta) | Merge t-digests.
static GroupByAggregation | min() | Min aggregation.
static GroupByAggregation | nth(int offset) | Get the nth, non-null, element in a group.
static GroupByAggregation | nth(int offset, NullPolicy nullPolicy) | Get the nth element in a group.
static GroupByAggregation | nunique() | Number of unique, non-null, elements.
static GroupByAggregation | nunique(NullPolicy nullPolicy) | Number of unique elements.
 | onColumn(int columnIndex) | Add a column to the Aggregation so it can be used on a specific column of data.
static GroupByAggregation | product() | Product aggregation.
static GroupByAggregation | quantile(double... quantiles) | Aggregate to compute the specified quantiles.
static GroupByAggregation | quantile(QuantileMethod method, double... quantiles) | Aggregate to compute various quantiles.
static GroupByAggregation | standardDeviation() | Standard deviation aggregation with 1 as the delta degrees of freedom.
static GroupByAggregation | standardDeviation(int ddof) | Standard deviation aggregation.
static GroupByAggregation | sum() | Sum aggregation.
static GroupByAggregation | variance() | Variance aggregation with 1 as the delta degrees of freedom.
static GroupByAggregation | variance(int ddof) | Variance aggregation.
-
Method Details
-
onColumn
Add a column to the Aggregation so it can be used on a specific column of data.
Parameters:
columnIndex - the index of the column to operate on.
-
hashCode
public int hashCode()
-
equals
-
count
Count number of valid, a.k.a. non-null, elements.
-
count
Count number of elements.
Parameters:
nullPolicy - INCLUDE if nulls should be counted. EXCLUDE if only non-null values should be counted.
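The difference between the two null policies for count can be illustrated with a minimal plain-Java sketch. It does not call cudf; the `CountSketch` class and its boolean `includeNulls` flag are hypothetical stand-ins for the aggregation and for NullPolicy.INCLUDE / NullPolicy.EXCLUDE.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class CountSketch {
    // Hypothetical helper, not part of cudf: counts the elements of one group.
    // includeNulls == true mirrors INCLUDE, false mirrors EXCLUDE.
    static long count(List<Integer> group, boolean includeNulls) {
        if (includeNulls) {
            return group.size();
        }
        return group.stream().filter(Objects::nonNull).count();
    }

    public static void main(String[] args) {
        List<Integer> group = Arrays.asList(4, null, 7, null, 9);
        System.out.println(count(group, true));  // prints 5
        System.out.println(count(group, false)); // prints 3
    }
}
```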
-
sum
Sum aggregation.
-
product
Product aggregation.
-
argMax
Index of max element. Note that if the data is not already sorted by the grouping keys, it may be sorted automatically before the aggregation runs. In that case the returned index refers to the sorted data.
-
argMin
Index of min element. Note that if the data is not already sorted by the grouping keys, it may be sorted automatically before the aggregation runs. In that case the returned index refers to the sorted data.
-
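The sorting caveat on argMax and argMin can be made concrete with a small plain-Java sketch. It does not use cudf. It assumes the returned index is a row position in the whole key-sorted data, which is one reading of the note above; `ArgMaxSketch`, `argMax`, and `sortedOrder` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Comparator;

final class ArgMaxSketch {
    // Row index of the largest value whose key equals `key`, or -1 if the key is absent.
    static int argMax(String[] keys, int[] values, String key) {
        int best = -1;
        for (int i = 0; i < keys.length; i++) {
            if (keys[i].equals(key) && (best < 0 || values[i] > values[best])) {
                best = i;
            }
        }
        return best;
    }

    // Original row positions listed in key-sorted order (Arrays.sort on objects is stable).
    static Integer[] sortedOrder(String[] keys) {
        Integer[] order = new Integer[keys.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparing((Integer i) -> keys[i]));
        return order;
    }

    public static void main(String[] args) {
        String[] keys = {"b", "a", "b", "a"};
        int[] values = {5, 9, 7, 1};

        // Rearrange the rows into key-sorted order.
        Integer[] order = sortedOrder(keys);
        String[] sortedKeys = new String[keys.length];
        int[] sortedValues = new int[values.length];
        for (int i = 0; i < order.length; i++) {
            sortedKeys[i] = keys[order[i]];
            sortedValues[i] = values[order[i]];
        }

        System.out.println(argMax(keys, values, "b"));             // prints 2
        System.out.println(argMax(sortedKeys, sortedValues, "b")); // prints 3
    }
}
```

The same maximum (7) sits at row 2 of the original data but at row 3 once the rows are sorted by key, so an index produced after sorting cannot be used directly against the unsorted input.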
min
Min aggregation.
-
max
Max aggregation.
-
mean
Arithmetic mean reduction.
-
M2
Sum of squares of differences from the mean.
-
variance
Variance aggregation with 1 as the delta degrees of freedom.
-
variance
Variance aggregation.
Parameters:
ddof - delta degrees of freedom. The divisor used in the calculation of variance is N - ddof, where N is the population size.
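As a rough illustration of the N - ddof divisor, the following plain-Java sketch computes the variance of a single group. It does not call cudf, and `VarianceSketch` is a hypothetical name; the standard deviation with the same ddof would be the square root of this value.

```java
final class VarianceSketch {
    // Variance of one group: M2 (sum of squared differences from the mean) divided by N - ddof.
    static double variance(double[] values, int ddof) {
        int n = values.length;
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / n;
        double m2 = 0.0;
        for (double v : values) {
            m2 += (v - mean) * (v - mean);
        }
        return m2 / (n - ddof);
    }

    public static void main(String[] args) {
        double[] group = {2, 4, 4, 4, 5, 5, 7, 9};  // mean 5, M2 = 32
        System.out.println(variance(group, 0));     // divisor 8, prints 4.0
        System.out.println(variance(group, 1));     // divisor 7, i.e. 32 / 7
    }
}
```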
-
standardDeviation
Standard deviation aggregation with 1 as the delta degrees of freedom.
-
standardDeviation
Standard deviation aggregation.
Parameters:
ddof - delta degrees of freedom. The divisor used in the calculation of the standard deviation is N - ddof, where N is the population size.
-
quantile
Aggregate to compute the specified quantiles. Uses linear interpolation by default.
-
quantile
Aggregate to compute various quantiles.
-
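For the quantile entries above, a minimal plain-Java sketch of linear interpolation follows. It assumes the common definition in which the fractional position is q * (N - 1) over the sorted group; whether cudf uses exactly this positioning is not stated here, and `QuantileSketch` is a hypothetical name.

```java
import java.util.Arrays;

final class QuantileSketch {
    // Linear-interpolation quantile of one group, for q in [0, 1].
    // Assumption: the fractional position is q * (N - 1) in the sorted values.
    static double quantile(double[] group, double q) {
        if (group.length == 0) {
            throw new IllegalArgumentException("empty group");
        }
        double[] sorted = group.clone();
        Arrays.sort(sorted);
        double pos = q * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = Math.min(lo + 1, sorted.length - 1);
        double frac = pos - lo;
        return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
    }

    public static void main(String[] args) {
        double[] group = {4, 1, 3, 2};
        System.out.println(quantile(group, 0.25)); // prints 1.75
        System.out.println(quantile(group, 0.5));  // prints 2.5
        System.out.println(quantile(group, 1.0));  // prints 4.0
    }
}
```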
median
Median reduction.
-
nunique
Number of unique, non-null, elements.
-
nunique
Number of unique elements.
Parameters:
nullPolicy - INCLUDE if nulls should be counted, else EXCLUDE. If nulls are counted they compare as equal, so multiple null values in a group increase the count by only 1.
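The rule that counted nulls compare as equal can be sketched in plain Java as follows. This does not call cudf; `NuniqueSketch` and the boolean `includeNulls` flag are hypothetical stand-ins for the aggregation and its NullPolicy argument.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class NuniqueSketch {
    // Number of distinct values in one group. When nulls are included,
    // all nulls together contribute exactly 1 to the result.
    static int nunique(List<String> group, boolean includeNulls) {
        Set<String> distinct = new HashSet<>();
        boolean sawNull = false;
        for (String v : group) {
            if (v == null) {
                sawNull = true;
            } else {
                distinct.add(v);
            }
        }
        return distinct.size() + (includeNulls && sawNull ? 1 : 0);
    }

    public static void main(String[] args) {
        List<String> group = Arrays.asList("a", null, "b", "a", null);
        System.out.println(nunique(group, true));  // prints 3
        System.out.println(nunique(group, false)); // prints 2
    }
}
```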
-
nth
Get the nth, non-null, element in a group.
Parameters:
offset - the offset to look at. Negative numbers count from the end of the group. Any value outside of the group range results in a null.
-
nth
Get the nth element in a group.
Parameters:
offset - the offset to look at. Negative numbers count from the end of the group. Any value outside of the group range results in a null.
nullPolicy - INCLUDE if nulls should be included in the aggregation or EXCLUDE if they should be skipped.
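The offset rules for nth (negative offsets, out-of-range offsets, and null handling) can be sketched in plain Java. This does not call cudf, and it assumes that with EXCLUDE the nulls are dropped before the offset is applied, which is one reading of "skipped"; `NthSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

final class NthSketch {
    // Element at `offset` within one group, or null if the offset is out of range.
    // Assumption: when includeNulls is false, nulls are removed before indexing.
    static Integer nth(List<Integer> group, int offset, boolean includeNulls) {
        List<Integer> candidates = includeNulls
                ? group
                : group.stream().filter(Objects::nonNull).collect(Collectors.toList());
        // Negative offsets count back from the end of the group.
        int index = offset >= 0 ? offset : candidates.size() + offset;
        if (index < 0 || index >= candidates.size()) {
            return null;
        }
        return candidates.get(index);
    }

    public static void main(String[] args) {
        List<Integer> group = Arrays.asList(10, null, 30, 40);
        System.out.println(nth(group, 1, true));   // prints null (the null at position 1)
        System.out.println(nth(group, 1, false));  // prints 30
        System.out.println(nth(group, -1, true));  // prints 40
        System.out.println(nth(group, 4, true));   // prints null (out of range)
    }
}
```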
-
collectList
Collect the values into a list. Nulls will be skipped.
-
collectList
Collect the values into a list.
Parameters:
nullPolicy - Indicates whether to include/exclude nulls during collection.
-
collectSet
Collect the values into a set. All null values will be excluded, and all NaN values are regarded as unique instances.
-
collectSet
public static GroupByAggregation collectSet(NullPolicy nullPolicy, NullEquality nullEquality, NaNEquality nanEquality)
Collect the values into a set.
Parameters:
nullPolicy - Indicates whether to include/exclude nulls during collection.
nullEquality - Flag to specify whether null entries within each list should be considered equal.
nanEquality - Flag to specify whether NaN values in a floating-point column should be considered equal.
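How the three collectSet options interact can be sketched in plain Java. This does not call cudf; `CollectSetSketch` and its three boolean flags are hypothetical stand-ins for NullPolicy, NullEquality, and NaNEquality, and the output order here (first occurrence) is a property of the sketch only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CollectSetSketch {
    // Collects one group's values into a "set" under the given null/NaN rules.
    static List<Double> collectSet(List<Double> group, boolean includeNulls,
                                   boolean nullsEqual, boolean nansEqual) {
        List<Double> out = new ArrayList<>();
        Set<Double> seen = new HashSet<>();
        boolean sawNull = false;
        boolean sawNaN = false;
        for (Double v : group) {
            if (v == null) {
                // Drop nulls entirely, or keep only the first one when nulls compare equal.
                if (!includeNulls || (nullsEqual && sawNull)) {
                    continue;
                }
                sawNull = true;
                out.add(null);
            } else if (v.isNaN()) {
                // Keep only the first NaN when NaNs compare equal; otherwise each is unique.
                if (nansEqual && sawNaN) {
                    continue;
                }
                sawNaN = true;
                out.add(v);
            } else if (seen.add(v)) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Double> group = Arrays.asList(1.0, Double.NaN, null, 1.0, Double.NaN, null);
        // Nulls excluded, NaNs unique (the behaviour described for the no-argument overload).
        System.out.println(collectSet(group, false, false, false)); // prints [1.0, NaN, NaN]
        // Nulls included, nulls and NaNs each compare equal.
        System.out.println(collectSet(group, true, true, true));    // prints [1.0, NaN, null]
    }
}
```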
-
mergeLists
Merge the partial lists produced by multiple CollectListAggregations. NOTICE: The partial lists to be merged should NOT include any null list element (but can include null list entries).
-
mergeSets
Merge the partial sets produced by multiple CollectSetAggregations. Each null/NaN value will be regarded as a unique instance.
-
hostUDF
Execute an aggregation using a host-side user-defined function (UDF).
Parameters:
wrapper - The wrapper for the native host UDF instance.
Returns:
A new GroupByAggregation instance.
-
mergeSets
Merge the partial sets produced by multiple CollectSetAggregations.
Parameters:
nullEquality - Flag to specify whether null entries within each list should be considered equal.
nanEquality - Flag to specify whether NaN values in a floating-point column should be considered equal.
-
mergeM2
Merge the partial M2 values produced by multiple instances of M2Aggregation.
-
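To show why partial M2 values can be merged at all, here is a plain-Java sketch of the standard pairwise combination formula for (count, mean, M2) triples. It does not call cudf, and it assumes each partial result carries a count and a mean alongside M2; the input layout mergeM2 actually expects is not described here. `MergeM2Sketch` and `Partial` are hypothetical names.

```java
final class MergeM2Sketch {
    // Hypothetical partial state: element count, mean, and M2
    // (sum of squared differences from the mean).
    static final class Partial {
        final long n;
        final double mean;
        final double m2;

        Partial(long n, double mean, double m2) {
            this.n = n;
            this.mean = mean;
            this.m2 = m2;
        }
    }

    // Computes the partial state directly from raw values.
    static Partial of(double... values) {
        if (values.length == 0) {
            return new Partial(0, 0.0, 0.0);
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;
        double m2 = 0.0;
        for (double v : values) {
            m2 += (v - mean) * (v - mean);
        }
        return new Partial(values.length, mean, m2);
    }

    // Combines two partial states without revisiting the raw values.
    static Partial merge(Partial a, Partial b) {
        if (a.n == 0) {
            return b;
        }
        if (b.n == 0) {
            return a;
        }
        long n = a.n + b.n;
        double delta = b.mean - a.mean;
        double mean = a.mean + delta * b.n / n;
        double m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n;
        return new Partial(n, mean, m2);
    }

    public static void main(String[] args) {
        Partial merged = merge(of(2, 4, 4, 4), of(5, 5, 7, 9));
        Partial direct = of(2, 4, 4, 4, 5, 5, 7, 9);
        System.out.println(merged.m2 + " vs " + direct.m2); // prints 32.0 vs 32.0
    }
}
```

Dividing the merged M2 by N - ddof then gives the variance of the combined group, which is what makes M2 useful as an intermediate result when a group is split across batches.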
createTDigest
Compute a t-digest from a fixed-width numeric input column.
Parameters:
delta - Required accuracy (number of buckets).
Returns:
A list of centroids per grouping, where each centroid has a mean value and a weight. The number of centroids will be <= delta.
-
mergeTDigest
Merge t-digests.
Parameters:
delta - Required accuracy (number of buckets).
Returns:
A list of centroids per grouping, where each centroid has a mean value and a weight. The number of centroids will be <= delta.
-
histogram
Histogram aggregation, computing the frequencies for each unique row. A histogram is given as a lists column, in which the first child stores unique rows from the input values and the second child stores their corresponding frequencies.
Returns:
A lists-of-structs column in which each list contains a histogram corresponding to an input key.
-
mergeHistogram
MergeHistogram aggregation, to merge multiple histograms.
Returns:
A new histogram in which the frequencies of the unique rows are summed up.
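The relationship between histogram and mergeHistogram can be sketched in plain Java using a map from value to frequency. This does not call cudf and does not reproduce the lists-of-structs column layout; `HistogramSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class HistogramSketch {
    // Frequency of each distinct value in one group.
    static Map<String, Long> histogram(List<String> values) {
        Map<String, Long> freq = new TreeMap<>();
        for (String v : values) {
            freq.merge(v, 1L, Long::sum);
        }
        return freq;
    }

    // Merges two histograms by summing the frequencies of matching values.
    static Map<String, Long> merge(Map<String, Long> a, Map<String, Long> b) {
        Map<String, Long> result = new TreeMap<>(a);
        b.forEach((value, count) -> result.merge(value, count, Long::sum));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> first = histogram(Arrays.asList("a", "b", "a"));
        Map<String, Long> second = histogram(Arrays.asList("b", "c"));
        System.out.println(merge(first, second)); // prints {a=2, b=2, c=1}
    }
}
```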
-
bitAnd
Bitwise AND aggregation, computing the bitwise AND of all non-null values in a group.
-
bitOr
Bitwise OR aggregation, computing the bitwise OR of all non-null values in a group.
-
bitXor
Bitwise XOR aggregation, computing the bitwise XOR of all non-null values in a group.
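The three bitwise aggregations share one shape: skip nulls, then fold the remaining values with the operator. A plain-Java sketch follows. It does not call cudf; `BitwiseSketch` is a hypothetical name, and returning an empty OptionalInt for a group with no non-null values is a choice made for the sketch, not a statement about cudf's result for that case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.OptionalInt;
import java.util.function.IntBinaryOperator;

final class BitwiseSketch {
    // Folds the non-null values of one group with the given bitwise operator.
    static OptionalInt reduce(List<Integer> group, IntBinaryOperator op) {
        return group.stream()
                .filter(Objects::nonNull)
                .mapToInt(Integer::intValue)
                .reduce(op);
    }

    public static void main(String[] args) {
        List<Integer> group = Arrays.asList(12, null, 10); // 0b1100, null, 0b1010
        System.out.println(reduce(group, (a, b) -> a & b).getAsInt()); // prints 8
        System.out.println(reduce(group, (a, b) -> a | b).getAsInt()); // prints 14
        System.out.println(reduce(group, (a, b) -> a ^ b).getAsInt()); // prints 6
    }
}
```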
-