Class GroupByAggregation

java.lang.Object
ai.rapids.cudf.GroupByAggregation

public final class GroupByAggregation extends Object
An aggregation that can be used for a group by.
  • Method Details

    • onColumn

      public GroupByAggregationOnColumn onColumn(int columnIndex)
      Add a column to the Aggregation so it can be used on a specific column of data.
      Parameters:
      columnIndex - the index of the column to operate on.
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • equals

      public boolean equals(Object other)
      Overrides:
      equals in class Object
    • count

      public static GroupByAggregation count()
      Count number of valid, a.k.a. non-null, elements.
    • count

      public static GroupByAggregation count(NullPolicy nullPolicy)
      Count number of elements.
      Parameters:
      nullPolicy - INCLUDE if nulls should be counted. EXCLUDE if only non-null values should be counted.
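
      The difference between the two count overloads can be sketched in plain Java. This is only a model of the documented semantics, not the cudf implementation; the class CountSketch and the boolean includeNulls (standing in for NullPolicy.INCLUDE / NullPolicy.EXCLUDE) are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model of per-group counting; hypothetical, not part of cudf.
final class CountSketch {
    /** Counts the elements of one group; nulls are counted only when includeNulls is true. */
    static int count(List<Integer> group, boolean includeNulls) {
        int n = 0;
        for (Integer v : group) {
            if (v != null || includeNulls) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<Integer> group = Arrays.asList(4, null, 7, null);
        System.out.println(count(group, false)); // models count() and count(EXCLUDE): prints 2
        System.out.println(count(group, true));  // models count(INCLUDE): prints 4
    }
}
```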
    • sum

      public static GroupByAggregation sum()
      Sum aggregation.
    • product

      public static GroupByAggregation product()
      Product Aggregation.
    • argMax

      public static GroupByAggregation argMax()
      Index of max element. Note: if the data is not already sorted by the grouping keys, it may be sorted automatically before the aggregation runs. In that case the returned index refers to the sorted data, not to the original input.
    • argMin

      public static GroupByAggregation argMin()
      Index of min element. Note: if the data is not already sorted by the grouping keys, it may be sorted automatically before the aggregation runs. In that case the returned index refers to the sorted data, not to the original input.
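
      The sorting caveat on argMax and argMin is easy to miss, so here is a plain-Java sketch of what "an index into the sorted data" means. It is a model only: the class ArgMaxSketch is hypothetical, and the use of a stable sort by key is an assumption made for the illustration, not something this page states about cudf.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: argMax computed after rows were reordered by grouping key.
final class ArgMaxSketch {
    /** Returns, per key, the position of the group's max value within the key-sorted rows. */
    static Map<Integer, Integer> argMaxAfterSort(int[] keys, int[] values) {
        Integer[] order = new Integer[keys.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Arrays.sort on object arrays is stable, so rows keep their relative order within a key.
        Arrays.sort(order, Comparator.comparingInt((Integer i) -> keys[i]));

        Map<Integer, Integer> best = new LinkedHashMap<>();   // key -> max value seen so far
        Map<Integer, Integer> result = new LinkedHashMap<>(); // key -> position in sorted rows
        for (int pos = 0; pos < order.length; pos++) {
            int row = order[pos];
            Integer currentBest = best.get(keys[row]);
            if (currentBest == null || values[row] > currentBest) {
                best.put(keys[row], values[row]);
                result.put(keys[row], pos); // index into the sorted data, not the input
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] keys = {2, 1, 2, 1};
        int[] values = {10, 5, 30, 9};
        // In the input, the maxima sit at rows 3 (key 1) and 2 (key 2).
        // After sorting by key they sit at positions 1 and 3.
        System.out.println(argMaxAfterSort(keys, values)); // prints {1=1, 2=3}
    }
}
```

      If the original row positions are needed, one option is to sort the input by the grouping keys before aggregating so that both indexings coincide.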
    • min

      public static GroupByAggregation min()
      Min aggregation.
    • max

      public static GroupByAggregation max()
      Max aggregation.
    • mean

      public static GroupByAggregation mean()
      Arithmetic mean aggregation.
    • M2

      public static GroupByAggregation M2()
      Sum of squares of differences from the mean.
    • variance

      public static GroupByAggregation variance()
      Variance aggregation with 1 as the delta degrees of freedom.
    • variance

      public static GroupByAggregation variance(int ddof)
      Variance aggregation.
      Parameters:
      ddof - delta degrees of freedom. The divisor used in calculation of variance is N - ddof, where N is the population size.
    • standardDeviation

      public static GroupByAggregation standardDeviation()
      Standard deviation aggregation with 1 as the delta degrees of freedom.
    • standardDeviation

      public static GroupByAggregation standardDeviation(int ddof)
      Standard deviation aggregation.
      Parameters:
      ddof - delta degrees of freedom. The divisor used in calculation of std is N - ddof, where N is the population size.
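
      The M2, variance, and standardDeviation entries above are related through the divisor N - ddof. The plain-Java sketch below works through that arithmetic for a single group. VarianceSketch and its methods are hypothetical names used for illustration; this is not the cudf implementation and says nothing about how cudf computes these values numerically.

```java
// Hypothetical model of M2, variance, and standard deviation for one group.
final class VarianceSketch {
    /** Sum of squared differences from the mean. */
    static double m2(double[] xs) {
        double mean = 0.0;
        for (double x : xs) {
            mean += x;
        }
        mean /= xs.length;
        double m2 = 0.0;
        for (double x : xs) {
            m2 += (x - mean) * (x - mean);
        }
        return m2;
    }

    /** Variance with divisor N - ddof, where N is the number of elements. */
    static double variance(double[] xs, int ddof) {
        return m2(xs) / (xs.length - ddof);
    }

    /** Standard deviation is the square root of the variance with the same ddof. */
    static double standardDeviation(double[] xs, int ddof) {
        return Math.sqrt(variance(xs, ddof));
    }

    public static void main(String[] args) {
        double[] xs = {2, 4, 4, 4, 5, 5, 7, 9}; // mean 5, M2 = 32
        System.out.println(m2(xs));                   // 32.0
        System.out.println(variance(xs, 0));          // 32 / 8 = 4.0
        System.out.println(variance(xs, 1));          // 32 / 7, the no-argument default
        System.out.println(standardDeviation(xs, 0)); // 2.0
    }
}
```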
    • quantile

      public static GroupByAggregation quantile(double... quantiles)
      Aggregate to compute the specified quantiles. Uses linear interpolation by default.
    • quantile

      public static GroupByAggregation quantile(QuantileMethod method, double... quantiles)
      Aggregate to compute various quantiles.
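
      For readers unfamiliar with the linear interpolation mentioned for the quantile overloads above, the sketch below shows one common convention: the quantile q sits at fractional position q * (n - 1) in the sorted values. That convention is an assumption here; this page does not spell out the exact formula cudf uses, and QuantileSketch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical model of a linearly interpolated quantile for one group.
final class QuantileSketch {
    /** Returns the q-quantile (0 <= q <= 1) of a non-empty array using linear interpolation. */
    static double quantileLinear(double[] values, double q) {
        if (values.length == 0 || q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("need a non-empty group and 0 <= q <= 1");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double pos = q * (sorted.length - 1); // assumed convention for the fractional position
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        double frac = pos - lo;
        // Interpolate between the two neighbouring sorted values.
        return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
    }

    public static void main(String[] args) {
        double[] group = {4, 1, 3, 2};
        System.out.println(quantileLinear(group, 0.5));  // 2.5
        System.out.println(quantileLinear(group, 0.25)); // 1.75
    }
}
```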
    • median

      public static GroupByAggregation median()
      Median aggregation.
    • nunique

      public static GroupByAggregation nunique()
      Number of unique, non-null, elements.
    • nunique

      public static GroupByAggregation nunique(NullPolicy nullPolicy)
      Number of unique elements.
      Parameters:
      nullPolicy - INCLUDE if nulls should be counted, otherwise EXCLUDE. When nulls are counted they compare as equal, so multiple null values in a group increase the count by only 1.
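
      A plain-Java model of the nunique null handling described above: all nulls together contribute at most 1 to the count. NUniqueSketch and the boolean includeNulls (standing in for NullPolicy) are hypothetical names; this is a sketch of the documented behavior, not the cudf implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of distinct counting for one group.
final class NUniqueSketch {
    /** Counts distinct values; when includeNulls is true, all nulls together add exactly 1. */
    static int nunique(List<String> group, boolean includeNulls) {
        Set<String> seen = new HashSet<>();
        boolean sawNull = false;
        for (String v : group) {
            if (v == null) {
                sawNull = true;
            } else {
                seen.add(v);
            }
        }
        return seen.size() + (includeNulls && sawNull ? 1 : 0);
    }

    public static void main(String[] args) {
        List<String> group = Arrays.asList("a", null, "b", "a", null);
        System.out.println(nunique(group, false)); // 2: "a" and "b"
        System.out.println(nunique(group, true));  // 3: "a", "b", and one slot for the nulls
    }
}
```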
    • nth

      public static GroupByAggregation nth(int offset)
      Get the nth, non-null, element in a group.
      Parameters:
      offset - the offset to look at. Negative numbers go from the end of the group. Any value outside of the group range results in a null.
    • nth

      public static GroupByAggregation nth(int offset, NullPolicy nullPolicy)
      Get the nth element in a group.
      Parameters:
      offset - the offset to look at. Negative numbers go from the end of the group. Any value outside of the group range results in a null.
      nullPolicy - INCLUDE if nulls should be included in the aggregation or EXCLUDE if they should be skipped.
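
      The offset rules for nth (negative offsets count from the end, out-of-range offsets give null, and the null policy decides whether nulls occupy a position) can be modeled in a few lines of plain Java. NthSketch and the boolean includeNulls are hypothetical names; this is a sketch of the documented semantics, not the cudf implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of nth-element selection for one group.
final class NthSketch {
    /** Returns the element at offset, or null when the offset falls outside the group. */
    static Integer nth(List<Integer> group, int offset, boolean includeNulls) {
        List<Integer> candidates = new ArrayList<>();
        for (Integer v : group) {
            if (v != null || includeNulls) {
                candidates.add(v); // skipped nulls do not occupy a position
            }
        }
        // Negative offsets count back from the end of the group.
        int index = offset >= 0 ? offset : candidates.size() + offset;
        if (index < 0 || index >= candidates.size()) {
            return null;
        }
        return candidates.get(index);
    }

    public static void main(String[] args) {
        List<Integer> group = Arrays.asList(10, null, 30);
        System.out.println(nth(group, 1, false)); // 30: the null is skipped
        System.out.println(nth(group, 1, true));  // null: the null occupies position 1
        System.out.println(nth(group, -1, true)); // 30: last element
        System.out.println(nth(group, 5, true));  // null: out of range
    }
}
```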
    • collectList

      public static GroupByAggregation collectList()
      Collect the values into a list. Nulls will be skipped.
    • collectList

      public static GroupByAggregation collectList(NullPolicy nullPolicy)
      Collect the values into a list.
      Parameters:
      nullPolicy - Indicates whether to include/exclude nulls during collection.
    • collectSet

      public static GroupByAggregation collectSet()
      Collect the values into a set. All null values will be excluded, and all NaN values are regarded as unique instances.
    • collectSet

      public static GroupByAggregation collectSet(NullPolicy nullPolicy, NullEquality nullEquality, NaNEquality nanEquality)
      Collect the values into a set.
      Parameters:
      nullPolicy - Indicates whether to include/exclude nulls during collection.
      nullEquality - Flag to specify whether null entries within each list should be considered equal.
      nanEquality - Flag to specify whether NaN values in floating point column should be considered equal.
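
      How the three collectSet flags interact is easier to see on a concrete group. The plain-Java sketch below models them with booleans; CollectSetSketch and its parameter names are hypothetical, the output order is simply first occurrence (nothing is claimed about the order cudf produces), and this is not the cudf implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of set collection with configurable null and NaN equality.
final class CollectSetSketch {
    static List<Double> collectSet(List<Double> group, boolean includeNulls,
                                   boolean nullsEqual, boolean nansEqual) {
        List<Double> out = new ArrayList<>();
        for (Double v : group) {
            if (v == null && !includeNulls) {
                continue; // nulls are dropped entirely when excluded
            }
            boolean duplicate = false;
            for (Double kept : out) {
                if (same(kept, v, nullsEqual, nansEqual)) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                out.add(v);
            }
        }
        return out;
    }

    /** Equality test in which nulls and NaNs match each other only when the flags allow it. */
    private static boolean same(Double a, Double b, boolean nullsEqual, boolean nansEqual) {
        if (a == null || b == null) {
            return a == null && b == null && nullsEqual;
        }
        if (a.isNaN() || b.isNaN()) {
            return a.isNaN() && b.isNaN() && nansEqual;
        }
        return a.doubleValue() == b.doubleValue();
    }

    public static void main(String[] args) {
        List<Double> group = Arrays.asList(1.0, Double.NaN, 1.0, Double.NaN, null, null);
        // Models the no-argument overload: nulls excluded, every NaN kept as its own entry.
        System.out.println(collectSet(group, false, true, false)); // [1.0, NaN, NaN]
        // Nulls included; nulls and NaNs each collapse to a single entry.
        System.out.println(collectSet(group, true, true, true));   // [1.0, NaN, null]
    }
}
```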
    • mergeLists

      public static GroupByAggregation mergeLists()
      Merge the partial lists produced by multiple CollectListAggregations. NOTICE: The partial lists to be merged should NOT include any null list element (but can include null list entries).
    • mergeSets

      public static GroupByAggregation mergeSets()
      Merge the partial sets produced by multiple CollectSetAggregations. Each null/NaN value will be regarded as a unique instance.
    • hostUDF

      public static GroupByAggregation hostUDF(HostUDFWrapper wrapper)
      Execute an aggregation using a host-side user-defined function (UDF).
      Parameters:
      wrapper - The wrapper for the native host UDF instance.
      Returns:
      A new GroupByAggregation instance
    • mergeSets

      public static GroupByAggregation mergeSets(NullEquality nullEquality, NaNEquality nanEquality)
      Merge the partial sets produced by multiple CollectSetAggregations.
      Parameters:
      nullEquality - Flag to specify whether null entries within each list should be considered equal.
      nanEquality - Flag to specify whether NaN values in floating point column should be considered equal.
    • mergeM2

      public static GroupByAggregation mergeM2()
      Merge the partial M2 values produced by multiple instances of M2Aggregation.
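
      Merging partial M2 values needs more than adding them up, because each partial was computed around its own mean. The sketch below uses the standard pairwise update formula for combining (count, mean, M2) triples. It is offered as background only: MergeM2Sketch and Partial are hypothetical names, and this page does not state which formula or input layout cudf uses.

```java
// Hypothetical model of merging partial (count, mean, M2) states for one group.
final class MergeM2Sketch {
    /** Partial state: element count, mean, and sum of squared differences from that mean. */
    static final class Partial {
        final long n;
        final double mean;
        final double m2;

        Partial(long n, double mean, double m2) {
            this.n = n;
            this.mean = mean;
            this.m2 = m2;
        }
    }

    /** Computes the partial state of one non-empty batch directly. */
    static Partial of(double[] xs) {
        double mean = 0.0;
        for (double x : xs) {
            mean += x;
        }
        mean /= xs.length;
        double m2 = 0.0;
        for (double x : xs) {
            m2 += (x - mean) * (x - mean);
        }
        return new Partial(xs.length, mean, m2);
    }

    /** Combines two partial states using the standard pairwise formula. */
    static Partial merge(Partial a, Partial b) {
        if (a.n == 0) {
            return b;
        }
        if (b.n == 0) {
            return a;
        }
        long n = a.n + b.n;
        double delta = b.mean - a.mean;
        double mean = a.mean + delta * b.n / n;
        // The correction term accounts for the two batches having different means.
        double m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n;
        return new Partial(n, mean, m2);
    }

    public static void main(String[] args) {
        Partial left = of(new double[] {2, 4, 4, 4});  // n=4, mean=3.5, M2=3
        Partial right = of(new double[] {5, 5, 7, 9}); // n=4, mean=6.5, M2=11
        Partial merged = merge(left, right);
        // Matches the M2 of the combined batch {2, 4, 4, 4, 5, 5, 7, 9}.
        System.out.println(merged.n + " " + merged.mean + " " + merged.m2); // 8 5.0 32.0
    }
}
```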
    • createTDigest

      public static GroupByAggregation createTDigest(int delta)
      Compute a t-digest from a fixed-width numeric input column.
      Parameters:
      delta - Required accuracy (number of buckets).
      Returns:
      A list of centroids per grouping, where each centroid has a mean value and a weight. The number of centroids will be <= delta.
    • mergeTDigest

      public static GroupByAggregation mergeTDigest(int delta)
      Merge t-digests.
      Parameters:
      delta - Required accuracy (number of buckets).
      Returns:
      A list of centroids per grouping, where each centroid has a mean value and a weight. The number of centroids will be <= delta.
    • histogram

      public static GroupByAggregation histogram()
      Histogram aggregation, computing the frequencies for each unique row. A histogram is given as a lists column, in which the first child stores unique rows from the input values and the second child stores their corresponding frequencies.
      Returns:
      A lists of structs column in which each list contains a histogram corresponding to an input key.
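
      As a rough picture of the histogram result, the plain-Java sketch below builds, for each grouping key, a map from each unique value to its frequency. The nested map only stands in for the lists-of-structs column described above; HistogramSketch is a hypothetical name and this is not the cudf implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: per-key value frequencies.
final class HistogramSketch {
    /** Returns key -> (value -> frequency), preserving first-seen order. */
    static Map<String, Map<Integer, Long>> histogram(String[] keys, int[] values) {
        Map<String, Map<Integer, Long>> out = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            out.computeIfAbsent(keys[i], k -> new LinkedHashMap<>())
               .merge(values[i], 1L, Long::sum); // bump the frequency of this value
        }
        return out;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "a", "a"};
        int[] values = {1, 1, 2, 1};
        System.out.println(histogram(keys, values)); // {a={1=2, 2=1}, b={1=1}}
    }
}
```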
    • mergeHistogram

      public static GroupByAggregation mergeHistogram()
      MergeHistogram aggregation, to merge multiple histograms.
      Returns:
      A new histogram in which the frequencies of the unique rows are summed.
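
      Merging histograms amounts to adding the frequencies of matching unique values. A minimal plain-Java model for a single group follows; MergeHistogramSketch is a hypothetical name, maps stand in for the histogram columns, and this is not the cudf implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of merging partial histograms for one group.
final class MergeHistogramSketch {
    /** Sums the frequencies of equal values across all partial histograms. */
    static Map<Integer, Long> merge(List<Map<Integer, Long>> partials) {
        Map<Integer, Long> merged = new LinkedHashMap<>();
        for (Map<Integer, Long> partial : partials) {
            for (Map.Entry<Integer, Long> e : partial.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<Integer, Long> first = new LinkedHashMap<>();
        first.put(1, 2L);
        first.put(2, 1L);
        Map<Integer, Long> second = new LinkedHashMap<>();
        second.put(1, 3L);
        second.put(3, 4L);
        System.out.println(merge(Arrays.asList(first, second))); // {1=5, 2=1, 3=4}
    }
}
```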
    • bitAnd

      public static GroupByAggregation bitAnd()
      Bitwise AND aggregation, computing the bitwise AND of all non-null values in a group.
    • bitOr

      public static GroupByAggregation bitOr()
      Bitwise OR aggregation, computing the bitwise OR of all non-null values in a group.
    • bitXor

      public static GroupByAggregation bitXor()
      Bitwise XOR aggregation, computing the bitwise XOR of all non-null values in a group.
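
      The three bitwise aggregations above share one shape: fold the non-null values of a group with a bitwise operator. The plain-Java sketch below models that for int values. BitwiseSketch is a hypothetical name, and returning null for a group with no non-null values is an assumption made here, since this page does not say what cudf returns in that case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntBinaryOperator;

// Hypothetical model of bitwise aggregation over the non-null values of one group.
final class BitwiseSketch {
    /** Folds the non-null values with op; returns null if there are none (an assumption). */
    static Integer reduce(List<Integer> group, IntBinaryOperator op) {
        Integer acc = null;
        for (Integer v : group) {
            if (v == null) {
                continue; // nulls do not take part in the aggregation
            }
            acc = (acc == null) ? v : Integer.valueOf(op.applyAsInt(acc, v));
        }
        return acc;
    }

    static Integer bitAnd(List<Integer> group) {
        return reduce(group, (a, b) -> a & b);
    }

    static Integer bitOr(List<Integer> group) {
        return reduce(group, (a, b) -> a | b);
    }

    static Integer bitXor(List<Integer> group) {
        return reduce(group, (a, b) -> a ^ b);
    }

    public static void main(String[] args) {
        List<Integer> group = Arrays.asList(12, null, 10); // 0b1100 and 0b1010
        System.out.println(bitAnd(group)); // 8  (0b1000)
        System.out.println(bitOr(group));  // 14 (0b1110)
        System.out.println(bitXor(group)); // 6  (0b0110)
    }
}
```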