cudf.core.groupby.groupby.DataFrameGroupBy.aggregate

DataFrameGroupBy.aggregate(func)

Apply aggregation(s) to the groups.

Parameters:
func : str, callable, list or dict

Argument specifying the aggregation(s) to perform on the groups. func can be any of the following:

  • string: the name of a supported aggregation

  • callable: a function that accepts a Series/DataFrame and performs a supported operation on it.

  • list: a list of strings/callables specifying the aggregations to perform on every column.

  • dict: a mapping of column names to a string, a callable, or a list of these, specifying the aggregations to perform on those columns.

See the groupby section of the user guide (basics.groupby) for supported aggregations.
Returns:
A Series or DataFrame containing the combined results of the
aggregation(s).

Examples

Using a string to apply a single aggregation to every column.

>>> import cudf
>>> a = cudf.DataFrame({
...     'a': [1, 1, 2],
...     'b': [1, 2, 3],
...     'c': [2, 2, 1]
... })
>>> a.groupby('a', sort=True).agg('sum')
   b  c
a
1  3  4
2  3  1

Specifying a list of aggregations to perform on each column.

>>> import cudf
>>> a = cudf.DataFrame({
...     'a': [1, 1, 2],
...     'b': [1, 2, 3],
...     'c': [2, 2, 1]
... })
>>> a.groupby('a', sort=True).agg(['sum', 'min'])
    b       c
  sum min sum min
a
1   3   1   4   2
2   3   3   1   1

Using a dict to specify aggregations to perform per column.

>>> import cudf
>>> a = cudf.DataFrame({
...     'a': [1, 1, 2],
...     'b': [1, 2, 3],
...     'c': [2, 2, 1]
... })
>>> a.groupby('a', sort=True).agg({'a': 'max', 'b': ['min', 'mean']})
    a   b
  max min mean
a
1   1   1  1.5
2   2   3  3.0

Using lambdas/callables to specify aggregations that take parameters. Setting `__name__` on each callable gives its result column a readable label.

>>> import cudf
>>> a = cudf.DataFrame({
...     'a': [1, 1, 2],
...     'b': [1, 2, 3],
...     'c': [2, 2, 1]
... })
>>> f1 = lambda x: x.quantile(0.5); f1.__name__ = "q0.5"
>>> f2 = lambda x: x.quantile(0.75); f2.__name__ = "q0.75"
>>> a.groupby('a', sort=True).agg([f1, f2])
     b          c
  q0.5 q0.75 q0.5 q0.75
a
1  1.5  1.75  2.0   2.0
2  3.0  3.00  1.0   1.0
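
The dict form can also map each column to a single aggregation name, which the examples above do not show. The following is a minimal sketch of that case, not part of the original docstring. The alias `xdf` and the helper `to_host` are hypothetical names introduced here, and the fallback to pandas is an assumption made so the snippet also runs on a machine without a GPU; cuDF's groupby API is modelled on pandas, but the two libraries are not guaranteed to agree in every case.

```python
# Assumption: fall back to pandas when cuDF cannot be imported
# (package missing, or no usable GPU on this machine).
try:
    import cudf as xdf
except Exception:
    import pandas as xdf


def to_host(obj):
    """Hypothetical helper: copy a cuDF object to pandas, pass pandas objects through."""
    return obj.to_pandas() if hasattr(obj, "to_pandas") else obj


df = xdf.DataFrame({
    'a': [1, 1, 2],
    'b': [1, 2, 3],
    'c': [2, 2, 1],
})

# dict form with one aggregation name per column:
# sum of 'b' and min of 'c' within each group of 'a'.
per_column = to_host(df.groupby('a', sort=True).agg({'b': 'sum', 'c': 'min'}))
print(per_column)
```

Because each column gets exactly one aggregation here, the result should keep flat column labels (`b`, `c`) rather than the two-level labels produced by the list-valued examples above.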