cudf.core.groupby.groupby.GroupBy.apply

GroupBy.apply(function, *args)

Apply a Python transformation function to each grouped chunk.

Parameters
function : function

The Python transformation function that is applied to each grouped chunk.

Examples

from cudf import DataFrame
df = DataFrame()
df['key'] = [0, 0, 1, 1, 2, 2, 2]
df['val'] = [0, 1, 2, 3, 4, 5, 6]
groups = df.groupby(['key'])

# Define a function to apply to each group (it receives the group's rows as a DataFrame)
def mult(df):
    df['out'] = df['key'] * df['val']
    return df

result = groups.apply(mult)
print(result)

Output:

   key  val  out
0    0    0    0
1    0    1    0
2    1    2    2
3    1    3    3
4    2    4    8
5    2    5   10
6    2    6   12
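
The split-apply-combine idea behind the example above can be sketched on the CPU. This is a conceptual illustration only, not cuDF's implementation: it uses pandas as a stand-in so it runs without a GPU, and `apply_per_group` is a hypothetical helper that does not exist in cuDF or pandas. Forwarding `*args` to the function is an assumption based on the signature shown above.

```python
import pandas as pd


def apply_per_group(df, key, function, *args):
    """Hypothetical helper (not a cuDF or pandas API).

    Splits `df` into one chunk per group, calls `function` on each
    chunk, and concatenates the results.
    """
    # One DataFrame chunk per distinct key value, in sorted key order.
    chunks = [chunk for _, chunk in df.groupby(key)]
    # Copy each chunk so the function can add columns without touching `df`.
    # Assumption: extra positional arguments are passed through to `function`.
    results = [function(chunk.copy(), *args) for chunk in chunks]
    return pd.concat(results)


def mult(chunk):
    # `chunk` holds every row of a single group, not a single row.
    chunk["out"] = chunk["key"] * chunk["val"]
    return chunk


df = pd.DataFrame({
    "key": [0, 0, 1, 1, 2, 2, 2],
    "val": [0, 1, 2, 3, 4, 5, 6],
})
result = apply_per_group(df, "key", mult)
print(result)
```

The point of the sketch is that the function sees a whole group at a time, so column-wise expressions such as `chunk["key"] * chunk["val"]` operate on all rows of that group at once.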

Pandas Compatibility Note

groupby.apply

cuDF’s groupby.apply is limited compared to pandas. In some situations, pandas returns the group keys as part of the index, whereas cuDF omits them because they would be redundant. For example:

>>> import cudf
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'a': [1, 1, 2, 2],
...     'b': [1, 2, 1, 2],
...     'c': [1, 2, 3, 4],
... })
>>> gdf = cudf.from_pandas(df)
>>> df.groupby('a').apply(lambda x: x.iloc[[0]])
     a  b  c
a
1 0  1  1  1
2 2  2  1  3
>>> gdf.groupby('a').apply(lambda x: x.iloc[[0]])
   a  b  c
0  1  1  1
2  2  1  3
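
When code must behave the same on both libraries, one option is to make the pandas result use the flat index that cuDF produces above. The following is a pandas-only sketch under that assumption; it does not call cuDF, and whether cuDF accepts the same `group_keys` argument is not covered here. Depending on the pandas version, applying a function to a frame that still contains the grouping column may emit a deprecation warning or omit that column, so the sketch inspects only the index.

```python
import pandas as pd

pdf = pd.DataFrame({
    "a": [1, 1, 2, 2],
    "b": [1, 2, 1, 2],
    "c": [1, 2, 3, 4],
})

# group_keys=False asks pandas not to prepend the group key as an index level.
flat = pdf.groupby("a", group_keys=False).apply(lambda x: x.iloc[[0]])

# Alternative: keep the default behavior, then drop the key level afterwards.
dropped = pdf.groupby("a").apply(lambda x: x.iloc[[0]]).droplevel(0)

# Both results keep only the original row labels of the selected rows.
print(flat.index.tolist())
print(dropped.index.tolist())
```

Either form leaves a single-level index of original row labels, which is the layout shown in the cuDF output above.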