cudf.DataFrame.apply#

DataFrame.apply(func, axis=1, raw=False, result_type=None, args=(), **kwargs)#

Apply a function along an axis of the DataFrame. apply relies on Numba to JIT compile func. Thus the allowed operations within func are limited to those supported by the CUDA Python Numba target. For more information, see the cuDF guide to user defined functions.

Parameters
funcfunction

Function to apply to each row.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Axis along which the function is applied: * 0 or ‘index’: apply function to each column.

Note: axis=0 is not yet supported.

  • 1 or ‘columns’: apply function to each row.

raw: bool, default False

Not yet supported

result_type: {‘expand’, ‘reduce’, ‘broadcast’, None}, default None

Not yet supported

args: tuple

Positional arguments to pass to func in addition to the dataframe.

Examples

Simple function of a single variable which could be NA:

>>> def f(row):
...     if row['a'] is cudf.NA:
...             return 0
...     else:
...             return row['a'] + 1
...
>>> df = cudf.DataFrame({'a': [1, cudf.NA, 3]})
>>> df.apply(f, axis=1)
0    2
1    0
2    4
dtype: int64

Function of multiple variables will operate in a null aware manner:

>>> def f(row):
...     return row['a'] - row['b']
...
>>> df = cudf.DataFrame({
...     'a': [1, cudf.NA, 3, cudf.NA],
...     'b': [5, 6, cudf.NA, cudf.NA]
... })
>>> df.apply(f)
0      -4
1    <NA>
2    <NA>
3    <NA>
dtype: int64

Functions may conditionally return NA as in pandas:

>>> def f(row):
...     if row['a'] + row['b'] > 3:
...             return cudf.NA
...     else:
...             return row['a'] + row['b']
...
>>> df = cudf.DataFrame({
...     'a': [1, 2, 3],
...     'b': [2, 1, 1]
... })
>>> df.apply(f, axis=1)
0       3
1       3
2    <NA>
dtype: int64

Mixed types are allowed, but will return the common type, rather than object as in pandas:

>>> def f(row):
...     return row['a'] + row['b']
...
>>> df = cudf.DataFrame({
...     'a': [1, 2, 3],
...     'b': [0.5, cudf.NA, 3.14]
... })
>>> df.apply(f, axis=1)
0     1.5
1    <NA>
2    6.14
dtype: float64

Functions may also return scalar values, however the result will be promoted to a safe type regardless of the data:

>>> def f(row):
...     if row['a'] > 3:
...             return row['a']
...     else:
...             return 1.5
...
>>> df = cudf.DataFrame({
...     'a': [1, 3, 5]
... })
>>> df.apply(f, axis=1)
0    1.5
1    1.5
2    5.0
dtype: float64

Ops against N columns are supported generally:

>>> def f(row):
...     v, w, x, y, z = (
...         row['a'], row['b'], row['c'], row['d'], row['e']
...     )
...     return x + (y - (z / w)) % v
...
>>> df = cudf.DataFrame({
...     'a': [1, 2, 3],
...     'b': [4, 5, 6],
...     'c': [cudf.NA, 4, 4],
...     'd': [8, 7, 8],
...     'e': [7, 1, 6]
... })
>>> df.apply(f, axis=1)
0    <NA>
1     4.8
2     5.0
dtype: float64