cudf.DataFrame.apply#

DataFrame.apply(func, axis=1, raw=False, result_type=None, args=(), **kwargs)#

Apply a function along an axis of the DataFrame.

Designed to mimic pandas.DataFrame.apply. Applies a user defined function row wise over a dataframe, with true null handling. Works with UDFs using core.udf.pipeline.nulludf and returns a single series. Uses numba to jit compile the function to PTX via LLVM.

Parameters
funcfunction

Function to apply to each row.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Axis along which the function is applied: * 0 or ‘index’: apply function to each column.

Note: axis=0 is not yet supported.

  • 1 or ‘columns’: apply function to each row.

raw: bool, default False

Not yet supported

result_type: {‘expand’, ‘reduce’, ‘broadcast’, None}, default None

Not yet supported

args: tuple

Positional arguments to pass to func in addition to the dataframe.

Examples

Simple function of a single variable which could be NA

>>> def f(row):
...     if row['a'] is cudf.NA:
...             return 0
...     else:
...             return row['a'] + 1
...
>>> df = cudf.DataFrame({'a': [1, cudf.NA, 3]})
>>> df.apply(f, axis=1)
0    2
1    0
2    4
dtype: int64

Function of multiple variables will operate in a null aware manner

>>> def f(row):
...     return row['a'] - row['b']
...
>>> df = cudf.DataFrame({
...     'a': [1, cudf.NA, 3, cudf.NA],
...     'b': [5, 6, cudf.NA, cudf.NA]
... })
>>> df.apply(f)
0      -4
1    <NA>
2    <NA>
3    <NA>
dtype: int64

Functions may conditionally return NA as in pandas

>>> def f(row):
...     if row['a'] + row['b'] > 3:
...             return cudf.NA
...     else:
...             return row['a'] + row['b']
...
>>> df = cudf.DataFrame({
...     'a': [1, 2, 3],
...     'b': [2, 1, 1]
... })
>>> df.apply(f, axis=1)
0       3
1       3
2    <NA>
dtype: int64

Mixed types are allowed, but will return the common type, rather than object as in pandas

>>> def f(row):
...     return row['a'] + row['b']
...
>>> df = cudf.DataFrame({
...     'a': [1, 2, 3],
...     'b': [0.5, cudf.NA, 3.14]
... })
>>> df.apply(f, axis=1)
0     1.5
1    <NA>
2    6.14
dtype: float64

Functions may also return scalar values, however the result will be promoted to a safe type regardless of the data

>>> def f(row):
...     if row['a'] > 3:
...             return row['a']
...     else:
...             return 1.5
...
>>> df = cudf.DataFrame({
...     'a': [1, 3, 5]
... })
>>> df.apply(f, axis=1)
0    1.5
1    1.5
2    5.0
dtype: float64

Ops against N columns are supported generally

>>> def f(row):
...     v, w, x, y, z = (
...         row['a'], row['b'], row['c'], row['d'], row['e']
...     )
...     return x + (y - (z / w)) % v
...
>>> df = cudf.DataFrame({
...     'a': [1, 2, 3],
...     'b': [4, 5, 6],
...     'c': [cudf.NA, 4, 4],
...     'd': [8, 7, 8],
...     'e': [7, 1, 6]
... })
>>> df.apply(f, axis=1)
0    <NA>
1     4.8
2     5.0
dtype: float64