cudf.DataFrame.apply
- DataFrame.apply(func, axis=1, raw=False, result_type=None, args=(), **kwargs)
Apply a function along an axis of the DataFrame.
apply relies on Numba to JIT compile func. Thus the allowed operations within func are limited to those supported by the CUDA Python Numba target. For more information, see the cuDF guide to user defined functions.
- Parameters
- func: function
Function to apply to each row.
- axis: {0 or ‘index’, 1 or ‘columns’}, default 1
Axis along which the function is applied:
* 0 or ‘index’: apply function to each column. Note: axis=0 is not yet supported.
* 1 or ‘columns’: apply function to each row.
- raw: bool, default False
Not yet supported
- result_type: {‘expand’, ‘reduce’, ‘broadcast’, None}, default None
Not yet supported
- args: tuple
Positional arguments to pass to func in addition to the dataframe.
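As a minimal sketch of how args might be used, the function below takes one extra positional parameter that is supplied through args. The name add_offset and the value 10 are illustrative assumptions, not part of the cuDF API. The cuDF call is guarded so the snippet still runs where cudf or a GPU is unavailable; in that case only the plain-dict check executes.

```python
def add_offset(row, offset):
    # `row` supports lookup by column name; `offset` arrives through `args`.
    return row['a'] + offset


# CPU-side sanity check, using a plain dict as a stand-in for one row.
cpu_check = add_offset({'a': 1}, 10)

try:
    import cudf  # needs a cudf install and a CUDA-capable GPU
except Exception:
    cudf = None

if cudf is not None:
    df = cudf.DataFrame({'a': [1, 2, 3]})
    # axis=1 applies the function to each row; args supplies `offset`.
    result = df.apply(add_offset, axis=1, args=(10,))
    print(result)
```

Because func is JIT compiled, the extra arguments are presumably best kept to simple scalars.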
Examples
Simple function of a single variable which could be NA:
>>> import cudf
>>> def f(row):
...     if row['a'] is cudf.NA:
...         return 0
...     else:
...         return row['a'] + 1
...
>>> df = cudf.DataFrame({'a': [1, cudf.NA, 3]})
>>> df.apply(f, axis=1)
0    2
1    0
2    4
dtype: int64
Functions of multiple variables operate in a null-aware manner:
>>> def f(row):
...     return row['a'] - row['b']
...
>>> df = cudf.DataFrame({
...     'a': [1, cudf.NA, 3, cudf.NA],
...     'b': [5, 6, cudf.NA, cudf.NA]
... })
>>> df.apply(f)
0      -4
1    <NA>
2    <NA>
3    <NA>
dtype: int64
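The null-aware behaviour in the example above can be pictured with a small pure-Python model. This is only a sketch of the observable semantics, not how cuDF implements them (cuDF compiles func with Numba for the GPU). The names _NAType, NA, and apply_rows are hypothetical helpers introduced here for illustration.

```python
class _NAType:
    """Hypothetical stand-in for cudf.NA: arithmetic with it yields NA."""

    def __repr__(self):
        return "<NA>"

    def _propagate(self, other):
        # Whatever the other operand is, the result stays missing.
        return self

    # Both operand orders route to the same propagation rule.
    __add__ = __radd__ = _propagate
    __sub__ = __rsub__ = _propagate
    __mul__ = __rmul__ = _propagate
    __truediv__ = __rtruediv__ = _propagate


NA = _NAType()


def apply_rows(columns, func):
    """Call func once per row, passing the row as a dict of column values."""
    names = list(columns)
    n_rows = len(columns[names[0]])
    return [
        func({name: columns[name][i] for name in names})
        for i in range(n_rows)
    ]


def f(row):
    return row['a'] - row['b']


result = apply_rows({'a': [1, NA, 3, NA], 'b': [5, 6, NA, NA]}, f)
print(result)  # [-4, <NA>, <NA>, <NA>]
```

Only the first row has both operands present, so it is the only one that yields a number; every other row has at least one missing operand and comes back as NA.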
Functions may conditionally return NA as in pandas:
>>> def f(row):
...     if row['a'] + row['b'] > 3:
...         return cudf.NA
...     else:
...         return row['a'] + row['b']
...
>>> df = cudf.DataFrame({
...     'a': [1, 2, 3],
...     'b': [2, 1, 1]
... })
>>> df.apply(f, axis=1)
0       3
1       3
2    <NA>
dtype: int64
Mixed types are allowed, but will return the common type, rather than object as in pandas:
>>> def f(row):
...     return row['a'] + row['b']
...
>>> df = cudf.DataFrame({
...     'a': [1, 2, 3],
...     'b': [0.5, cudf.NA, 3.14]
... })
>>> df.apply(f, axis=1)
0     1.5
1    <NA>
2    6.14
dtype: float64
Functions may also return scalar values; however, the result will be promoted to a safe type regardless of the data:
>>> def f(row):
...     if row['a'] > 3:
...         return row['a']
...     else:
...         return 1.5
...
>>> df = cudf.DataFrame({
...     'a': [1, 3, 5]
... })
>>> df.apply(f, axis=1)
0    1.5
1    1.5
2    5.0
dtype: float64
Operations against N columns are generally supported:
>>> def f(row):
...     v, w, x, y, z = (
...         row['a'], row['b'], row['c'], row['d'], row['e']
...     )
...     return x + (y - (z / w)) % v
...
>>> df = cudf.DataFrame({
...     'a': [1, 2, 3],
...     'b': [4, 5, 6],
...     'c': [cudf.NA, 4, 4],
...     'd': [8, 7, 8],
...     'e': [7, 1, 6]
... })
>>> df.apply(f, axis=1)
0    <NA>
1     4.8
2     5.0
dtype: float64