cudf.DataFrame.apply

DataFrame.apply(func, axis=1, raw=False, result_type=None, args=(), **kwargs)

Apply a function along an axis of the DataFrame. apply relies on Numba to JIT compile func. Thus the allowed operations within func are limited to those supported by the CUDA Python Numba target. For more information, see the cuDF guide to user defined functions.

Some string functions and methods are supported. Refer to the guide to UDFs for details.

Parameters:

func: function: Function to apply to each row.
axis: {0 or ‘index’, 1 or ‘columns’}, default 1: Axis along which the function is applied.
- 0 or ‘index’: apply function to each column (not yet supported).
- 1 or ‘columns’: apply function to each row.
raw: bool, default False: Not yet supported
result_type: {‘expand’, ‘reduce’, ‘broadcast’, None}, default None: Not yet supported
args: tuple: Positional arguments to pass to func in addition to the dataframe.
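
As a rough mental model of the args parameter, the extra positional arguments are handed to func after the row on every call. The pure-Python sketch below imitates that forwarding with a list of dicts standing in for the rows. Note that apply_rows is a hypothetical helper written only for illustration; it is not part of cuDF and performs no JIT compilation.

```python
def apply_rows(rows, func, args=()):
    """Hypothetical stand-in for a row-wise apply: calls func(row, *args)."""
    return [func(row, *args) for row in rows]


def add_constant(row, c):
    # `c` arrives through `args`, after the row itself.
    return row['a'] + c


rows = [{'a': 1}, {'a': 2}, {'a': 3}]
result = apply_rows(rows, add_constant, args=(10,))
print(result)  # [11, 12, 13]
```

With a cuDF DataFrame, the corresponding call would take the form df.apply(add_constant, axis=1, args=(10,)).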

Examples

Simple function of a single variable which could be NA:

>>> def f(row):
...     if row['a'] is cudf.NA:
...         return 0
...     else:
...         return row['a'] + 1
...
>>> df = cudf.DataFrame({'a': [1, cudf.NA, 3]})
>>> df.apply(f, axis=1)
0    2
1    0
2    4
dtype: int64

Functions of multiple variables operate in a null-aware manner:

>>> def f(row):
...     return row['a'] - row['b']
...
>>> df = cudf.DataFrame({
...     'a': [1, cudf.NA, 3, cudf.NA],
...     'b': [5, 6, cudf.NA, cudf.NA]
... })
>>> df.apply(f, axis=1)
0      -4
1    <NA>
2    <NA>
3    <NA>
dtype: int64
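
The null-aware behaviour above can be pictured as a missing value that absorbs any arithmetic it takes part in. The sketch below is a pure-Python illustration of that propagation rule only; the _NAType class and the NA sentinel are hypothetical stand-ins and do not reflect how cuDF represents nulls internally.

```python
class _NAType:
    """Hypothetical missing-value sentinel that propagates through + and -."""

    def __repr__(self):
        return '<NA>'

    def _propagate(self, other):
        # Any arithmetic involving NA yields NA again.
        return self

    __add__ = __radd__ = __sub__ = __rsub__ = _propagate


NA = _NAType()


def f(row):
    return row['a'] - row['b']


rows = [
    {'a': 1, 'b': 5},
    {'a': NA, 'b': 6},
    {'a': 3, 'b': NA},
    {'a': NA, 'b': NA},
]
result = [f(row) for row in rows]
print(result)  # [-4, <NA>, <NA>, <NA>]
```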

Functions may conditionally return NA as in pandas:

>>> def f(row):
...     if row['a'] + row['b'] > 3:
...         return cudf.NA
...     else:
...         return row['a'] + row['b']
...
>>> df = cudf.DataFrame({
...     'a': [1, 2, 3],
...     'b': [2, 1, 1]
... })
>>> df.apply(f, axis=1)
0       3
1       3
2    <NA>
dtype: int64

Mixed types are allowed, but will return the common type, rather than object as in pandas:

>>> def f(row):
...     return row['a'] + row['b']
...
>>> df = cudf.DataFrame({
...     'a': [1, 2, 3],
...     'b': [0.5, cudf.NA, 3.14]
... })
>>> df.apply(f, axis=1)
0     1.5
1    <NA>
2    6.14
dtype: float64

Functions may also return scalar values; however, the result will be promoted to a safe type regardless of the data:

>>> def f(row):
...     if row['a'] > 3:
...         return row['a']
...     else:
...         return 1.5
...
>>> df = cudf.DataFrame({
...     'a': [1, 3, 5]
... })
>>> df.apply(f, axis=1)
0    1.5
1    1.5
2    5.0
dtype: float64
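
One way to picture this promotion: the output column has a single dtype, chosen from the types of all return branches rather than from the values a particular dataset happens to produce. The pure-Python sketch below imitates that by widening every result to float because one branch returns the float 1.5. Both apply_promoted and the hand-written branch_types tuple are hypothetical; cuDF derives the result type during compilation, not through a helper like this.

```python
def f(row):
    if row['a'] > 3:
        return row['a']   # integer branch
    else:
        return 1.5        # float branch


def apply_promoted(rows, func, branch_types):
    """Hypothetical helper: cast every result to the widest branch type."""
    target = float if float in branch_types else int
    return [target(func(row)) for row in rows]


rows = [{'a': 1}, {'a': 3}, {'a': 5}]
result = apply_promoted(rows, f, branch_types=(int, float))
print(result)  # [1.5, 1.5, 5.0]

# Even when every row takes the integer branch, the output stays float.
all_int = apply_promoted([{'a': 4}, {'a': 5}], f, branch_types=(int, float))
print(all_int)  # [4.0, 5.0]
```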

Operations involving any number of columns are generally supported:

>>> def f(row):
...     v, w, x, y, z = (
...         row['a'], row['b'], row['c'], row['d'], row['e']
...     )
...     return x + (y - (z / w)) % v
...
>>> df = cudf.DataFrame({
...     'a': [1, 2, 3],
...     'b': [4, 5, 6],
...     'c': [cudf.NA, 4, 4],
...     'd': [8, 7, 8],
...     'e': [7, 1, 6]
... })
>>> df.apply(f, axis=1)
0    <NA>
1     4.8
2     5.0
dtype: float64
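
The non-null rows of this result can be checked by hand. The snippet below repeats the same expression in plain Python for the second and third rows, with the intermediate values noted in comments; it is only an arithmetic check and does not involve cuDF.

```python
def f(v, w, x, y, z):
    return x + (y - (z / w)) % v


# Second row: z / w = 0.2, y - 0.2 = 6.8, 6.8 % 2 = 0.8, x + 0.8 = 4.8
row1 = f(v=2, w=5, x=4, y=7, z=1)

# Third row: z / w = 1.0, y - 1.0 = 7.0, 7.0 % 3 = 1.0, x + 1.0 = 5.0
row2 = f(v=3, w=6, x=4, y=8, z=6)

print(round(row1, 10), round(row2, 10))  # 4.8 5.0
```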

UDFs manipulating string data are allowed, as long as they neither modify strings in place nor create new strings. For example, the following UDF is allowed:

>>> def f(row):
...     st = row['str_col']
...     scale = row['scale']
...     if len(st) == 0:
...         return -1
...     elif st.startswith('a'):
...         return 1 - scale
...     elif 'example' in st:
...         return 1 + scale
...     else:
...         return 42
...
>>> df = cudf.DataFrame({
...     'str_col': ['', 'abc', 'some_example'],
...     'scale': [1, 2, 3]
... })
>>> df.apply(f, axis=1)
0   -1
1   -1
2    4
dtype: int64

However, the following UDF is not allowed since it includes an operation that requires the creation of a new string: a call to the upper method. Methods that are not supported in this manner will raise an AttributeError.

>>> def f(row):
...     st = row['str_col'].upper()
...     return 'ABC' in st
...
>>> df.apply(f, axis=1)

For a complete list of supported functions and methods that may be used to manipulate string data, see the UDF guide, <https://docs.rapids.ai/api/cudf/stable/user_guide/guide-to-udfs.html>