cudf.Series.apply

Series.apply(func, convert_dtype=True, args=(), by_row: Literal[False, 'compat'] = 'compat', **kwargs)

Apply a scalar function to the values of a Series. Similar to pandas.Series.apply.

apply relies on Numba to JIT compile func. Thus the allowed operations within func are limited to those supported by the CUDA Python Numba target. For more information, see the cuDF guide to user defined functions.

Some string functions and methods are supported. Refer to the guide to UDFs for details.

Parameters:
func : function

Scalar Python function to apply.

convert_dtype : bool, default True

In cuDF, this parameter is always True. Because cuDF does not support arbitrary object dtypes, the result always has the common type that Numba determines from the function logic and argument types. See the examples for details.

args : tuple

Positional arguments passed to func after the series value.

by_row : False or "compat", default "compat"

If "compat" and func is a callable, func is passed each element of the Series, like Series.map. If func is a list or dict of callables, apply first tries to translate each func into pandas methods. If that does not work, it tries calling apply again with by_row="compat", and if that fails, it calls apply again with by_row=False (backward compatible). If False, func is passed the whole Series at once.

by_row has no effect when func is a string.

This parameter is currently not implemented in cuDF.

**kwargs

Not supported
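
As a sketch of how args is forwarded, the UDF below takes the series value first and one extra positional argument second. The names add_offset and apply_with_offset are hypothetical, chosen for illustration, and cudf is imported lazily so the UDF itself can be checked on a CPU-only machine.

```python
def add_offset(x, offset):
    # x is the series value; offset is supplied through args.
    return x + offset


def apply_with_offset(values, offset):
    # Imported here so that add_offset can be exercised without a GPU.
    import cudf

    sr = cudf.Series(values)
    # Each element is passed as x, followed by the contents of args.
    return sr.apply(add_offset, args=(offset,))
```

On a machine with a GPU, apply_with_offset([1, 2, 3], 10) should produce the same values as calling add_offset on each element with offset=10.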

Returns:
result : Series

The mask and index are preserved.
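
One way to check the statement above is to apply a UDF to a series with a custom index and a null, then inspect the labels and null positions of the result. This is a minimal sketch; double and index_and_mask_after_apply are hypothetical names, and cudf is imported lazily so the UDF can be tested without a GPU.

```python
def double(x):
    # Scalar UDF; a null input is expected to stay null in the result.
    return x * 2


def index_and_mask_after_apply():
    # Imported here so that double can be exercised without a GPU.
    import cudf

    sr = cudf.Series([1, None, 3], index=["a", "b", "c"])
    out = sr.apply(double)
    # Return the result's labels and null positions for comparison
    # against those of the input series.
    labels = out.index.to_pandas().tolist()
    nulls = out.isna().to_pandas().tolist()
    return labels, nulls
```

If the index and mask are preserved as described, the returned labels match the input index and the only null sits at label "b".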

Notes

UDFs are cached in memory to avoid recompilation, so only the first call to a given UDF incurs compilation overhead. func may call nested functions, but each one must be decorated with numba.cuda.jit(device=True); otherwise Numba raises a typing error.

Examples

Apply a basic function to a series:

>>> sr = cudf.Series([1,2,3])
>>> def f(x):
...     return x + 1
>>> sr.apply(f)
0    2
1    3
2    4
dtype: int64

Apply a basic function to a series with nulls:

>>> sr = cudf.Series([1,cudf.NA,3])
>>> def f(x):
...     return x + 1
>>> sr.apply(f)
0       2
1    <NA>
2       4
dtype: int64

Use a function that behaves differently depending on whether the value is null:

>>> sr = cudf.Series([1,cudf.NA,3])
>>> def f(x):
...     if x is cudf.NA:
...         return 42
...     else:
...         return x - 1
>>> sr.apply(f)
0     0
1    42
2     2
dtype: int64

Results are upcast to the common dtype derived from the UDF's logic. This common dtype is returned even when the data passed in never produces a value of that dtype:

>>> sr = cudf.Series([1,cudf.NA,3])
>>> def f(x):
...     return x + 1.5
>>> sr.apply(f)
0     2.5
1    <NA>
2     4.5
dtype: float64
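
The "even when the data never produces that dtype" case is easier to see with a branching UDF. In the sketch below, half_step_if_large and apply_to_small_values are hypothetical names, and cudf is imported lazily so the UDF can be checked on the CPU.

```python
def half_step_if_large(x):
    if x > 100:
        # Float branch: forces a floating-point common type.
        return x + 0.5
    # Integer branch: the only one taken for small inputs.
    return x


def apply_to_small_values():
    # Imported here so the UDF can be exercised without a GPU.
    import cudf

    sr = cudf.Series([1, 2, 3])
    # No element exceeds 100, so the float branch never runs.
    return sr.apply(half_step_if_large)
```

Following the rule above, the result of apply_to_small_values() is expected to have a floating-point dtype even though every element takes the integer branch, because the dtype is derived from the function's logic rather than from the values it sees.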

UDFs manipulating string data are allowed, as long as they neither modify strings in place nor create new strings. For example, the following UDF is allowed:

>>> def f(st):
...     if len(st) == 0:
...         return -1
...     elif st.startswith('a'):
...         return 1
...     elif 'example' in st:
...         return 2
...     else:
...         return 3
...
>>> sr = cudf.Series(['', 'abc', 'some_example'])
>>> sr.apply(f)
0   -1
1    1
2    2
dtype: int64

However, the following UDF is not allowed because it calls the upper method, which requires creating a new string. Calling an unsupported method in this way raises an AttributeError.

>>> def f(st):
...     new = st.upper()
...     return 'ABC' in new
...
>>> sr.apply(f)

For a complete list of supported functions and methods that may be used to manipulate string data, see the UDF guide: <https://docs.rapids.ai/api/cudf/stable/user_guide/guide-to-udfs.html>