cudf.DataFrame.apply_chunks

DataFrame.apply_chunks(func, incols, outcols, kwargs=None, pessimistic_nulls=True, chunks=None, blkct=None, tpb=None)

Transform user-specified chunks using the user-provided function.

Parameters:
df: DataFrame

The source dataframe.

func: function

The transformation function that will be executed on the CUDA GPU.

incols: list or dict

Either a list of input column names that match the function's argument names, or a dictionary mapping input column names to the corresponding function argument names, such as {'col1': 'arg1'}.

outcols: dict

A dictionary mapping output column names to their dtypes.

kwargs: dict, optional

Name-value pairs of extra arguments. These values are passed directly into the function.
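How incols, outcols and kwargs line up with the arguments of func can be sketched on the CPU with plain Python lists. This is a hypothetical illustration, not cudf's implementation: bind_arguments is an invented helper, columns are plain lists standing in for a DataFrame, outputs are zero-filled only for the sake of the sketch, and outcols maps names to a Python type (float or int) standing in for a NumPy dtype.

```python
def bind_arguments(columns, incols, outcols, kwargs=None):
    """Hypothetical helper (not part of cudf): build the arguments func receives.

    columns: dict of column name -> list of values (stands in for a DataFrame)
    incols:  list of column names, or dict of column name -> argument name
    outcols: dict of output name -> Python type (stands in for a dtype)
    kwargs:  extra name -> value pairs forwarded unchanged
    """
    # A list means each column is passed under its own name.
    if isinstance(incols, (list, tuple)):
        incols = {name: name for name in incols}
    nrows = len(next(iter(columns.values())))
    args = {}
    for col_name, arg_name in incols.items():
        args[arg_name] = columns[col_name]
    # Each output gets a fresh buffer of length nrows (zero-filled in this sketch).
    for out_name, pytype in outcols.items():
        args[out_name] = [pytype()] * nrows
    # Extra arguments are forwarded as-is.
    args.update(kwargs or {})
    return args


def func(arg1, in2, out1, scale):
    for i in range(len(arg1)):
        out1[i] = scale * arg1[i] + in2[i]


columns = {"col1": [1.0, 2.0, 3.0], "in2": [10.0, 20.0, 30.0]}
args = bind_arguments(columns, {"col1": "arg1", "in2": "in2"},
                      {"out1": float}, kwargs={"scale": 2.0})
func(**args)
print(args["out1"])  # [12.0, 24.0, 36.0]
```

The dictionary form of incols renames col1 to the argument arg1, while in2 keeps its name; scale reaches func only through kwargs.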

pessimistic_nulls: bool, default True

Whether the output should be null wherever any corresponding input is null. If False, all outputs are non-null, but they are the result of applying func to the underlying column data, which may be garbage at the null positions.
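The rule above can be illustrated with per-row validity flags. This is a minimal sketch, assuming validity is modelled as one boolean per row (True meaning valid); output_validity is a hypothetical helper, not part of cudf, and it does not reflect how cudf stores null masks.

```python
def output_validity(input_masks, pessimistic_nulls=True):
    """Hypothetical helper (not part of cudf): per-row validity of an output.

    input_masks: one list of booleans per input column (True = valid).
    """
    nrows = len(input_masks[0])
    if not pessimistic_nulls:
        # Every output row is reported as valid, even where the inputs
        # were null and the computed value may be garbage.
        return [True] * nrows
    # A row is valid only if it is valid in every input column.
    return [all(mask[i] for mask in input_masks) for i in range(nrows)]


in1_valid = [True, False, True, True]
in2_valid = [True, True, False, True]
print(output_validity([in1_valid, in2_valid]))         # [True, False, False, True]
print(output_validity([in1_valid, in2_valid], False))  # [True, True, True, True]
```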

chunks: int or Series-like

If an int, it is the chunk size. If an array, it contains the integer offset at which each chunk starts. The i-th chunk spans data[chunks[i]:chunks[i + 1]] for any i + 1 < len(chunks), and data[chunks[i]:] for i == len(chunks) - 1.
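The chunk spans described above can be computed in plain Python. This is a sketch under one assumption beyond the text: that an integer chunk size means chunks start at rows 0, size, 2 * size, and so on. chunk_spans is a hypothetical helper, not part of cudf.

```python
def chunk_spans(chunks, nrows):
    """Hypothetical helper (not part of cudf): (start, stop) rows of each chunk."""
    if isinstance(chunks, int):
        # Assumption: an int chunk size means a new chunk every `chunks` rows.
        offsets = list(range(0, nrows, chunks))
    else:
        # Otherwise chunks holds the start offset of each chunk.
        offsets = list(chunks)
    spans = []
    for i, start in enumerate(offsets):
        # data[chunks[i]:chunks[i + 1]], except the last chunk runs to the end.
        stop = offsets[i + 1] if i + 1 < len(offsets) else nrows
        spans.append((start, stop))
    return spans


data = list(range(10))
for start, stop in chunk_spans([0, 3, 7], len(data)):
    print(data[start:stop])
# [0, 1, 2]
# [3, 4, 5, 6]
# [7, 8, 9]
print(chunk_spans(4, 10))  # [(0, 4), (4, 8), (8, 10)]
```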

tpb: int, optional

The number of threads per block for the underlying kernel. If not specified (the default), Numba's built-in .forall(...) is used to query the CUDA Driver API for an optimal kernel launch configuration. Specify 1 to emulate serial execution within each chunk; this is a good starting point but inefficient. The maximum possible value is limited by the available GPU resources.

blkct: int, optional

The number of blocks for the underlying kernel. If neither blkct nor tpb is specified (the default), Numba's built-in .forall(...) is used to query the CUDA Driver API for an optimal kernel launch configuration. If blkct is not specified but tpb is, the number of chunks is used as the number of blocks.
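The interplay of tpb and blkct described above amounts to a small decision table, sketched below. launch_config is a hypothetical helper that only mirrors the documented cases; it is not cudf code. The case where blkct is given without tpb is not described in these docs, so this sketch simply rejects it rather than guess.

```python
def launch_config(nchunks, tpb=None, blkct=None):
    """Hypothetical helper (not part of cudf) mirroring the tpb/blkct rules.

    Returns "forall" when Numba's .forall(...) would choose the launch
    configuration, otherwise a (blocks, threads_per_block) pair.
    """
    if tpb is None and blkct is None:
        return "forall"
    if tpb is None:
        # blkct without tpb is not described in the docs; this sketch rejects it.
        raise ValueError("this sketch requires tpb when blkct is given")
    if blkct is None:
        # tpb given but blkct not: one block per chunk.
        blkct = nchunks
    return (blkct, tpb)


print(launch_config(3))                  # forall
print(launch_config(3, tpb=8))           # (3, 8)
print(launch_config(3, tpb=8, blkct=2))  # (2, 8)
```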

Examples

For tpb > 1, func is executed concurrently by tpb threads. To access the thread ID and the number of threads, use numba.cuda.threadIdx.x and numba.cuda.blockDim.x, respectively (see the Numba CUDA kernel documentation).

In the example below, the kernel is invoked concurrently on each specified chunk and computes the output for that chunk.

Because it loops over range(cuda.threadIdx.x, in1.size, cuda.blockDim.x), the kernel function works efficiently with any tpb.
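Why this stride loop works for any tpb can be checked on the CPU: thread t visits rows t, t + tpb, t + 2 * tpb, and so on, so the threads of a block together visit every row of the chunk exactly once. In the sketch below the Python loop over t stands in for threads that actually run concurrently on the GPU; indices_for_thread and check_coverage are hypothetical helpers, not cudf or Numba functions.

```python
def indices_for_thread(thread_idx, size, block_dim):
    # Same stride pattern as range(cuda.threadIdx.x, in1.size, cuda.blockDim.x).
    return list(range(thread_idx, size, block_dim))


def check_coverage(size, block_dim):
    """Return True if every row 0..size-1 is visited exactly once."""
    visited = []
    for t in range(block_dim):  # stands in for concurrent threads
        visited.extend(indices_for_thread(t, size, block_dim))
    return sorted(visited) == list(range(size))


for t in range(4):
    print(t, indices_for_thread(t, 10, 4))
# 0 [0, 4, 8]
# 1 [1, 5, 9]
# 2 [2, 6]
# 3 [3, 7]
print(all(check_coverage(10, tpb) for tpb in (1, 3, 4, 16)))  # True
```

With tpb = 1, a single thread walks the whole chunk serially; with tpb larger than the chunk, the extra threads simply get empty ranges.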

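>>> import numpy as np
>>> import cudf
>>> from numba import cuda
>>> def kernel(in1, in2, in3, out1):
...     # Each thread strides over the rows of the current chunk.
...     for i in range(cuda.threadIdx.x, in1.size, cuda.blockDim.x):
...         x = in1[i]
...         y = in2[i]
...         z = in3[i]
...         out1[i] = x * y + z
>>> df = cudf.DataFrame({'in1': [1, 2, 3, 4, 5],
...                      'in2': [1.0, 2.0, 3.0, 4.0, 5.0],
...                      'in3': [0.5, 0.5, 0.5, 0.5, 0.5]})
>>> out = df.apply_chunks(kernel, incols=['in1', 'in2', 'in3'],
...                       outcols={'out1': np.float64}, kwargs={},
...                       chunks=[0, 2], tpb=8)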