GPUEngine Configuration Options#

The polars.GPUEngine object may be configured in several different ways.

Executor#

cudf-polars includes multiple executors: backends that take a Polars query and execute it to produce the result (either an in-memory polars.DataFrame from .collect() or one or more files from .sink_<method>). The executor to use can be specified with the executor option when you create the GPUEngine.

import polars as pl

engine = pl.GPUEngine(executor="streaming")
query = ...

result = query.collect(engine=engine)

The streaming executor is the default executor as of RAPIDS 25.08, and is equivalent to passing engine="gpu" or engine=pl.GPUEngine() to collect. At a high level, the streaming executor works by breaking inputs (in-memory DataFrames or parquet files) into multiple pieces and streaming those pieces through the series of operations needed to produce the final result.
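
For example, the following calls are equivalent ways to collect a query with the default streaming executor:

result = query.collect(engine="gpu")
result = query.collect(engine=pl.GPUEngine())
result = query.collect(engine=pl.GPUEngine(executor="streaming"))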

We also provide an in-memory executor. This executor is often faster when the underlying data fits comfortably in device memory, because at that scale the benefit of splitting inputs and executing them in batches does not outweigh the overhead. With that said, this executor must rely on Unified Virtual Memory (UVM) if the input and intermediate data do not fit in device memory. The in-memory executor can be used with

engine = pl.GPUEngine(executor="in-memory")

In general, we recommend starting with the default streaming executor, because it scales significantly better than the in-memory executor. The streaming executor includes several configuration options, which can be provided with the executor_options keyword when constructing the GPUEngine:

engine = pl.GPUEngine(
    executor="streaming",  # the default
    executor_options={
        "max_rows_per_partition": 500_000,
    }
)

You can configure the default values for these options through environment variables of the form CUDF_POLARS__EXECUTOR__{option_name}. For example, the environment variable CUDF_POLARS__EXECUTOR__MAX_ROWS_PER_PARTITION sets the default max_rows_per_partition to use if it isn't overridden through executor_options.

For boolean options, like rapidsmpf_spill, the values {"1", "true", "yes", "y"} are considered True and {"0", "false", "no", "n"} are considered False.
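
As a sketch, these defaults can also be set from Python, assuming the variables are set before cudf-polars reads its configuration (for example, before the query is collected):

import os

# Default used when max_rows_per_partition is not passed in executor_options.
os.environ["CUDF_POLARS__EXECUTOR__MAX_ROWS_PER_PARTITION"] = "1000000"
# Boolean options accept values like "true"/"false" or "1"/"0".
os.environ["CUDF_POLARS__EXECUTOR__RAPIDSMPF_SPILL"] = "true"

engine = pl.GPUEngine(executor="streaming")  # picks up the environment defaults unless overridden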

See Configuration Reference for a full list of options, and Streaming Execution for more on the streaming executor, including multi-GPU execution.

Parquet Reader Options#

Reading large parquet files can use a significant amount of memory, especially when the files are compressed. This may lead to out-of-memory errors for some workflows. To mitigate this, the “chunked” parquet reader may be selected. When enabled, parquet files are read in chunks, limiting peak memory usage at the cost of a small drop in performance.

To configure the parquet reader, pass a dictionary of options to the parquet_options keyword of the GPUEngine object. Valid keys and values are:

  • chunked controls whether chunked parquet reading is used. By default, chunked reading is turned on.

  • chunk_read_limit controls the maximum size per chunk. By default, the maximum chunk size is unlimited.

  • pass_read_limit controls the maximum memory used for decompression. The default pass read limit is 16GiB.

For example, to select the chunked reader with custom values for pass_read_limit and chunk_read_limit:

engine = pl.GPUEngine(
    parquet_options={
        "chunked": True,
        "chunk_read_limit": int(1e9),
        "pass_read_limit": int(4e9),
    }
)
result = query.collect(engine=engine)

Note that passing chunked: False disables chunked reading entirely, and thus chunk_read_limit and pass_read_limit will have no effect.
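
For example, to disable chunked reading entirely:

engine = pl.GPUEngine(parquet_options={"chunked": False})
result = query.collect(engine=engine)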

You can configure the default values for these options through environment variables of the form CUDF_POLARS__PARQUET_OPTIONS__{option_name}. For example, setting the environment variable CUDF_POLARS__PARQUET_OPTIONS__CHUNKED=0 sets the default chunked to False.

Disabling CUDA Managed Memory#

By default, the in-memory executor will use CUDA managed memory with RMM’s pool allocator. On systems that don’t support managed memory, a non-managed asynchronous pool allocator is used. Managed memory can be turned off by setting POLARS_GPU_ENABLE_CUDA_MANAGED_MEMORY to 0. System requirements for managed memory can be found in the CUDA documentation.
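
As an illustration, the variable can be set from Python, assuming it is set before the first GPU collect creates the memory pool:

import os

# Turn off CUDA managed memory (set before the RMM memory pool is created).
os.environ["POLARS_GPU_ENABLE_CUDA_MANAGED_MEMORY"] = "0"

result = query.collect(engine=pl.GPUEngine(executor="in-memory"))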