GPUEngine Configuration Options

The polars.GPUEngine object may be configured in several different ways.

Parquet Reader Options

Reading large parquet files can use a large amount of memory, especially when the files are compressed. This may lead to out-of-memory errors for some workflows. To mitigate this, the “chunked” parquet reader may be selected. When enabled, parquet files are read in chunks, which limits peak memory usage at the cost of a small drop in performance.

To configure the parquet reader, we provide a dictionary of options to the parquet_options keyword of the GPUEngine object. Valid keys and values are:

  • chunked indicates that chunked parquet reading is to be used. By default, chunked reading is turned on.

  • chunk_read_limit controls the maximum size per chunk. By default, the maximum chunk size is unlimited.

  • pass_read_limit controls the maximum memory used for decompression. The default pass read limit is 16 GiB.

For example, to select the chunked reader with custom values for pass_read_limit and chunk_read_limit:

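from polars import GPUEngine

# `query` is assumed to be a polars LazyFrame defined earlier.
engine = GPUEngine(
    parquet_options={
        'chunked': True,               # use the chunked parquet reader
        'chunk_read_limit': int(1e9),  # maximum size per chunk
        'pass_read_limit': int(4e9),   # maximum memory used for decompression
    }
)
result = query.collect(engine=engine)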

Note that passing 'chunked': False disables chunked reading entirely, so chunk_read_limit and pass_read_limit have no effect.
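The example above writes the limits as decimal values such as int(1e9), while the default pass read limit is quoted in GiB. The following is a minimal sketch of building the options dictionary from GiB values, assuming both limits are expressed in bytes (which the example values suggest). The helpers gib and make_parquet_options are hypothetical names introduced for illustration and are not part of polars or cudf_polars.

```python
# Sketch only: gib() and make_parquet_options() are hypothetical helpers,
# not part of polars or cudf_polars. Limits are assumed to be in bytes.

def gib(n: float) -> int:
    """Convert a size in GiB to a whole number of bytes."""
    return int(n * 1024**3)


def make_parquet_options(chunk_gib: float, pass_gib: float) -> dict:
    """Build a parquet_options dict with both limits given in GiB."""
    return {
        "chunked": True,
        "chunk_read_limit": gib(chunk_gib),
        "pass_read_limit": gib(pass_gib),
    }


options = make_parquet_options(chunk_gib=1, pass_gib=4)
print(options)
```

The resulting dictionary can be passed as GPUEngine(parquet_options=options) in the same way as the literal dictionary shown earlier.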

Disabling CUDA Managed Memory

By default, cudf_polars uses CUDA managed memory with RMM’s pool allocator. On systems that don’t support managed memory, a non-managed asynchronous pool allocator is used instead. Managed memory can be turned off by setting the environment variable POLARS_GPU_ENABLE_CUDA_MANAGED_MEMORY to 0. System requirements for managed memory can be found here.
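As a rough sketch, the variable can also be set from within Python rather than in the shell. This assumes that the variable must be in place before the GPU engine sets up its memory resource. Setting it before polars is imported is the conservative choice made here, not a documented requirement. The helper run_on_gpu is a hypothetical name used for illustration.

```python
import os

# Assumption: the variable is read when the GPU engine first sets up its
# memory resource, so it is set here before polars is imported at all.
os.environ["POLARS_GPU_ENABLE_CUDA_MANAGED_MEMORY"] = "0"


def run_on_gpu(query):
    """Collect a polars LazyFrame on the GPU (hypothetical helper)."""
    import polars as pl  # imported only after the variable has been set

    return query.collect(engine=pl.GPUEngine())
```

Setting the variable in the launching shell or job script has the same effect and avoids any dependence on import order.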