Default engine="gpu"
.collect(engine="gpu") (and engine=pl.GPUEngine()) is the API you invoke when you don’t
construct a streaming engine explicitly. It runs the same streaming executor as the explicit
engines (Ray, Dask, SPMD), conceptually similar to
Polars’ own streaming engine but on the
GPU. Under the hood it’s backed by DefaultSingletonEngine,
a process-wide singleton specialization of SPMDEngine. At most one live
instance exists per process; it is created lazily on first use and torn down at interpreter
exit. Ray is the showcased explicit engine (see Usage); this page documents what
engine="gpu" does without you having to construct anything.
Important
engine="gpu" is meant for trivial setup: single-GPU execution with no
configuration or engine object to manage.
For any non-trivial workflow, construct an engine explicitly. To tune
options, use
RayEngine.from_options(...).
engine="gpu" accepts no options, so settings such as
spill_to_pinned_memory=True for spill-heavy workloads require an
explicit engine. See Usage and Configuration Options.
What you get without an explicit engine
When you just write:
import polars as pl
result = (
    pl.scan_parquet("/data/*.parquet")
    .group_by("customer_id")
    .agg(pl.col("amount").sum())
    .collect(engine="gpu")
)
cudf-polars uses
DefaultSingletonEngine
under the hood. No cluster is set up: the rapidsmpf Context is bootstrapped on first use,
and subsequent .collect(engine="gpu") calls in the same process reuse it.
Explicit handle
If you genuinely want the singleton (for example in tests or scripts that need to call
.shutdown() deterministically) you can obtain it via the factory:
from cudf_polars.engine.default_singleton_engine import (
    DefaultSingletonEngine,
)
engine = DefaultSingletonEngine.get_or_create()
# `query` is any Polars LazyFrame built earlier.
result = query.collect(engine=engine)
get_or_create() is idempotent: calling it again returns the same instance.
For anything beyond defaults, prefer an explicit engine. See Usage.
Lifecycle
The singleton is bootstrapped once per process. The rapidsmpf Context, RMM adaptor, and
Python thread-pool executor are reused across every .collect() call.
Shutdown is automatic: the engine registers an atexit hook that tears it down at interpreter
exit. To shut it down explicitly (for example to release resources before constructing a
multi-GPU engine), call the static method:
from cudf_polars.engine.default_singleton_engine import (
    DefaultSingletonEngine,
)
DefaultSingletonEngine.shutdown()
shutdown() is idempotent (calling it twice is safe) and a no-op if no live engine exists.
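The lifecycle described in this section (lazy creation, reuse across calls, an atexit hook, idempotent shutdown) can be modelled in a few lines of plain Python. This is a toy sketch, not the cudf-polars implementation: ToySingletonEngine and its attributes are hypothetical names, it uses class methods for brevity, and it omits the GPU resources the real engine manages.

```python
import atexit
import threading


class ToySingletonEngine:
    """Hypothetical stand-in that models the documented lifecycle only."""

    _instance = None
    _lock = threading.Lock()
    _atexit_registered = False

    def __init__(self):
        # Placeholder for expensive state (context, memory adaptor, thread pool).
        self.alive = True

    @classmethod
    def get_or_create(cls):
        """Return the live instance, creating it on first use."""
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
                if not cls._atexit_registered:
                    # Register the teardown hook once per process.
                    atexit.register(cls.shutdown)
                    cls._atexit_registered = True
            return cls._instance

    @classmethod
    def shutdown(cls):
        """Tear down the live instance; a no-op if none exists."""
        with cls._lock:
            if cls._instance is not None:
                cls._instance.alive = False
                cls._instance = None


first = ToySingletonEngine.get_or_create()
second = ToySingletonEngine.get_or_create()
same_instance = first is second  # repeated calls return the same object

ToySingletonEngine.shutdown()
ToySingletonEngine.shutdown()  # second call finds nothing to tear down
```

Because the hook only calls shutdown(), an explicit shutdown earlier in the program leaves the exit-time call with nothing to do, which is why idempotence matters here.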
Mutual exclusion with explicit engines
DefaultSingletonEngine, RayEngine,
DaskEngine, and
SPMDEngine cannot coexist in the same
process. Concretely:
- Constructing RayEngine/DaskEngine/SPMDEngine while the singleton is alive raises RuntimeError.
- DefaultSingletonEngine.get_or_create() raises RuntimeError if any explicit streaming engine is alive.
Recommended pattern: pick one engine for the lifetime of the program. If you need to switch, shut down the active engine first:
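# Release the singleton before constructing an explicit engine.
DefaultSingletonEngine.shutdown()
# `opts` stands for options you have already built (not shown here).
explicit_engine = SPMDEngine.from_options(opts)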
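To make the mutual-exclusion rule concrete, here is a minimal pure-Python model of it. ToyEngineRegistry and the engine names "singleton" and "explicit" are hypothetical, and how cudf-polars actually tracks live engines is not described on this page. The sketch only mirrors the documented behavior: starting one kind of engine while the other is alive raises RuntimeError, and shutting down the active engine first clears the way.

```python
class ToyEngineRegistry:
    """Hypothetical process-wide record of which engine kind is alive."""

    live = None  # name of the live engine kind, or None

    @classmethod
    def start(cls, name):
        # Refuse to start an engine while a different kind is still alive.
        if cls.live is not None and cls.live != name:
            raise RuntimeError(
                f"cannot start {name!r}: {cls.live!r} is still alive"
            )
        cls.live = name

    @classmethod
    def shutdown(cls, name):
        # A no-op unless `name` is the engine that is currently alive.
        if cls.live == name:
            cls.live = None


ToyEngineRegistry.start("singleton")
try:
    ToyEngineRegistry.start("explicit")
    conflict = None
except RuntimeError as exc:
    conflict = str(exc)  # the conflicting start is rejected

# Switching engines: shut down the active one first, then start the other.
ToyEngineRegistry.shutdown("singleton")
ToyEngineRegistry.start("explicit")
```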
No options
DefaultSingletonEngine.get_or_create() takes no arguments. To tune StreamingOptions such
as spill_to_pinned_memory, fallback_mode, max_rows_per_partition, or any rapidsmpf
runtime knob, construct an explicit
RayEngine via
RayEngine.from_options(...).
See Configuration Options for the available fields.