Accelerating Third-Party Applications
The CUML_ACCEL_ENABLED environment variable lets you GPU-accelerate
Python applications that use sklearn, umap, or hdbscan, including
applications whose code you cannot modify. This is useful for
installed CLI tools, applications, and third-party libraries.
CUML_ACCEL_ENABLED=1 some-third-party-tool [args...]
When CUML_ACCEL_ENABLED=1 is set, cuml.accel is enabled as part of
normal Python interpreter startup, before the application’s own code
runs. You therefore do not need access to an application’s source code:
set the environment variable and the acceleration applies automatically.
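You may start a third-party tool from a Python driver script rather than from a shell. In that case you can pass the same variable through the child process’s environment. Below is a minimal sketch that uses only the standard library. The names accelerated_env and run_accelerated are hypothetical helpers, not part of cuML.

```python
import os
import subprocess


def accelerated_env(log_level=None, base=None):
    """Return a copy of an environment mapping with cuml.accel enabled.

    base defaults to the current process environment; it is copied, never mutated.
    """
    env = dict(os.environ if base is None else base)
    env["CUML_ACCEL_ENABLED"] = "1"
    if log_level is not None:
        # Optional, e.g. "info" or "debug", to print GPU/CPU dispatch messages
        env["CUML_ACCEL_LOG_LEVEL"] = log_level
    return env


def run_accelerated(cmd, log_level=None):
    """Run cmd (a list of arguments) with cuml.accel enabled for the child process only."""
    return subprocess.run(cmd, env=accelerated_env(log_level), check=False)
```

For example, run_accelerated(["some-third-party-tool", "--help"], log_level="info") runs the tool with acceleration and dispatch logging enabled. The parent process’s own environment stays untouched.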
Example: Embedding Visualization with embedding-atlas
embedding-atlas is Apple’s open-source tool for interactive visualization of large embedding datasets. Given a text dataset, it computes sentence embeddings, projects them to 2D using UMAP, and launches a browser-based explorer.
Install it alongside cuml:
pip install embedding-atlas
Run it on a Hugging Face dataset. The example below uses TinyStories, a dataset of 2M+ short stories:
# CPU -- UMAP runs on CPU
embedding-atlas roneneldan/TinyStories --text text \
--split train --sample 1000000
# GPU -- set environment variable; no other changes needed
CUML_ACCEL_ENABLED=1 embedding-atlas roneneldan/TinyStories --text text \
--split train --sample 1000000
The only change between the two commands is the environment variable.
embedding-atlas computes embeddings with sentence-transformers (which
already uses the GPU), then runs UMAP for dimensionality reduction.
cuml.accel intercepts the umap.UMAP call inside embedding-atlas
and dispatches fit_transform to cuML’s GPU implementation.
Use a smaller --sample value (e.g. 250000) for a quicker test run, but
expect a smaller speedup: the UMAP speedup grows with dataset size.
To confirm GPU dispatch, add CUML_ACCEL_LOG_LEVEL=info:
CUML_ACCEL_ENABLED=1 CUML_ACCEL_LOG_LEVEL=info embedding-atlas \
roneneldan/TinyStories --text text --split train --sample 1000000
Among the other output, you should see the following messages:
[cuml.accel] Accelerator installed.
[cuml.accel] `UMAP.fit_transform` ran on GPU
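If you save the console output to a file, a short check can confirm that both messages appear. This sketch matches only the exact message text shown above, which may differ in other cuML versions. The function confirm_gpu_dispatch and the file name atlas.log are hypothetical.

```python
def confirm_gpu_dispatch(output, method="UMAP.fit_transform"):
    """Return True if captured output shows that cuml.accel was installed and
    that `method` ran on the GPU, using the message text shown in this guide."""
    installed = "[cuml.accel] Accelerator installed." in output
    dispatched = f"[cuml.accel] `{method}` ran on GPU" in output
    return installed and dispatched


# Usage, assuming the console output was saved to a hypothetical atlas.log:
# with open("atlas.log", encoding="utf-8") as f:
#     print(confirm_gpu_dispatch(f.read()))
```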
Results
At the time of writing, on the author’s hardware, the fit_transform
step ran roughly 4x faster, because cuML’s GPU UMAP replaces the CPU
embedding optimization. The KNN step (nearest_neighbors) is a standalone
function call that cuml.accel does not currently intercept, so it runs
on CPU in both cases. Despite this, the UMAP stage as a whole (KNN plus
fit_transform) is still about 2x faster.
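The two figures are consistent with Amdahl’s law, which relates the speedup of part of a workload to the speedup of the whole. This sketch makes a simplifying assumption: the UMAP stage consists only of the CPU KNN search plus fit_transform, and the rounded 4x and 2x figures are exact. Under that assumption, the KNN search accounts for about a third of the CPU stage time. The function names are illustrative.

```python
def overall_speedup(cpu_fraction, part_speedup):
    """Amdahl's law: speedup of the whole stage when the remaining fraction
    (1 - cpu_fraction) of its baseline time runs part_speedup times faster."""
    return 1.0 / (cpu_fraction + (1.0 - cpu_fraction) / part_speedup)


def cpu_fraction_from(overall, part_speedup):
    """Invert Amdahl's law: the fraction of baseline time that was not sped up."""
    return (1.0 / overall - 1.0 / part_speedup) / (1.0 - 1.0 / part_speedup)


# Rounded figures from the Results: fit_transform ~4x faster, whole UMAP stage ~2x.
knn_share = cpu_fraction_from(overall=2.0, part_speedup=4.0)
print(f"CPU KNN share of the baseline stage time: {knn_share:.2f}")  # 0.33
# Upper bound on the stage speedup while KNN stays on CPU is 1 / knn_share.
print(f"Maximum stage speedup with KNN on CPU: {1.0 / knn_share:.1f}x")  # 3.0x
```

Under the same assumption, the CPU KNN share caps the stage speedup at about 3x, no matter how fast fit_transform becomes.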
At smaller scales (< 100K rows), the UMAP step is already fast on CPU and the speedup is less pronounced.
Identifying Acceleratable Applications
Any Python tool that calls one of the following is a candidate for
CUML_ACCEL_ENABLED:
- sklearn estimators (KMeans, PCA, DBSCAN, RandomForest, LogisticRegression, NearestNeighbors, and many more)
- umap.UMAP
- hdbscan.HDBSCAN
A quick way to check: search the application’s dependencies for
scikit-learn, umap-learn, or hdbscan, or run it with
CUML_ACCEL_ENABLED=1 and CUML_ACCEL_LOG_LEVEL=info and look for
"ran on GPU" messages in the output.
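You can script the dependency search with the standard library’s importlib.metadata. This sketch walks the installed requirement tree of a distribution and reports which of scikit-learn, umap-learn, or hdbscan it pulls in. It is only an approximation, for three reasons:
- It skips optional extras.
- It ignores other environment markers.
- It cannot see packages that an application imports without declaring them.

The helper names are hypothetical.

```python
import re
from importlib import metadata

# Distributions providing the APIs listed above.
ACCELERABLE = {"scikit-learn", "umap-learn", "hdbscan"}


def normalize(name):
    """PEP 503 name normalization: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(req):
    """Extract the distribution name from a requirement string such as
    "scikit-learn>=1.3; python_version >= '3.9'"."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    return normalize(match.group(1)) if match else None


def find_accelerable_deps(dist_name):
    """Return the ACCELERABLE distributions that dist_name requires, directly or
    transitively. Dependencies that are not installed are not inspected further."""
    root = normalize(dist_name)
    found, seen, stack = set(), set(), [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if name in ACCELERABLE and name != root:
            found.add(name)
        try:
            requirements = metadata.requires(name) or []
        except metadata.PackageNotFoundError:
            continue  # not installed, so its metadata cannot be read
        for req in requirements:
            if re.search(r"extra\s*==", req):
                continue  # optional extra, not installed by default
            dep = requirement_name(req)
            if dep:
                stack.append(dep)
    return found


# Example: print(find_accelerable_deps("embedding-atlas"))
```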
Checking for CPU Fallbacks
Not all parameter combinations are supported on the GPU. When
cuml.accel encounters an unsupported configuration, it silently
falls back to CPU execution. To detect this, set the log level to
info or debug:
CUML_ACCEL_ENABLED=1 CUML_ACCEL_LOG_LEVEL=info python app.py
Lines containing "ran on GPU" confirm GPU execution. Lines containing
"falling back to CPU" indicate a fallback and give the reason.
See Logging and Profiling for more detail.
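You can also script the log check. This sketch classifies captured output using only the two substrings named above. It returns each matching line unchanged, because the full message format (including the wording of the fallback reason) may vary between cuML versions. It merges stderr into stdout because this guide does not say which stream cuml.accel logs to. The helpers summarize_dispatch and run_and_summarize are hypothetical.

```python
import os
import subprocess


def summarize_dispatch(output):
    """Split captured cuml.accel log output into GPU lines and CPU-fallback lines."""
    gpu, fallback = [], []
    for line in output.splitlines():
        # Classify only on the substrings documented in this guide.
        if "ran on GPU" in line:
            gpu.append(line.strip())
        elif "falling back to CPU" in line:
            fallback.append(line.strip())
    return gpu, fallback


def run_and_summarize(cmd, log_level="info"):
    """Run cmd (a list of arguments) with cuml.accel logging enabled and summarize its output."""
    env = {**os.environ, "CUML_ACCEL_ENABLED": "1", "CUML_ACCEL_LOG_LEVEL": log_level}
    result = subprocess.run(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams; the log stream is not specified here
        text=True,
        check=False,
    )
    return summarize_dispatch(result.stdout)
```

For example, gpu_lines, fallback_lines = run_and_summarize(["python", "app.py"]) collects both kinds of lines. A non-empty fallback_lines list points to the estimators worth investigating.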