Accelerating Third-Party Applications ====================================== The ``CUML_ACCEL_ENABLED`` environment variable lets you GPU-accelerate any Python application that uses ``sklearn``, ``umap``, or ``hdbscan``. Even applications whose code you cannot modify. This is useful for installed CLI tools, applications, and third-party libraries. .. code-block:: console CUML_ACCEL_ENABLED=1 some-third-party-tool [args...] When :ref:`CUML_ACCEL_ENABLED=1 is defined `, `cuml.accel` will be enabled as part of the normal Python interpreter startup, letting you accelerate Python applications without modification This means you do not need access to an application's source code: set the environment variable and the acceleration applies automatically. Example: Embedding Visualization with embedding-atlas ----------------------------------------------------- `embedding-atlas `_ is Apple's open-source tool for interactive visualization of large embedding datasets. Given a text dataset, it computes sentence embeddings, projects them to 2D using `UMAP `_, and launches a browser-based explorer. Install it alongside ``cuml``: .. code-block:: console pip install embedding-atlas Run it on a Hugging Face dataset. The example below uses `TinyStories `_, a dataset of 2M+ short stories: .. code-block:: console # CPU -- UMAP runs on CPU embedding-atlas roneneldan/TinyStories --text text \ --split train --sample 1000000 # GPU -- set environment variable; no other changes needed CUML_ACCEL_ENABLED=1 embedding-atlas roneneldan/TinyStories --text text \ --split train --sample 1000000 The only change between the two commands is the environment variable. ``embedding-atlas`` computes embeddings with sentence-transformers (which already uses the GPU), then runs UMAP for dimensionality reduction. ``cuml.accel`` intercepts the ``umap.UMAP`` call inside ``embedding-atlas`` and dispatches ``fit_transform`` to cuML's GPU implementation. Use a smaller ``--sample`` value (e.g. 250000) for a quicker test run. The UMAP speedup grows with dataset size. To confirm GPU dispatch, add ``CUML_ACCEL_LOG_LEVEL=info``: .. code-block:: console CUML_ACCEL_ENABLED=1 CUML_ACCEL_LOG_LEVEL=info embedding-atlas \ roneneldan/TinyStories --text text --split train --sample 1000000 You should see the following messages amongst the other output: .. code-block:: text [cuml.accel] Accelerator installed. [cuml.accel] `UMAP.fit_transform` ran on GPU Results ~~~~~~~ At the time of writing and on the hardware the author used the ``fit_transform`` step saw a roughly **~4x speedup** because cuML's GPU UMAP replaces the CPU optimization. The KNN step (``nearest_neighbors``) is a standalone function call that ``cuml.accel`` does not currently intercept, so it runs on CPU in both cases. Despite this, the overall UMAP step is still **~2x faster**. At smaller scales (< 100K rows) the UMAP step is already fast on CPU and the speedup is less pronounced. The benefit grows with dataset size. Identifying Acceleratable Applications --------------------------------------- Any Python tool that calls one of the following is a candidate for ``CUML_ACCEL_ENABLED``: - ``sklearn`` estimators (KMeans, PCA, DBSCAN, RandomForest, LogisticRegression, NearestNeighbors, and :doc:`many more <../faq>`) - ``umap.UMAP`` - ``hdbscan.HDBSCAN`` A quick way to check: search an application's dependencies for ``scikit-learn``, ``umap-learn``, or ``hdbscan``, or run with ``CUML_ACCEL_LOG_LEVEL=info`` and look for ``ran on GPU`` messages in the output. Checking for CPU Fallbacks -------------------------- Not all parameter combinations are supported on the GPU. When ``cuml.accel`` encounters an unsupported configuration, it silently falls back to CPU execution. To detect this, set the log level to ``info`` or ``debug``: .. code-block:: console CUML_ACCEL_ENABLED=1 CUML_ACCEL_LOG_LEVEL=info python app.py Lines containing ``ran on GPU`` confirm GPU execution. Lines containing ``falling back to CPU`` indicate a fallback, along with the reason. See :doc:`../logging-and-profiling` for more detail.