Logging and Profiling#

cuml.accel provides logging and profiling support to help you understand which operations are being accelerated on GPU and which are falling back to CPU execution. This can be particularly useful for debugging performance issues or understanding why certain operations might not be accelerated.

Logging#

The logging system provides different levels of detail:

  • WARN level: Shows only warnings and errors. This is the default level.

  • INFO level: Shows information on where dispatched methods ran (GPU or CPU) and why.

  • DEBUG level: Shows more detailed information about GPU initialization, parameter synchronization, attribute synchronization, and all method calls.

To enable logging, you can set the logging level in several ways:

Command Line Interface (CLI)#

When running scripts with cuml.accel, you can use the -v or --verbose flag:

# Show warnings only (default)
python -m cuml.accel myscript.py

# Show info level logs
python -m cuml.accel -v myscript.py

# Show debug level logs (most verbose)
python -m cuml.accel -vv myscript.py

Programmatic Installation#

When using the programmatic installation method, you can set the log level directly:

import cuml

# Install with debug logging
cuml.accel.install(log_level="debug")

# Install with info logging
cuml.accel.install(log_level="info")

# Install with warning logging (default)
cuml.accel.install(log_level="warn")
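
Note that install() should be called before importing the libraries you want accelerated. A minimal end-to-end sketch, reusing the Ridge example from below:

import cuml

# Enable the accelerator with info-level logging before other imports
cuml.accel.install(log_level="info")

from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

X, y = make_regression()

# INFO-level logs will report where fit ran (GPU or CPU) and why
Ridge(alpha=1.0).fit(X, y)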

Jupyter Notebooks#

Since the %load_ext cuml.accel magic doesn't accept arguments, use the programmatic installation instead:

import cuml

# Install with desired log level before other imports
cuml.accel.install(log_level="debug")

Example#

from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

X, y = make_regression()

# Fit and predict on GPU
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
ridge.predict(X)

# Retry, using a hyperparameter that isn't supported on GPU
ridge = Ridge(positive=True)
ridge.fit(X, y)
ridge.predict(X)

Executing this with python -m cuml.accel -v script.py will show the following output:

[cuml.accel] Accelerator installed.
[cuml.accel] `Ridge.fit` ran on GPU
[cuml.accel] `Ridge.predict` ran on GPU
[cuml.accel] `Ridge.fit` falling back to CPU: `positive=True` is not supported
[cuml.accel] `Ridge.fit` ran on CPU
[cuml.accel] `Ridge.predict` ran on CPU

This logging information can help you:

  • Identify which parts of your pipeline are being accelerated

  • Understand why certain operations fall back to CPU (see the sketch after this list)

  • Debug performance issues by seeing where GPU acceleration fails

  • Optimize your code by understanding synchronization patterns
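
For example, the fallback message in the output above points directly at a remedy: if the positive=True constraint isn't essential to your model, dropping it keeps Ridge on GPU. A sketch based on the example above:

# `positive=True` forced the CPU fallback above; the default
# (positive=False) lets both fit and predict run on GPU
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
ridge.predict(X)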

Profiling#

In addition to logging, cuml.accel includes two profilers to help you better understand which parts of your code cuml.accel was able to accelerate.

Function Profiler#

The function profiler gathers statistics about potentially accelerated function and method calls. It can show:

  • Which method calls cuml.accel had the potential to accelerate (if any). Only methods cuml.accel can currently accelerate appear in the report, even if a CPU fallback was required; methods with no GPU implementation won't be present.

  • Which methods were accelerated on GPU, and their total runtime.

  • Which methods required a CPU fallback, their total runtime, and why a fallback was needed.

It can be enabled in a few different ways:

Command Line Interface (CLI)#

If running via the CLI, you may add the --profile flag to profile your whole script.

python -m cuml.accel --profile script.py

Jupyter Notebook#

If running in IPython or Jupyter, you may use the cuml.accel.profile cell magic to profile code running in a single cell.

%%cuml.accel.profile

# All code in this cell will be profiled
...

Programmatic Usage#

Alternatively, the cuml.accel.profile contextmanager may be used to programmatically profile a section of code.

import cuml

with cuml.accel.profile():
    # All code within this context will be profiled
    ...

In all cases, once the profiler’s context ends, a report will be generated.
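
When using the contextmanager, only code executed inside the context contributes to the report, which makes it easy to profile just the modeling portion of a larger script. A minimal sketch, assuming the accelerator was installed programmatically as shown above:

import cuml

cuml.accel.install()

from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100)

with cuml.accel.profile():
    # Only these calls appear in the generated report
    ridge = Ridge(alpha=1.0)
    ridge.fit(X, y)
    ridge.predict(X)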

For example, running the following script:

from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100)

# Fit and predict on GPU
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
ridge.predict(X)

# Retry, using a hyperparameter that isn't supported on GPU
ridge = Ridge(positive=True)
ridge.fit(X, y)
ridge.predict(X)

with python -m cuml.accel --profile script.py will output the following report:

cuml.accel profile
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Function      ┃ GPU calls ┃ GPU time ┃ CPU calls ┃ CPU time ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│ Ridge.fit     │         1 │    171ms │         1 │    4.7ms │
│ Ridge.predict │         1 │    1.2ms │         1 │   89.8µs │
├───────────────┼───────────┼──────────┼───────────┼──────────┤
│ Total         │         2 │  172.2ms │         2 │    4.8ms │
└───────────────┴───────────┴──────────┴───────────┴──────────┘
Not all operations ran on the GPU. The following functions required CPU
fallback for the following reasons:

* Ridge.fit
  - `positive=True` is not supported
* Ridge.predict
  - Estimator not fit on GPU

From this you can see that:

  • The only methods cuml.accel had the potential to accelerate were Ridge.fit and Ridge.predict.

  • Each method was called twice: once on GPU and once on CPU.

  • Ridge.fit fell back to CPU because positive=True isn't supported; Ridge.predict then also fell back because the estimator wasn't fit on GPU.

Line Profiler#

The line profiler collects per-line statistics on your script. It can show:

  • Which lines took the most cumulative time.

  • Which lines (if any) were able to benefit from acceleration.

  • The percentage of each line’s runtime that was spent on GPU through cuml.accel.

Warning

The line profiler can add non-negligible overhead. It's useful for gathering information on which parts of your code were accelerated, but you shouldn't compare runtimes from a profiled run against runs without the line profiler enabled.

Command Line Interface (CLI)#

If running via the CLI, you may add the --line-profile flag to run the line profiler on your whole script.

python -m cuml.accel --line-profile script.py

Jupyter Notebook#

If running in IPython or Jupyter, you may use the cuml.accel.line_profile cell magic to run the line profiler on code in a single cell.

%%cuml.accel.line_profile

# All code in this cell will be profiled
...

In both cases, once the profiled code finishes, a report will be generated.

For example, running the following script:

from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100)

# Fit and predict on GPU
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
ridge.predict(X)

# Retry, using a hyperparameter that isn't supported on GPU
ridge = Ridge(positive=True)
ridge.fit(X, y)
ridge.predict(X)

with python -m cuml.accel --line-profile script.py will output the following report:

cuml.accel line profile
┏━━━━┳━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃  # ┃ N ┃    Time ┃ GPU % ┃ Source                                       ┃
┡━━━━╇━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│  1 │ 1 │       - │     - │ from sklearn.linear_model import Ridge       │
│  2 │ 1 │       - │     - │ from sklearn.datasets import make_regression │
│  3 │   │         │       │                                              │
│  4 │ 1 │   1.5ms │     - │ X, y = make_regression(n_samples=100)        │
│  5 │   │         │       │                                              │
│  6 │   │         │       │ # Fit and predict on GPU                     │
│  7 │ 1 │       - │     - │ ridge = Ridge(alpha=1.0)                     │
│  8 │ 1 │ 158.4ms │  99.0 │ ridge.fit(X, y)                              │
│  9 │ 1 │   1.4ms │  97.0 │ ridge.predict(X)                             │
│ 10 │   │         │       │                                              │
│ 11 │   │         │       │ # Retry, using an unsupported hyperparameter │
│ 12 │ 1 │       - │     - │ ridge = Ridge(positive=True)                 │
│ 13 │ 1 │   6.3ms │   0.0 │ ridge.fit(X, y)                              │
│ 14 │ 1 │ 153.4µs │   0.0 │ ridge.predict(X)                             │
└────┴───┴─────────┴───────┴──────────────────────────────────────────────┘
Ran in 168.3ms, 94.5% on GPU

From this you can see that:

  • The first calls to Ridge.fit and Ridge.predict (lines 8 and 9) ran on GPU, while the latter calls to these same methods (lines 13 and 14) fell back to CPU. No other lines had the opportunity for GPU acceleration.

  • The script ran in 168.3 ms, 94.5% of which was spent on GPU method calls. High percentages here may indicate a good fit for cuml.accel, as a majority of the total time is spent in accelerated methods. However, even a low percentage here may still be worthwhile if the total time taken by the script is reduced compared to running without cuml.accel.

  • The time taken by the GPU accelerated calls is much higher than the time taken by the equivalent CPU calls. This is because we're running on very small data here, where the overhead of CPU <-> GPU transfer dominates the runtime. So while we had a high percentage of utilization (usually good!), the runtimes indicate that this particular script may be better run without cuml.accel. For this estimator, acceleration becomes more beneficial on larger problems (try increasing to n_samples=10_000, as in the sketch below).
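
A quick way to check is to rerun the same script on a larger problem (a sketch; timings will vary with your hardware):

from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

# Same workload on a larger problem, where the CPU <-> GPU transfer
# overhead is amortized over much more computation
X, y = make_regression(n_samples=10_000)

ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
ridge.predict(X)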