Logging and Profiling
=====================

``cuml.accel`` provides logging and profiling support to help you understand
which operations are being accelerated on GPU and which are falling back to
CPU execution. This can be particularly useful for debugging performance
issues or understanding why certain operations might not be accelerated.

Logging
^^^^^^^

The logging system provides different levels of detail:

* **WARN level**: Shows only warnings and errors. The default level.
* **INFO level**: Shows information on where dispatched methods ran (GPU or CPU) and why.
* **DEBUG level**: Shows more detailed information about GPU initialization,
  parameter synchronization, attribute synchronization, and all method calls.

To enable logging, you can set the logging level in several ways:

Command Line Interface (CLI)
----------------------------

When running scripts with ``cuml.accel``, you can use the ``-v`` or
``--verbose`` flag:

.. code-block:: console

    # Show warnings only (default)
    python -m cuml.accel myscript.py

    # Show info level logs
    python -m cuml.accel -v myscript.py

    # Show debug level logs (most verbose)
    python -m cuml.accel -vv myscript.py

Programmatic Installation
-------------------------

When using the programmatic installation method, you can set the log level
directly:

.. code-block:: python

    import cuml

    # Install with debug logging
    cuml.accel.install(log_level="debug")

    # Install with info logging
    cuml.accel.install(log_level="info")

    # Install with warning logging (default)
    cuml.accel.install(log_level="warn")

Jupyter Notebooks
-----------------

Since the magic command doesn't accept arguments, use the programmatic
installation:

.. code-block:: python

    import cuml

    # Install with desired log level before other imports
    cuml.accel.install(log_level="debug")

Example
-------

.. code-block:: python

    from sklearn.linear_model import Ridge
    from sklearn.datasets import make_regression

    X, y = make_regression()

    # Fit and predict on GPU
    ridge = Ridge(alpha=1.0)
    ridge.fit(X, y)
    ridge.predict(X)

    # Retry, using a hyperparameter that isn't supported on GPU
    ridge = Ridge(positive=True)
    ridge.fit(X, y)
    ridge.predict(X)

Executing this with ``python -m cuml.accel -v script.py`` will show the
following output:

.. code-block:: console

    [cuml.accel] Accelerator installed.
    [cuml.accel] `Ridge.fit` ran on GPU
    [cuml.accel] `Ridge.predict` ran on GPU
    [cuml.accel] `Ridge.fit` falling back to CPU: `positive=True` is not supported
    [cuml.accel] `Ridge.fit` ran on CPU
    [cuml.accel] `Ridge.predict` ran on CPU

This logging information can help you:

* Identify which parts of your pipeline are being accelerated
* Understand why certain operations fall back to CPU
* Debug performance issues by seeing where GPU acceleration fails
* Optimize your code by understanding synchronization patterns
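Putting these pieces together, below is a minimal end-to-end sketch of
enabling info-level logging programmatically. The ``n_samples`` value is
arbitrary, and the exact log lines you see will depend on your cuML version
and hardware:

.. code-block:: python

    import cuml

    # Enable the accelerator with info-level logging *before* importing
    # any scikit-learn estimators, so that those imports are intercepted.
    cuml.accel.install(log_level="info")

    from sklearn.linear_model import Ridge
    from sklearn.datasets import make_regression

    X, y = make_regression(n_samples=1000)

    # With info logging enabled, each dispatched method logs whether it
    # ran on GPU or CPU, and why.
    Ridge(alpha=1.0).fit(X, y)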
Profiling
^^^^^^^^^

In addition to logging, ``cuml.accel`` contains two profilers to help users
better understand what parts of their code ``cuml.accel`` was able to
accelerate.

Function Profiler
-----------------

The function profiler gathers statistics about potentially accelerated
function and method calls. It can show:

- Which method calls ``cuml.accel`` had the potential to accelerate (if any).
  Note that only methods that ``cuml.accel`` can currently accelerate are
  included in this table (even if a CPU fallback was required). Methods that
  are fully unimplemented won't be present.
- Which methods were accelerated on GPU, and their total runtime.
- Which methods required a CPU fallback, their total runtime, and why a
  fallback was needed.

It can be enabled in a few different ways:

**Command Line Interface (CLI)**

If running using the CLI, you may add the ``--profile`` flag to profile your
whole script.

.. code-block:: console

    python -m cuml.accel --profile script.py

**Jupyter Notebook**

If running in IPython or Jupyter, you may use the ``cuml.accel.profile`` cell
magic to profile code running in a single cell.

.. code-block:: python

    %%cuml.accel.profile

    # All code in this cell will be profiled
    ...

**Programmatic Usage**

Alternatively, the ``cuml.accel.profile`` context manager may be used to
programmatically profile a section of code.

.. code-block:: python

    with cuml.accel.profile():
        # All code within this context will be profiled
        ...

In all cases, once the profiler's context ends, a report will be generated.
For example, running the following script:

.. code-block:: python

    from sklearn.linear_model import Ridge
    from sklearn.datasets import make_regression

    X, y = make_regression(n_samples=100)

    # Fit and predict on GPU
    ridge = Ridge(alpha=1.0)
    ridge.fit(X, y)
    ridge.predict(X)

    # Retry, using a hyperparameter that isn't supported on GPU
    ridge = Ridge(positive=True)
    ridge.fit(X, y)
    ridge.predict(X)

as ``python -m cuml.accel --profile script.py`` will output the following
report:

.. code-block:: text

    cuml.accel profile
    ┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
    ┃ Function      ┃ GPU calls ┃ GPU time ┃ CPU calls ┃ CPU time ┃
    ┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
    │ Ridge.fit     │         1 │    171ms │         1 │    4.7ms │
    │ Ridge.predict │         1 │    1.2ms │         1 │   89.8µs │
    ├───────────────┼───────────┼──────────┼───────────┼──────────┤
    │ Total         │         2 │  172.2ms │         2 │    4.8ms │
    └───────────────┴───────────┴──────────┴───────────┴──────────┘

    Not all operations ran on the GPU. The following functions required CPU
    fallback for the following reasons:

    * Ridge.fit - `positive=True` is not supported
    * Ridge.predict - Estimator not fit on GPU

From this you can see that:

- The only methods ``cuml.accel`` had the potential to accelerate were
  ``Ridge.fit`` and ``Ridge.predict``.
- Each method was called 2 times - once on GPU and once on CPU.
- The reason the CPU fallback was required was that ``positive=True`` wasn't
  supported.
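For reference, here's a minimal, runnable sketch of profiling the same
workload with the ``cuml.accel.profile`` context manager instead of the CLI
flag, assuming the accelerator is installed programmatically in the same
script:

.. code-block:: python

    import cuml

    # The accelerator must be active before scikit-learn estimators
    # are imported.
    cuml.accel.install()

    from sklearn.linear_model import Ridge
    from sklearn.datasets import make_regression

    X, y = make_regression(n_samples=100)

    # Only code inside this context contributes to the profile report,
    # which is generated when the context ends.
    with cuml.accel.profile():
        ridge = Ridge(alpha=1.0)
        ridge.fit(X, y)
        ridge.predict(X)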
Line Profiler
-------------

The line profiler collects per-line statistics on your script. It can show:

- Which lines took the most cumulative time.
- Which lines (if any) were able to benefit from acceleration.
- The percentage of each line's runtime that was spent on GPU through
  ``cuml.accel``.

.. warning::

    The line profiler can add non-negligible overhead. It can be useful for
    gathering information on what parts of your code were accelerated, but
    you shouldn't compare runtimes from a run with the line profiler enabled
    against runtimes from other runs.

**Command Line Interface (CLI)**

If running using the CLI, you may add the ``--line-profile`` flag to run the
line profiler on your whole script.

.. code-block:: console

    python -m cuml.accel --line-profile script.py

**Jupyter Notebook**

If running in IPython or Jupyter, you may use the ``cuml.accel.line_profile``
cell magic to run the line profiler on code in a single cell.

.. code-block:: python

    %%cuml.accel.line_profile

    # All code in this cell will be profiled
    ...

In all cases, once the profiler's context ends, a report will be generated.
For example, running the following script:

.. code-block:: python

    from sklearn.linear_model import Ridge
    from sklearn.datasets import make_regression

    X, y = make_regression(n_samples=100)

    # Fit and predict on GPU
    ridge = Ridge(alpha=1.0)
    ridge.fit(X, y)
    ridge.predict(X)

    # Retry, using a hyperparameter that isn't supported on GPU
    ridge = Ridge(positive=True)
    ridge.fit(X, y)
    ridge.predict(X)

as ``python -m cuml.accel --line-profile script.py`` will output the
following report:

.. code-block:: text

    cuml.accel line profile
    ┏━━━━┳━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
    ┃  # ┃ N ┃ Time    ┃ GPU % ┃ Source                                       ┃
    ┡━━━━╇━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
    │  1 │ 1 │       - │     - │ from sklearn.linear_model import Ridge       │
    │  2 │ 1 │       - │     - │ from sklearn.datasets import make_regression │
    │  3 │   │         │       │                                              │
    │  4 │ 1 │   1.5ms │     - │ X, y = make_regression(n_samples=100)        │
    │  5 │   │         │       │                                              │
    │  6 │   │         │       │ # Fit and predict on GPU                     │
    │  7 │ 1 │       - │     - │ ridge = Ridge(alpha=1.0)                     │
    │  8 │ 1 │ 158.4ms │  99.0 │ ridge.fit(X, y)                              │
    │  9 │ 1 │   1.4ms │  97.0 │ ridge.predict(X)                             │
    │ 10 │   │         │       │                                              │
    │ 11 │   │         │       │ # Retry, using an unsupported hyperparameter │
    │ 12 │ 1 │       - │     - │ ridge = Ridge(positive=True)                 │
    │ 13 │ 1 │   6.3ms │   0.0 │ ridge.fit(X, y)                              │
    │ 14 │ 1 │ 153.4µs │   0.0 │ ridge.predict(X)                             │
    └────┴───┴─────────┴───────┴──────────────────────────────────────────────┘

    Ran in 168.3ms, 94.5% on GPU

From this you can see that:

- The first calls to ``Ridge.fit`` and ``Ridge.predict`` (lines 8 and 9) ran
  on GPU, while the later calls to these same methods (lines 13 and 14) fell
  back to CPU. No other lines had the opportunity for GPU acceleration.
- The script ran in 168.3 ms, 94.5% of which was spent on GPU method calls.
  High percentages here *may* indicate a good fit for ``cuml.accel``, as a
  majority of the total time is spent in accelerated methods. However, even a
  low percentage here may still be worthwhile if the total time taken by the
  script is reduced compared to running without ``cuml.accel``.
- The time taken by the GPU-accelerated calls is *much higher* than the time
  taken by the equivalent CPU calls. This is because we're running on very
  small data here, where the overhead of CPU <-> GPU transfer dominates the
  runtime. So while we had a high percentage of utilization (usually good!),
  the runtimes indicate that this particular script may be better run without
  ``cuml.accel``. For this estimator, acceleration does become more beneficial
  once run on larger problems (try increasing to ``n_samples=10_000``, as
  sketched below).
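As a quick experiment, you can rerun the same script with a larger problem
size and compare the reports. The sketch below simply scales up the example
above; the exact problem size at which the GPU pulls ahead depends on your
hardware:

.. code-block:: python

    from sklearn.linear_model import Ridge
    from sklearn.datasets import make_regression

    # A larger problem amortizes the CPU <-> GPU transfer overhead,
    # so the GPU path is more likely to come out ahead.
    X, y = make_regression(n_samples=10_000)

    ridge = Ridge(alpha=1.0)
    ridge.fit(X, y)
    ridge.predict(X)

Running this again with ``python -m cuml.accel --line-profile script.py``
lets you compare the per-line times against the small-data run above.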