{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Performance Profiling and Debugging with cuml.accel\n", "\n", "This notebook demonstrates how to use the profiling capabilities in `cuml.accel` to understand which operations are being accelerated on GPU and which are falling back to CPU execution. This can be particularly useful for debugging performance issues or understanding why certain operations might not be accelerated.\n", "\n", "`cuml.accel` provides two types of profilers:\n", "\n", "1. **Function Profiler**: Shows statistics about potentially accelerated function and method calls\n", "2. **Line Profiler**: Shows per-line statistics on your script with GPU utilization percentages\n", "\n", "Let's explore both profilers with practical examples.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "First, let's load the cuml.accel extension and import the necessary libraries.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load the cuml.accel extension\n", "%load_ext cuml.accel\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.linear_model import Ridge\n", "from sklearn.datasets import make_regression\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Function Profiler\n", "\n", "The function profiler gathers statistics about potentially accelerated function and method calls. 
It can show:\n", "\n", "- Which method calls `cuml.accel` had the potential to accelerate\n", "- Which methods were accelerated on GPU, and their total runtime\n", "- Which methods required a CPU fallback, their total runtime, and why a fallback was needed\n", "\n", "### Example 1: Ridge Regression with Mixed GPU/CPU Execution\n", "\n", "Let's start with a simple example that demonstrates both GPU acceleration and CPU fallback using Ridge regression.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate sample data\n", "X, y = make_regression(n_samples=1000, n_features=100, noise=0.1, random_state=42)\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cuml.accel.profile\n", "\n", "# Fit and predict on GPU (supported parameters)\n", "ridge = Ridge(alpha=1.0)\n", "ridge.fit(X_train, y_train)\n", "predictions_gpu = ridge.predict(X_test)\n", "\n", "# Retry, using a hyperparameter that isn't supported on GPU\n", "ridge_cpu = Ridge(positive=True) # positive=True is not supported on GPU\n", "ridge_cpu.fit(X_train, y_train)\n", "predictions_cpu = ridge_cpu.predict(X_test)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function profiler output above shows:\n", "\n", "- **GPU calls**: Methods that ran successfully on GPU\n", "- **GPU time**: Total time spent on GPU operations\n", "- **CPU calls**: Methods that fell back to CPU execution\n", "- **CPU time**: Total time spent on CPU operations\n", "- **Fallback reasons**: Why certain operations couldn't run on GPU\n", "\n", "### Example 2: Random Forest Classification\n", "\n", "Let's try a more complex example with Random Forest classification.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate classification data\n", "from sklearn.datasets import 
make_classification\n", "X_class, y_class = make_classification(n_samples=2000, n_features=20, n_informative=15, \n", " n_redundant=5, n_classes=3, random_state=42)\n", "X_train_class, X_test_class, y_train_class, y_test_class = train_test_split(\n", " X_class, y_class, test_size=0.2, random_state=42)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cuml.accel.profile\n", "\n", "# Random Forest with supported parameters\n", "rf = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)\n", "rf.fit(X_train_class, y_train_class)\n", "rf_predictions = rf.predict(X_test_class)\n", "rf_probabilities = rf.predict_proba(X_test_class)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Line Profiler\n", "\n", "The line profiler collects per-line statistics on your script. It can show:\n", "\n", "- Which lines took the most cumulative time\n", "- Which lines (if any) were able to benefit from acceleration\n", "- The percentage of each line's runtime that was spent on GPU through `cuml.accel`\n", "\n", "⚠️ **Warning**: The line profiler can add non-negligible overhead. 
It's useful for understanding which parts of your code were accelerated, but runtimes measured with the line profiler enabled shouldn't be compared against runs without it.\n", "\n", "### Example 3: Line Profiling with Ridge Regression\n", "\n", "Let's use the line profiler to see detailed per-line statistics.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cuml.accel.line_profile\n", "\n", "# Generate data\n", "X, y = make_regression(n_samples=1000, n_features=100, noise=0.1, random_state=42)\n", "\n", "# Fit and predict on GPU\n", "ridge = Ridge(alpha=1.0)\n", "ridge.fit(X, y)\n", "predictions = ridge.predict(X)\n", "\n", "# Retry, using a hyperparameter that isn't supported on GPU\n", "ridge_cpu = Ridge(positive=True)\n", "ridge_cpu.fit(X, y)\n", "predictions_cpu = ridge_cpu.predict(X)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The line profiler output shows:\n", "\n", "- **#**: Line number\n", "- **N**: Number of times the line was executed\n", "- **Time**: Total time spent on that line\n", "- **GPU %**: Percentage of time spent on GPU for that line\n", "- **Source**: The actual code line\n", "\n", "At the bottom, you'll see the total runtime and the percentage of time spent on GPU.\n", "\n", "### Example 4: Line Profiling with Multiple Algorithms\n", "\n", "Let's try a more comprehensive example with multiple machine learning algorithms.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.linear_model import LogisticRegression\n", "from sklearn.cluster import KMeans\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cuml.accel.line_profile\n", "\n", "# Generate data for multiple tasks\n", "X_reg, y_reg = make_regression(n_samples=500, n_features=50, noise=0.1, random_state=42)\n", "X_class, y_class = make_classification(n_samples=500, n_features=20, n_classes=2, random_state=42)\n", 
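"\n", "# (Illustrative addition, not required for the example) An ordinary\n", "# NumPy-style line like the one below is profiled too, but should report\n", "# roughly 0% in the GPU column: cuml.accel only intercepts estimator calls\n", "# such as the fit/predict calls further down.\n", "X_reg_centered = X_reg - X_reg.mean(axis=0)\n",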
"\n", "# Regression task\n", "ridge = Ridge(alpha=1.0)\n", "ridge.fit(X_reg, y_reg)\n", "ridge_pred = ridge.predict(X_reg)\n", "\n", "# Classification task\n", "logreg = LogisticRegression(random_state=42)\n", "logreg.fit(X_class, y_class)\n", "logreg_pred = logreg.predict(X_class)\n", "\n", "# Clustering task\n", "kmeans = KMeans(n_clusters=3, random_state=42)\n", "kmeans.fit(X_class)\n", "kmeans_pred = kmeans.predict(X_class)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Programmatic Profiling\n", "\n", "You can also use the profilers programmatically with context managers. This is useful when you want to profile specific sections of code rather than entire cells.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate data\n", "X, y = make_regression(n_samples=1000, n_features=100, noise=0.1, random_state=42)\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Using function profiler programmatically\n", "# Note: this requires importing cuml explicitly, which is otherwise unnecessary for zero-code-change acceleration\n", "import cuml\n", "\n", "with cuml.accel.profile():\n", " # This section will be profiled\n", " ridge = Ridge(alpha=1.0)\n", " ridge.fit(X_train, y_train)\n", " predictions = ridge.predict(X_test)\n", " \n", " # This will fall back to CPU\n", " ridge_cpu = Ridge(positive=True)\n", " ridge_cpu.fit(X_train, y_train)\n", " predictions_cpu = ridge_cpu.predict(X_test)\n", "\n", "# This section will NOT be profiled\n", "print(\"Profiling complete!\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Logging\n", "\n", "In addition to profiling, `cuml.accel` also provides logging capabilities. 
You can enable different levels of logging to see what's happening behind the scenes.\n", "\n", "### Setting Log Levels\n", "\n", "You can set the logging level when installing cuml.accel programmatically:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Note: This needs to be done before loading the extension\n", "# Uncomment and restart kernel to try different log levels\n", "\n", "# import cuml\n", "# cuml.accel.install(log_level=\"debug\") # Most verbose\n", "# cuml.accel.install(log_level=\"info\") # Shows GPU/CPU dispatch info\n", "# cuml.accel.install(log_level=\"warn\") # Default - warnings only\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example with Info Logging\n", "\n", "Let's demonstrate what info-level logging looks like. First, let's reinstall cuml.accel with info logging:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Reinstall with info logging\n", "import cuml  # in case cuml hasn't been imported in this session yet\n", "\n", "cuml.accel.install(log_level=\"info\")\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now let's run some code and see the logging output\n", "X, y = make_regression(n_samples=100, n_features=10, noise=0.1, random_state=42)\n", "\n", "# This should run on GPU\n", "ridge = Ridge(alpha=1.0)\n", "ridge.fit(X, y)\n", "ridge.predict(X)\n", "\n", "# This should fall back to CPU\n", "ridge_cpu = Ridge(positive=True)\n", "ridge_cpu.fit(X, y)\n", "ridge_cpu.predict(X)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Key Takeaways\n", "\n", "1. **Function Profiler** (`%%cuml.accel.profile`): Best for understanding which methods were accelerated and why some fell back to CPU\n", "\n", "2. **Line Profiler** (`%%cuml.accel.line_profile`): Best for understanding which specific lines of code benefited from acceleration and the overall GPU utilization percentage\n", "\n", "3. 
**Logging**: Useful for real-time feedback on what's happening during execution\n", "\n", "4. **Performance Insights**: \n", " - High GPU utilization percentages indicate good acceleration\n", " - CPU fallbacks are clearly identified with reasons\n", " - Small datasets may see little speedup because host-to-device transfer overhead can dominate\n", " - Larger datasets typically benefit more from GPU acceleration\n", "\n", "5. **Debugging**: Use these tools to identify why certain operations aren't being accelerated and optimize your code accordingly.\n", "\n", "The profiling tools in `cuml.accel` are essential for understanding and optimizing your GPU-accelerated machine learning workflows!\n" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }