{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# cuML on GPU and CPU\n", "\n", "cuML is a Scikit-learn-like suite of fast, GPU-accelerated machine learning algorithms designed for data science and analytical tasks.\n", "\n", "Starting with version 23.10, cuML provides both GPU-based and CPU-based execution capabilities with zero code change required to switch between them. This unified CPU/GPU cuML: \n", "\n", "- Allows users to prototype in systems without GPUs. \n", "- Allows library integrations without the need for dispatching and boilerplate code. \n", "- Allows users to train on one type of system and infer with the other for a subset of estimators (that will expand over time). \n", "- Provides compatibility with the broader GPU/CPU open source pydata ecosystem.\n", "\n", "The majority of estimators of cuML can run in both CPU and GPU systems, with a subset of them supporting exporting models between GPU and CPU systems. The following table shows support for the most common estimators: \n", "\n", "| Category | Algorithm | Supports Execution on CPU | Supports Exporting between CPU and GPU | \n", "| --- | --- | --- | --- |\n", "| **Clustering** | Density-Based Spatial Clustering of Applications with Noise (DBSCAN) | Yes | No |\n", "| | Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) | Yes | Partial |\n", "| | K-Means | Yes | No |\n", "| | Single-Linkage Agglomerative Clustering | No | No | \n", "| **Dimensionality Reduction** | Principal Components Analysis (PCA) | Yes | Yes | \n", "| | Incremental PCA | No | No |\n", "| | Truncated Singular Value Decomposition (tSVD) | Yes | Yes |\n", "| | Uniform Manifold Approximation and Projection (UMAP) | Yes | Partial |\n", "| | Random Projection | No | No |\n", "| | t-Distributed Stochastic Neighbor Embedding (TSNE) | No | No |\n", "| **Linear Models for Regression or Classification** | Linear Regression (OLS) | Yes | Yes\n", "| | Linear Regression with Lasso or Ridge Regularization | Yes | Yes |\n", "| | ElasticNet Regression | Yes | Yes |\n", "| | LARS Regression | No | No |\n", "| | Logistic Regression | Yes | Yes |\n", "| | Naive Bayes | No | No |\n", "| | Solvers | | Yes | Yes |\n", "| **Nonlinear Models for Regression or Classification** | Random Forest (RF) Classification | No | Partial |\n", "| | Random Forest (RF) Regression | No | Partial |\n", "| | Inference for decision tree-based models | No | No |\n", "| | Nearest Neighbors (NN) | Yes | Yes |\n", "| | K-Nearest Neighbors (KNN) Classification | Yes | Yes |\n", "| | K-Nearest Neighbors (KNN) Regression | Yes | Yes |\n", "| | Support Vector Machine Classifier (SVC) | No | No |\n", "| | Epsilon-Support Vector Regression (SVR) | No | No |\n", "| **Time Series** | Holt-Winters Exponential Smoothing | No | No |\n", "| | Auto-regressive Integrated Moving Average (ARIMA) | No | No |\n", "\n", "This allows the same code to be guaranteed to run in both GPU and CPU systems. Version 23.12 is scheduled to add the following algorithms:\n", "- Random Forest\n", "- Support Vector Machine estimators\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## Installation\n", "\n", "For GPU systems, cuML still follows the [RAPIDS requirements](https://rapids.ai/#quick-start). The cuML package and wheels are universal and can run in both GPU and CPU modes. 
 "\n",
 "### How to Use\n",
 "\n",
 "There are two main ways to use the CPU capabilities of cuML:\n",
 "\n",
 "#### 1. Using the CPU Package directly\n",
 "\n",
 "The CPU package, `cuml-cpu`, is a subset of the `cuml` package, so zero code changes are required to run the code on a CPU-only system. For example, the following script can be run both on a system with a GPU and `cuml`, as well as on a system without a GPU using `cuml-cpu`:" ] },
 { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
 "import cuml  # no change is needed, even for the import!\n",
 "import pandas as pd\n",
 "\n",
 "from cuml.manifold.umap import UMAP\n",
 "from sklearn import datasets\n",
 "from sklearn.manifold import trustworthiness\n",
 "\n",
 "# load the iris dataset from sklearn\n",
 "iris = datasets.load_iris()\n",
 "iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)\n",
 "\n",
 "# define the cuML UMAP model and use fit_transform to obtain the low-dimensional embedding of the input dataset\n",
 "embedding = UMAP(\n",
 "    n_neighbors=10, min_dist=0.01, init=\"random\"\n",
 ").fit_transform(iris_df)\n",
 "\n",
 "# calculate the trustworthiness of the results obtained from the cuML UMAP\n",
 "trust = trustworthiness(iris_df, embedding)\n",
 "print(trust)" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "This allows easy prototyping on CPU systems and running production code on GPU servers, or the other way around. Some estimators support training on one type of system and then exporting the model to the other type, as noted above and explained by example in [the corresponding section](#Cross-Device-Training-and-Inference-Serialization)." ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
 "#### 2. Managing the Execution Platform with the GPU package\n",
 "\n",
 "In addition to allowing zero-code-change execution on CPU systems, users of a system with the full cuML package can also manually control which device executes parts of the code.\n",
 "\n",
 "For example, using the following data:" ] },
 { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
 "from cuml.datasets import make_regression, make_blobs\n",
 "from cuml.model_selection import train_test_split\n",
 "\n",
 "X_blobs, y_blobs = make_blobs(n_samples=2000,\n",
 "                              n_features=20)\n",
 "X_train_blobs, X_test_blobs, y_train_blobs, y_test_blobs = train_test_split(X_blobs,\n",
 "                                                                            y_blobs,\n",
 "                                                                            test_size=0.2,\n",
 "                                                                            shuffle=True)\n",
 "\n",
 "X_reg, y_reg = make_regression(n_samples=2000,\n",
 "                               n_features=20)\n",
 "X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg,\n",
 "                                                                    y_reg,\n",
 "                                                                    test_size=0.2,\n",
 "                                                                    shuffle=True)" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
 "There are two ways to control the execution of the code:\n",
 "\n",
 "#### a) `using_device_type` context manager" ] },
 { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
 "from cuml.neighbors import NearestNeighbors\n",
 "from cuml.common.device_selection import using_device_type\n",
 "\n",
 "nn = NearestNeighbors()\n",
 "with using_device_type('cpu'):\n",
 "    nn.fit(X_train_blobs)\n",
 "    nearest_neighbors = nn.kneighbors(X_test_blobs)" ] },
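 { "cell_type": "markdown", "metadata": {}, "source": [ "The context manager also accepts `'gpu'`, so individual blocks of code can be pinned to either device. As a minimal sketch reusing the estimator and data defined above, the same workflow can be dispatched to the GPU:" ] },
 { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
 "# run the same estimator on the GPU to compare with the CPU execution above\n",
 "with using_device_type('gpu'):\n",
 "    nn.fit(X_train_blobs)\n",
 "    nearest_neighbors_gpu = nn.kneighbors(X_test_blobs)" ] },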
 { "cell_type": "markdown", "metadata": {}, "source": [
 "This makes it easy to prototype and run different estimators on different devices, for example when a dataset is small enough that the overhead of moving the data to the GPU would outweigh any acceleration.\n",
 "\n",
 "It also allows running estimators with hyperparameters that are not supported by the cuML implementation:" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
 "```python\n",
 "from cuml.manifold import UMAP\n",
 "\n",
 "umap_model = UMAP(angular_rp_forest=True)  # the `angular_rp_forest` hyperparameter is only available in the UMAP library\n",
 "with using_device_type('cpu'):\n",
 "    umap_model.fit(X_train_blobs)  # will run the UMAP library with the hyperparameter\n",
 "with using_device_type('gpu'):\n",
 "    transformed = umap_model.transform(X_test_blobs)  # will run the cuML implementation of UMAP, ignoring the unsupported parameter\n",
 "```" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "An upcoming feature will allow this dispatch to occur automatically under the hood. This can be very useful when integrating cuML into other libraries: if users pass parameters not supported on GPUs, the code will automatically dispatch to a CPU implementation." ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "#### b) Global configuration with `set_global_device_type`" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "By default, `cuml` executes estimators on the GPU. It also provides a global configuration option to change the default device, which can be useful in shared systems where cuML runs alongside deep learning frameworks that occupy most of a GPU. This can be accomplished with the `set_global_device_type` function:" ] },
 { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
 "from cuml.common.device_selection import set_global_device_type, get_global_device_type\n",
 "\n",
 "initial_device_type = get_global_device_type()\n",
 "print('default execution device:', initial_device_type)" ] },
 { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
 "set_global_device_type('cpu')\n",
 "print('new device type:', get_global_device_type())" ] },
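 { "cell_type": "markdown", "metadata": {}, "source": [ "Because this setting affects every estimator created afterwards, it is good practice to switch back once the CPU-only portion of the code is done. As a minimal sketch (assuming the GPU default described above), the global device type can be restored the same way:" ] },
 { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
 "# switch back to the default GPU execution for the rest of the session\n",
 "set_global_device_type('gpu')\n",
 "print('restored device type:', get_global_device_type())" ] },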
 { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [
 "## Cross Device Training and Inference Serialization\n",
 "\n",
 "As stated above, a subset of the estimators support training on one type of device (CPU or GPU), serializing the trained model, and then deserializing and executing it on the other type of device.\n",
 "\n",
 "A simple API is provided for this. For example, to train a model on GPU but deploy it on CPU, first train the estimator on the GPU system and save it to disk:" ] },
 { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
 "import pickle\n",
 "from cuml.linear_model import LinearRegression\n",
 "\n",
 "lin_reg = LinearRegression()\n",
 "lin_reg.fit(X_train_reg, y_train_reg)\n",
 "\n",
 "pickle.dump(lin_reg, open(\"lin_reg.pkl\", \"wb\"))\n",
 "del lin_reg" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "Then, on the other device/server, recover the estimator on a node with `cuml-cpu` installed:" ] },
 { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
 "recovered_lin_reg = pickle.load(open(\"lin_reg.pkl\", \"rb\"))\n",
 "predictions = recovered_lin_reg.predict(X_test_reg)\n",
 "print(predictions[0:10])" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
 "## Conclusion\n",
 "\n",
 "cuML's CPU capabilities are designed to facilitate different use cases, lower the barriers to using cuML, and streamline integrating cuML into other tools and deploying models.\n",
 "\n",
 "Upcoming versions of cuML will expand the supported estimators, both for CPU execution and for serializing/exporting models between systems with and without GPUs." ] } ],
 "metadata": { "kernelspec": { "display_name": "Python 3.10.12 ('cuml_dev')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "vscode": { "interpreter": { "hash": "975233ed6ddd7eb5f50db124c7eb6e9abd7f2428099fbb1c703209662350014b" } } }, "nbformat": 4, "nbformat_minor": 4 }