{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 10 minutes to cuxfilter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a short introduction to the cuxfilter.py library, covering basic usage and the main features as a quick summary."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What is cuxfilter?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "cuxfilter is inspired by the Crossfilter library, a fast, browser-based mechanism for filtering across multiple dimensions that also supports groupby operations on top of those dimensions. One major limitation of Crossfilter is that it keeps data in memory in the client-side browser, which makes it inefficient for processing large datasets.\n",
    "\n",
    "cuxfilter addresses this limitation by building on the RAPIDS stack, mainly cuDF. The data stays on the GPU as a GPU DataFrame, and operations such as groupby aggregations, sorting, and querying run on the GPU itself; only the results are returned to the charts.\n",
    "\n",
    "cuxfilter acts as a `connector` library that links different visualization libraries to a GPU DataFrame with little setup. This lets you use charts from different libraries in a single dashboard while still providing the interaction between them."
   ]
  },
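  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the idea concrete, here is a minimal sketch of the kind of filter-then-aggregate step described above, written directly against cuDF. It is an illustration only, not cuxfilter's internal implementation; the toy values are made up, and it assumes a machine with a supported GPU and `cudf` installed.\n",
    "\n",
    "```python\n",
    "import cudf\n",
    "\n",
    "# Toy GPU DataFrame standing in for a real dataset (values are made up).\n",
    "gdf = cudf.DataFrame({\n",
    "    'YEAR': [2015, 2015, 2016, 2017, 2017, 2017],\n",
    "    'DAY_WEEK': [1, 3, 3, 2, 3, 7],\n",
    "})\n",
    "\n",
    "# Filter on one dimension and aggregate on another, all on the GPU.\n",
    "filtered = gdf[gdf['DAY_WEEK'] == 3]\n",
    "counts = filtered.groupby('YEAR').size()\n",
    "\n",
    "# Only the small aggregated result is copied to the host for display.\n",
    "print(counts.to_pandas())\n",
    "```\n",
    "\n",
    "The full table never leaves the GPU; only the aggregated counts do, which mirrors how results are handed to the charts."
   ]
  },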
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The modules"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> cuxfilter has the following usable modules:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. cuxfilter.DataFrame\n",
    "2. cuxfilter.DashBoard\n",
    "3. cuxfilter.charts\n",
    "4. cuxfilter.layouts\n",
    "5. cuxfilter.themes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Usage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1. Import the required modules"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import cuxfilter\n",
    "from cuxfilter import DataFrame, themes, layouts"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Download required datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# update DATA_DIR if you have downloaded datasets elsewhere\n",
    "DATA_DIR = './data/'\n",
    "\n",
    "! curl https://data.rapids.ai/viz-data/auto_accidents.arrow.gz --create-dirs -o $DATA_DIR/auto_accidents.arrow.gz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset - ./data//auto_accidents.arrow\n",
      "\n",
      "dataset already downloaded\n"
     ]
    }
   ],
   "source": [
    "from cuxfilter.sampledata import datasets_check\n",
    "datasets_check('auto_accidents', base_dir=DATA_DIR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. Read some data\n",
    "\n",
    "> cuxfilter can read arrow files from disk, or wrap an in-memory cuDF DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>dropoff_x</th>\n",
       "      <th>dropoff_y</th>\n",
       "      <th>DAY_WEEK</th>\n",
       "      <th>DAY_WEEK_STR</th>\n",
       "      <th>YEAR</th>\n",
       "      <th>MONTH</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>__index_level_0__</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-9.685585e+06</td>\n",
       "      <td>3.939943e+06</td>\n",
       "      <td>1</td>\n",
       "      <td>Sunday</td>\n",
       "      <td>2017</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-9.661068e+06</td>\n",
       "      <td>4.117979e+06</td>\n",
       "      <td>3</td>\n",
       "      <td>Tuesday</td>\n",
       "      <td>2017</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-9.589649e+06</td>\n",
       "      <td>3.811519e+06</td>\n",
       "      <td>3</td>\n",
       "      <td>Tuesday</td>\n",
       "      <td>2017</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-9.589649e+06</td>\n",
       "      <td>3.811519e+06</td>\n",
       "      <td>3</td>\n",
       "      <td>Tuesday</td>\n",
       "      <td>2017</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-9.589649e+06</td>\n",
       "      <td>3.811519e+06</td>\n",
       "      <td>3</td>\n",
       "      <td>Tuesday</td>\n",
       "      <td>2017</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                      dropoff_x     dropoff_y  DAY_WEEK DAY_WEEK_STR  YEAR  \\\n",
       "__index_level_0__                                                            \n",
       "0                 -9.685585e+06  3.939943e+06         1       Sunday  2017   \n",
       "1                 -9.661068e+06  4.117979e+06         3      Tuesday  2017   \n",
       "2                 -9.589649e+06  3.811519e+06         3      Tuesday  2017   \n",
       "3                 -9.589649e+06  3.811519e+06         3      Tuesday  2017   \n",
       "4                 -9.589649e+06  3.811519e+06         3      Tuesday  2017   \n",
       "\n",
       "                   MONTH  \n",
       "__index_level_0__         \n",
       "0                      2  \n",
       "1                      2  \n",
       "2                      1  \n",
       "3                      1  \n",
       "4                      1  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# create a cuxfilter DataFrame from the downloaded arrow file\n",
    "cux_df = DataFrame.from_arrow(DATA_DIR + 'auto_accidents.arrow')\n",
    "cux_df.data['ST_CASE'] = cux_df.data['ST_CASE'].astype('float64')\n",
    "\n",
    "# add a day_week_str column, using cudf.Series.map()\n",
    "cux_df.data['DAY_WEEK_STR'] = cux_df.data.DAY_WEEK.map({1: 'Sunday',    2: 'Monday',    3: 'Tuesday',    4: 'Wednesday',   5: 'Thursday',    6: 'Friday',    7: 'Saturday',    9: 'Unknown'})\n",
    "\n",
    "cux_df.data[['dropoff_x', 'dropoff_y', 'DAY_WEEK', 'DAY_WEEK_STR', 'YEAR', 'MONTH']].head()"
   ]
  },
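  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above reads an arrow file from disk. For the in-memory case, a minimal sketch could look like the following; it assumes `cuxfilter.DataFrame.from_dataframe` is available in your installed version, and the column names and values are invented for illustration.\n",
    "\n",
    "```python\n",
    "import cudf\n",
    "import cuxfilter\n",
    "\n",
    "# Toy in-memory cuDF DataFrame (made-up columns and values).\n",
    "gdf = cudf.DataFrame({\n",
    "    'key': [0, 1, 2, 3, 4],\n",
    "    'val': [10.0, 11.0, 12.0, 13.0, 14.0],\n",
    "})\n",
    "\n",
    "# Wrap the existing GPU DataFrame instead of reading an arrow file.\n",
    "cux_df_in_memory = cuxfilter.DataFrame.from_dataframe(gdf)\n",
    "\n",
    "# The underlying cuDF DataFrame is exposed through the .data attribute.\n",
    "print(cux_df_in_memory.data.head())\n",
    "```\n",
    "\n",
    "This route is convenient when the data has already been loaded or preprocessed with cuDF earlier in the same session."
   ]
  },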
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. Create some charts\n",
    "\n",
    "> See the charts section for the available chart options"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "demo_red_blue_palette = [ \"#3182bd\", \"#6baed6\", \"#7b8ed8\", \"#e26798\", \"#ff0068\" , \"#323232\" ]\n",
    "\n",
    "chart1 = cuxfilter.charts.scatter(x='dropoff_x', y='dropoff_y', aggregate_col='DAY_WEEK', aggregate_fn='mean',\n",
    "                                color_palette=demo_red_blue_palette, tile_provider='CartoLight',\n",
    "                                pixel_shade_type='linear')\n",
    "chart2 = cuxfilter.charts.bar('YEAR')\n",
    "\n",
    "\n",
    "chart3 = cuxfilter.charts.multi_select('DAY_WEEK_STR')\n",
    "\n",
    "chart4 = cuxfilter.charts.number(expression=\"AGE\", aggregate_fn=\"mean\", title=\"Mean age\")\n",
    "chart5 = cuxfilter.charts.number(expression=\"SIDE_DRIV_STARS + FRNT_DRIV_STARS\", aggregate_fn=\"mean\", title=\"Vehicle(Mean front+side safety rating)\")\n",
    "charts_list = [chart1, chart2 ]\n",
    "sidebar = [chart3, chart4, chart5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4. Create a dashboard object "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "d = cux_df.dashboard(charts_list, sidebar=sidebar, title='Auto Accidents Dashboard', layout=layouts.feature_and_base, theme=themes.default)"
   ]
  },
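  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `layout` and `theme` arguments control how the charts are arranged and styled. As a hedged sketch, the same charts could be placed in a different layout with a different theme; `layouts.double_feature` and `themes.dark` are assumed to exist in your installed cuxfilter version, so check the layouts and themes documentation if either name is missing.\n",
    "\n",
    "```python\n",
    "from cuxfilter import layouts, themes\n",
    "\n",
    "# Same charts and sidebar as above, arranged side by side with a dark theme.\n",
    "# The layout and theme names are assumptions about the installed version.\n",
    "d_alt = cux_df.dashboard(\n",
    "    charts_list,\n",
    "    sidebar=sidebar,\n",
    "    title='Auto Accidents Dashboard',\n",
    "    layout=layouts.double_feature,\n",
    "    theme=themes.dark,\n",
    ")\n",
    "```\n",
    "\n",
    "`d_alt` is a hypothetical name used here so that the dashboard object `d` created above is left untouched."
   ]
  },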
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "#### 5. Run the dashboard"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "1. d.show('current_notebook_url:current_notebook_port', sidebar_width=300, height=1000): remote dashboard\n",
    "\n",
    "2. d.app(): inline within the notebook cell\n",
    "\n",
    "In case you need to stop the server:\n",
    "\n",
    "- d.stop()\n",
    "\"\"\"\n",
    "\n",
    "# uncomment the line below to run the dashboard inline within the notebook cell\n",
    "# d.app(width=1000, height=800)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![auto-accidents-notebook](../../_images/auto-accidents-notebook.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# uncomment the line below to run the dashboard remotely and get the link to access it\n",
    "# d.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![dashboard-show](../../_images/dashboard-show.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> `d.show()` will display a dashboard like the following in a new browser tab\n",
    "\n",
    "![auto-accidents-2](../../_images/auto-accidents-2.png)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  },
  "vscode": {
   "interpreter": {
    "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}