{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c5fdf490-fa77-4e56-92d1-53101fff75ba",
   "metadata": {},
   "source": [
    "# cuSpatial Python User's Guide\n",
    "\n",
    "cuSpatial is a GPU-accelerated Python library for spatial data analysis including distance and  \n",
    "trajectory computations, spatial data indexing and spatial join operations. cuSpatial's  \n",
    "Python API provides an accessible interface to high-performance spatial algorithms accelerated\n",
    "by CUDA-enabled GPUs."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "caadf3ca-be3c-4523-877c-4c35dd25093a",
   "metadata": {},
   "source": [
    "## Contents\n",
    "\n",
    "This guide provides a working example for all of the python API components of cuSpatial.  \n",
    "The following list links to each subsection.\n",
    "\n",
    "* [Installing cuSpatial](#Installing-cuSpatial)\n",
    "* [GPU accelerated memory layout](#GPU-accelerated-memory-layout)\n",
    "* [Input / Output](#Input-/-Output)\n",
    "* [Geopandas and cuDF integration](#Geopandas-and-cuDF-integration)\n",
    "* [Trajectories](#Trajectories)\n",
    "* [Bounding](#Bounding)\n",
    "* [Projection](#Projection)\n",
    "* [Distance](#Distance)\n",
    "* [Filtering](#Filtering)\n",
    "* [Spatial joins](#Spatial-joins)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "115c8382-f83f-476f-9a26-a64a45b3a8da",
   "metadata": {},
   "source": [
    "## Installing cuSpatial\n",
    "Read the [RAPIDS Quickstart Guide](  https://rapids.ai/start.html     ) to learn more about installing all RAPIDS libraries, including cuSpatial.\n",
    "\n",
    "If you are working on a system with a CUDA-enabled GPU and have CUDA installed, uncomment the  \n",
    "following cell and install cuSpatial:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7265f9d2-9203-4da2-bbb2-b35c7f933641",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !conda create -n rapids-24.10 -c rapidsai -c conda-forge -c nvidia \\ \n",
    "#     cuspatial=24.10 python=3.9 cudatoolkit=11.5 "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "051b6e68-9ffd-473a-89e2-313fe1c59d18",
   "metadata": {},
   "source": [
    "For other options to create a RAPIDS environment, such as docker or build from source, see  \n",
    "[RAPIDS Release Selector](  https://rapids.ai/start.html#get-rapids). \n",
    "\n",
    "If you wish to contribute to cuSpatial, you should create a source build using the excellent [rapids-compose](https://github.com/trxcllnt/rapids-compose)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7b770cb4-793e-467a-a306-2d3409545748",
   "metadata": {},
   "source": [
    "## GPU accelerated memory layout\n",
    "\n",
    "cuSpatial uses `GeoArrow` buffers, a GPU-friendly data format for geometric data that is well  \n",
    "suited for massively parallel programming. See [I/O]((#Input-/-Output) on the fastest methods to get your  \n",
    "data into cuSpatial. GeoArrow extends [PyArrow](\n",
    "https://arrow.apache.org/docs/python/index.html       ) bindings and introduces several new types suited  \n",
    "for geometry applications.  GeoArrow supports [ListArrays](\n",
    "https://arrow.apache.org/docs/python/data.html#arrays) for `Points`, `MultiPoints`,  \n",
    "`LineStrings`, `MultiLineStrings`, `Polygons`, and `MultiPolygons`. Using an Arrow [DenseArray](\n",
    "https://arrow.apache.org/docs/python/data.html#union-arrays),  \n",
    "GeoArrow stores heterogeneous types of Features. DataFrames of geometry objects and their  \n",
    "metadata can be loaded and transformed in a method similar to those in [GeoPandas.GeoSeries](\n",
    "https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "88d05bb9-c924-4d0b-8736-cd5183602d76",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Imports used throughout this notebook.\n",
    "import cuspatial\n",
    "import cudf\n",
    "import cupy\n",
    "import geopandas\n",
    "import os\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from shapely.geometry import *\n",
    "from shapely import wkt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "4937d4aa-4e32-49ab-a22e-96dfb0098d07",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# For deterministic result\n",
    "np.random.seed(0)\n",
    "cupy.random.seed(0)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4b1251d1-558a-4899-8e7a-8066db0ad091",
   "metadata": {},
   "source": [
    "## Input / Output\n",
    "\n",
    "The primary method of loading features into cuSpatial is using [cuspatial.from_geopandas](\n",
    "https://docs.rapids.ai/api/cuspatial/stable/api_docs/io.html?highlight=from_geopandas#cuspatial.from_geopandas).\n",
    "\n",
    "One can also create feature geometries directly using any Python buffer that supports  \n",
    "`__array_interface__` for coordinates and their feature offsets."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "11b973bd-87e1-4b67-ab8c-23c3b8291335",
   "metadata": {},
   "source": [
    "### [cuspatial.from_geopandas](https://docs.rapids.ai/api/cuspatial/stable/api_docs/io.html?highlight=from_geopandas#cuspatial.from_geopandas)\n",
    "\n",
    "The easiest way to get data into cuSpatial is via `cuspatial.from_geopandas`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "255fbfbe-8be1-498c-9a26-f4a3f31bdded",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "        featurecla  scalerank  LABELRANK                   SOVEREIGNT SOV_A3  \\\n",
      "0  Admin-0 country          1          6                         Fiji    FJI   \n",
      "1  Admin-0 country          1          3  United Republic of Tanzania    TZA   \n",
      "2  Admin-0 country          1          7               Western Sahara    SAH   \n",
      "3  Admin-0 country          1          2                       Canada    CAN   \n",
      "4  Admin-0 country          1          2     United States of America    US1   \n",
      "\n",
      "   ADM0_DIF  LEVEL               TYPE TLC                        ADMIN  ...  \\\n",
      "0         0      2  Sovereign country   1                         Fiji  ...   \n",
      "1         0      2  Sovereign country   1  United Republic of Tanzania  ...   \n",
      "2         0      2      Indeterminate   1               Western Sahara  ...   \n",
      "3         0      2  Sovereign country   1                       Canada  ...   \n",
      "4         1      2            Country   1     United States of America  ...   \n",
      "\n",
      "      FCLASS_TR     FCLASS_ID     FCLASS_PL FCLASS_GR  FCLASS_IT  \\\n",
      "0          None          None          None      None       None   \n",
      "1          None          None          None      None       None   \n",
      "2  Unrecognized  Unrecognized  Unrecognized      None       None   \n",
      "3          None          None          None      None       None   \n",
      "4          None          None          None      None       None   \n",
      "\n",
      "      FCLASS_NL FCLASS_SE  FCLASS_BD FCLASS_UA  \\\n",
      "0          None      None       None      None   \n",
      "1          None      None       None      None   \n",
      "2  Unrecognized      None       None      None   \n",
      "3          None      None       None      None   \n",
      "4          None      None       None      None   \n",
      "\n",
      "                                            geometry  \n",
      "0  MULTIPOLYGON (((180 -16.06713, 180 -16.55522, ...  \n",
      "1  POLYGON ((33.90371 -0.95, 34.07262 -1.05982, 3...  \n",
      "2  POLYGON ((-8.66559 27.65643, -8.66512 27.58948...  \n",
      "3  MULTIPOLYGON (((-122.84 49, -122.97421 49.0025...  \n",
      "4  MULTIPOLYGON (((-122.84 49, -120 49, -117.0312...  \n",
      "\n",
      "[5 rows x 169 columns]\n",
      "(GPU)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "host_dataframe = geopandas.read_file('https://naturalearth.s3.amazonaws.com/110m_cultural/ne_110m_admin_0_countries.zip')\n",
    "host_dataframe = host_dataframe.set_crs(4326)\n",
    "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n",
    "print(gpu_dataframe.head())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "da5c775b-7458-4e3c-a573-e1bd060e3365",
   "metadata": {},
   "source": [
    "## Geopandas and cuDF integration\n",
    "\n",
    "A cuSpatial [GeoDataFrame](\n",
    "https://docs.rapids.ai/api/cuspatial/stable/api_docs/geopandas_compatibility.html#cuspatial.GeoDataFrame               ) is a collection of [cudf](\n",
    "https://docs.rapids.ai/api/cudf/stable/              ) [Series](\n",
    "https://docs.rapids.ai/api/cudf/stable/api_docs/series.html    ) and\n",
    "[cuspatial.GeoSeries](\n",
    "https://docs.rapids.ai/api/cuspatial/stable/api_docs/geopandas_compatibility.html#cuspatial.GeoSeries             ) `\"geometry\"` objects.  \n",
    "Both types of series are stored on the GPU, and\n",
    "`GeoSeries` is represented internally using `GeoArrow` data layout.\n",
    "\n",
    "One of the most important features of cuSpatial is that it is highly integrated with `cuDF`.  \n",
    "You can use any `cuDF` operation on cuSpatial non-feature columns, and most operations will work  \n",
    "with a `geometry` column. Operations that reduce or collate the number of rows in your DataFrame,  \n",
    "for example `groupby`, are not supported at this time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "956451e2-a520-441d-a939-575ed179917b",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "          featurecla  scalerank  LABELRANK   SOVEREIGNT SOV_A3  ADM0_DIF  \\\n",
      "103  Admin-0 country          1          3  Afghanistan    AFG         0   \n",
      "125  Admin-0 country          1          6      Albania    ALB         0   \n",
      "82   Admin-0 country          1          3      Algeria    DZA         0   \n",
      "74   Admin-0 country          1          3       Angola    AGO         0   \n",
      "159  Admin-0 country          1          4   Antarctica    ATA         0   \n",
      "\n",
      "     LEVEL               TYPE TLC        ADMIN  ... FCLASS_TR  FCLASS_ID  \\\n",
      "103      2  Sovereign country   1  Afghanistan  ...      None       None   \n",
      "125      2  Sovereign country   1      Albania  ...      None       None   \n",
      "82       2  Sovereign country   1      Algeria  ...      None       None   \n",
      "74       2  Sovereign country   1       Angola  ...      None       None   \n",
      "159      2      Indeterminate   1   Antarctica  ...      None       None   \n",
      "\n",
      "    FCLASS_PL FCLASS_GR  FCLASS_IT FCLASS_NL FCLASS_SE  FCLASS_BD FCLASS_UA  \\\n",
      "103      None      None       None      None      None       None      None   \n",
      "125      None      None       None      None      None       None      None   \n",
      "82       None      None       None      None      None       None      None   \n",
      "74       None      None       None      None      None       None      None   \n",
      "159      None      None       None      None      None       None      None   \n",
      "\n",
      "                                              geometry  \n",
      "103  POLYGON ((66.51861 37.36278, 67.07578 37.35614...  \n",
      "125  POLYGON ((21.02004 40.84273, 20.99999 40.58, 2...  \n",
      "82   POLYGON ((-8.6844 27.39574, -8.66512 27.58948,...  \n",
      "74   MULTIPOLYGON (((12.99552 -4.7811, 12.63161 -4....  \n",
      "159  MULTIPOLYGON (((-48.66062 -78.04702, -48.1514 ...  \n",
      "\n",
      "[5 rows x 169 columns]\n",
      "(GPU)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n",
    "continents_dataframe = gpu_dataframe.sort_values(\"NAME\")\n",
    "print(continents_dataframe.head())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "5453f308-317f-4775-ba3c-ff0633755bc4",
   "metadata": {},
   "source": [
    "You can also convert between GPU-backed `cuspatial.GeoDataFrame` and CPU-backed  \n",
    "`geopandas.GeoDataFrame` with `from_geopandas`  and `to_geopandas`, enabling you to  \n",
    "take advantage of any native GeoPandas operation. Note, however, that GeoPandas runs on  \n",
    " the CPU and therefore will not have as high performance as cuSpatial operations. The following  \n",
    "example displays the `Polygon` associated with the first item in the dataframe sorted  \n",
    "alphabetically by name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "cd8d1c39-b44f-4d06-9e11-04c2dca4cf15",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"100.0\" height=\"100.0\" viewBox=\"59.94324588403841 28.733388576771137 15.799965820375682 10.338076985718448\" preserveAspectRatio=\"xMinYMin meet\"><g transform=\"matrix(1,0,0,-1,0,67.80485413926073)\"><path fill-rule=\"evenodd\" fill=\"#66cc99\" stroke=\"#555555\" stroke-width=\"0.31599931640751366\" opacity=\"0.6\" d=\"M 66.51860680528867,37.36278432875879 L 67.07578209825962,37.35614390720929 L 67.82999962755952,37.144994004864685 L 68.13556237170138,37.02311513930431 L 68.85944583524594,37.344335842430596 L 69.19627282092438,37.15114350030743 L 69.51878543485796,37.60899669041342 L 70.11657840361033,37.58822276463209 L 70.27057417184014,37.735164699854025 L 70.3763041523093,38.13839590102752 L 70.80682050973289,38.486281643216415 L 71.34813113799026,38.25890534113216 L 71.23940392444817,37.953265082341886 L 71.54191775908478,37.905774441065645 L 71.44869347523024,37.06564484308052 L 71.8446382994506,36.73817129164692 L 72.1930408059624,36.948287665345674 L 72.63688968291729,37.047558091778356 L 73.26005577992501,37.495256862939 L 73.9486959166465,37.4215662704908 L 74.98000247589542,37.419990139305895 L 75.15802778514092,37.13303091078912 L 74.57589277537298,37.02084137628346 L 74.06755171091783,36.83617564548845 L 72.92002485544447,36.72000702569632 L 71.84629194528392,36.50994232842986 L 71.26234826038575,36.074387518857804 L 71.49876793812109,35.650563259416 L 71.61307620635071,35.153203436822864 L 71.11501875192164,34.733125718722235 L 71.15677330921346,34.34891144463215 L 70.8818030129884,33.98885590263852 L 69.9305432473596,34.02012014417511 L 70.3235941913716,33.35853261975839 L 69.68714725126486,33.105498969041236 L 69.26252200712256,32.5019440780883 L 69.31776411324256,31.901412258424443 L 68.92667687365767,31.620189113892067 L 68.55693200060932,31.713310044882018 L 67.79268924344478,31.58293040620963 L 67.68339358914747,31.30315420178142 L 66.93889122911847,31.304911200479353 L 66.38145755398602,30.738899237586452 L 66.34647260932442,29.887943427036177 L 65.0468620136161,29.472180691031905 L 64.35041873561852,29.560030625928093 L 64.14800215033125,29.340819200145972 L 63.55026085801117,29.468330796826166 L 62.54985680527278,29.31857249604431 L 60.87424848820879,29.829238999952608 L 61.781221551363444,30.735850328081238 L 61.69931440618083,31.379506130492672 L 60.94194461451113,31.548074652628753 L 60.863654819588966,32.18291962333443 L 60.536077915290775,32.98126882581157 L 60.963700392506006,33.52883230237626 L 60.52842980331158,33.676446031218006 L 60.80319339380745,34.40410187431986 L 61.210817091725744,35.650072333309225 L 62.230651483005886,35.270663967422294 L 62.98466230657661,35.40404083916762 L 63.19353844590035,35.857165635718914 L 63.98289594915871,36.0079574651466 L 64.5464791197339,36.31207326918427 L 64.7461051776774,37.111817735333304 L 65.58894778835784,37.30521678318564 L 65.74563073106683,37.66116404881207 L 66.21738488145934,37.39379018813392 L 66.51860680528867,37.36278432875879 z\" /></g></svg>"
      ],
      "text/plain": [
       "<POLYGON ((66.519 37.363, 67.076 37.356, 67.83 37.145, 68.136 37.023, 68.859...>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n",
    "sorted_dataframe = gpu_dataframe.sort_values(\"NAME\")\n",
    "sorted_dataframe = sorted_dataframe.to_geopandas()\n",
    "sorted_dataframe['geometry'].iloc[0]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "56b80722-38b6-457c-a7f0-591af6efd3ff",
   "metadata": {},
   "source": [
    "## Trajectories\n",
    "\n",
    "A trajectory is a `LineString` coupled with a time sample for each point in the `LineString`.  \n",
    "Use  `cuspatial.trajectory.derive_trajectories` to group trajectory datasets and sort by time.\n",
    "\n",
    "<img src=\"../_static/trajectory.png\" width=250>\n",
    "\n",
    "### [cuspatial.derive_trajectories](https://docs.rapids.ai/api/cuspatial/stable/api_docs/trajectory.html#cuspatial.derive_trajectories)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "cb5acdad-53aa-418f-9948-8445515bd2b2",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   object_id         x         y               timestamp\n",
      "0          1  0.680146  0.874341 1970-01-01 00:00:00.125\n",
      "1          1  0.843522  0.044402 1970-01-01 00:00:00.834\n",
      "2          1  0.837039  0.351025 1970-01-01 00:00:01.335\n",
      "3          1  0.946184  0.479038 1970-01-01 00:00:01.791\n",
      "4          1  0.117322  0.182117 1970-01-01 00:00:02.474\n",
      "0           0\n",
      "1        2455\n",
      "2        4899\n",
      "3        7422\n",
      "4        9924\n",
      "        ...  \n",
      "394    987408\n",
      "395    989891\n",
      "396    992428\n",
      "397    994975\n",
      "398    997448\n",
      "Length: 399, dtype: int32\n"
     ]
    }
   ],
   "source": [
    "# 1m random trajectory samples\n",
    "ids = cupy.random.randint(1, 400, 1000000)\n",
    "timestamps = cupy.random.random(1000000)*1000000\n",
    "xy= cupy.random.random(2000000)\n",
    "trajs = cuspatial.GeoSeries.from_points_xy(xy)\n",
    "sorted_trajectories, trajectory_offsets = \\\n",
    "    cuspatial.core.trajectory.derive_trajectories(ids, trajs, timestamps)\n",
    "# sorted_trajectories is a DataFrame containing all trajectory samples\n",
    "# sorted first by `object_id` and then by `timestamp`.\n",
    "print(sorted_trajectories.head())\n",
    "# trajectory_offsets is a Series containing the start position of each\n",
    "# trajectory in sorted_trajectories.\n",
    "print(trajectory_offsets)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3c4a90f9-8661-4fda-9026-473e6ce87bd2",
   "metadata": {},
   "source": [
    "`derive_trajectories` sorts the trajectories by `object_id`, then `timestamp`, and returns a  \n",
    "tuple containing the sorted trajectory data frame in the first index position and the offsets  \n",
    "buffer defining the start and stop of each trajectory in the second index position.  \n",
    "\n",
    "### [cuspatial.trajectory_distances_and_speeds](https://docs.rapids.ai/api/cuspatial/stable/api_docs/trajectory.html#cuspatial.trajectory_distances_and_speeds)\n",
    "\n",
    "Use `trajectory_distance_and_speed` to calculate the overall distance travelled in meters and  \n",
    "the speed of a set of trajectories with the same format as the result returned by `derive_trajectories`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "03b75847-090d-40f8-8147-cb10b900d6ec",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                   distance        speed\n",
      "trajectory_id                           \n",
      "0              1.278996e+06  1280.320089\n",
      "1              1.267179e+06  1268.370390\n",
      "2              1.294437e+06  1295.905261\n",
      "3              1.323413e+06  1323.956714\n",
      "4              1.309590e+06  1311.561012\n"
     ]
    }
   ],
   "source": [
    "trajs = cuspatial.GeoSeries.from_points_xy(\n",
    "    sorted_trajectories[[\"x\", \"y\"]].interleave_columns()\n",
    ")\n",
    "d_and_s = cuspatial.core.trajectory.trajectory_distances_and_speeds(\n",
    "  len(cudf.Series(ids).unique()),\n",
    "  sorted_trajectories['object_id'],\n",
    "  trajs,\n",
    "  sorted_trajectories['timestamp']\n",
    ")\n",
    "print(d_and_s.head())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "eeb0da7c-0bdf-49ec-8244-4165acc96074",
   "metadata": {},
   "source": [
    "Finally, compute the bounding boxes of trajectories that follow the format of the above two  \n",
    "examples:"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "12a3ce7e-82b7-4b51-8227-5e157a48701c",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Bounding\n",
    "\n",
    "Compute the bounding boxes of `n` polygons or linestrings:\n",
    "\n",
    "<img src=\"../_static/bounding.png\" width=250>\n",
    "\n",
    "### [cuspatial.trajectory_bounding_boxes](https://docs.rapids.ai/api/cuspatial/stable/api_docs/trajectory.html#cuspatial.trajectory_bounding_boxes)\n",
    "\n",
    "`trajectory_bounding_boxes` works out of the box with the values returned by `derive_trajectories`.  \n",
    "Its arguments are the number of incoming objects, the offsets of those objects, and x and y point buffers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "452f60cb-28cc-4ad8-8aa2-9d73e3d56ec6",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      x_min     y_min     x_max     y_max\n",
      "0  0.000361  0.000170  0.999582  0.999485\n",
      "1  0.000184  0.000647  0.999939  0.999884\n",
      "2  0.000461  0.001395  0.999938  0.999297\n",
      "3  0.000093  0.000073  0.999819  0.999544\n",
      "4  0.000105  0.000112  0.999952  0.999013\n"
     ]
    }
   ],
   "source": [
    "bounding_boxes = cuspatial.core.trajectory.trajectory_bounding_boxes(\n",
    "  len(cudf.Series(ids, dtype=\"int32\").unique()),\n",
    "  sorted_trajectories['object_id'],\n",
    "  trajs\n",
    ")\n",
    "print(bounding_boxes.head())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a56dfe17-1739-4b20-85c9-fcb5902c1585",
   "metadata": {},
   "source": [
    "### [cuspatial.polygon_bounding_boxes](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.polygon_bounding_boxes)\n",
    "\n",
    "`polygon_bounding_boxes` supports more complex geometry objects such as `Polygon`s with multiple  \n",
    "rings. The combination of `part_offset` and `ring_offset` allows the function to use only the  \n",
    "exterior ring for computing the bounding box."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "9266aac4-f925-4fb7-b287-5f0b795d5756",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "        minx       miny       maxx       maxy\n",
      "0  29.339998 -11.720938  40.316590  -0.950000\n",
      "1 -17.063423  20.999752  -8.665124  27.656426\n",
      "2  46.466446  40.662325  87.359970  55.385250\n",
      "3  55.928917  37.144994  73.055417  45.586804\n",
      "4  12.182337 -13.257227  31.174149   5.256088\n"
     ]
    }
   ],
   "source": [
    "single_polygons = cuspatial.from_geopandas(\n",
    "    host_dataframe['geometry'][host_dataframe['geometry'].type == \"Polygon\"]\n",
    ")\n",
    "bounding_box_polygons = cuspatial.core.spatial.bounding.polygon_bounding_boxes(\n",
    "    single_polygons\n",
    ")\n",
    "print(bounding_box_polygons.head())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "85197478-801c-4d2d-8b10-c1136d7bb15c",
   "metadata": {},
   "source": [
    "### [cuspatial.linestring_bounding_boxes](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.linestring_bounding_boxes)\n",
    "\n",
    "Equivalently, we can treat trajectories as Linestrings and compute the same bounding boxes from  \n",
    "the above trajectory calculation more generally:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "15b5bb38-702f-4360-b48c-2e49ffd650d7",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       minx      miny      maxx      maxy\n",
      "0  0.000261  0.000070  0.999682  0.999585\n",
      "1  0.000084  0.000547  1.000039  0.999984\n",
      "2  0.000361  0.001295  1.000038  0.999397\n",
      "3 -0.000007 -0.000027  0.999919  0.999644\n",
      "4  0.000005  0.000012  1.000052  0.999113\n"
     ]
    }
   ],
   "source": [
    "lines = cuspatial.GeoSeries.from_linestrings_xy(\n",
    "    trajs.points.xy, trajectory_offsets, cupy.arange(len(trajectory_offsets))\n",
    ")\n",
    "trajectory_bounding_boxes = cuspatial.core.spatial.bounding.linestring_bounding_boxes(\n",
    "    lines, 0.0001\n",
    ")\n",
    "print(trajectory_bounding_boxes.head())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "81c4d3ca-5d3f-4ae1-ae8e-ac1e252f3e17",
   "metadata": {},
   "source": [
    "## Projection\n",
    "\n",
    "cuSpatial provides a simple sinusoidal longitude / latitude to Cartesian coordinate transform.  \n",
    "This function requires an origin point to determine the scaling parameters for the lonlat inputs.  \n",
    "\n",
    "### [cuspatial.sinusoidal_projection](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.sinusoidal_projection)\n",
    "\n",
    "The following cell converts the lonlat coordinates of the country of Afghanistan to Cartesian  \n",
    "coordinates in km, centered around the center of the country, suitable for graphing and display."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "a7a870dd-c0ae-41c1-a66c-cff4bd2db0ec",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0     POINT (112.174 -281.59)\n",
      "1     POINT (62.152 -280.852)\n",
      "2     POINT (-5.573 -257.391)\n",
      "3    POINT (-33.071 -243.849)\n",
      "4     POINT (-98.002 -279.54)\n",
      "dtype: geometry\n"
     ]
    }
   ],
   "source": [
    "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n",
    "afghanistan = gpu_dataframe['geometry'][gpu_dataframe['NAME'] == 'Afghanistan']\n",
    "points = cuspatial.GeoSeries.from_points_xy(afghanistan.polygons.xy)\n",
    "projected = cuspatial.sinusoidal_projection(\n",
    "    afghanistan.polygons.x.mean(),\n",
    "    afghanistan.polygons.y.mean(),\n",
    "    points\n",
    ")\n",
    "print(projected.head())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c9085e80-e7ba-48e9-8c50-12e544e3af46",
   "metadata": {},
   "source": [
    "## Distance\n",
    "cuSpatial provides a growing suite of distance computation functions. Parallel distance functions  \n",
    "come in two main forms: pairwise, which computes a distance for each corresponding pair of input  \n",
    "geometries; and all-pairs, which computes a distance for the each element of the Cartesian product  \n",
    "of input geometries (for each input geometry in A, compute the distance from A to each input\n",
    "geometry in B).\"\n",
    "    \n",
    "Two pairwise distance functions are included in cuSpatial: `haversine` and `pairwise_linestring`.  \n",
    "The `hausdorff` clustering distances algorithm is also available, computing the hausdorff  \n",
    "distance across the cartesian product of its single input.\n",
    "\n",
    "### [cuspatial.directed_hausdorff_distance](https://docs.rapids.ai/api/cuspatial/stable/api_docs/trajectory.html#cuspatial.directed_hausdorff_distance)\n",
    "\n",
    "The directed Hausdorff distance from one space to another is the greatest of all the distances  \n",
    "between any point in the first space to the closet point in the second. This is especially useful  \n",
    "as a similarity metric between trajectories.\n",
    "\n",
    "<img src=\"../_static/hausdorff.png\" width=250>\n",
    "\n",
    "[Hausdorff distance](https://en.wikipedia.org/wiki/Hausdorff_distance)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "e75b0352-0f80-404d-a113-f301601cd5a3",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "        0         1         2         3         4         5         6    \\\n",
      "0  0.000000  0.034755  0.031989  0.031959  0.031873  0.038674  0.029961   \n",
      "1  0.030328  0.000000  0.038672  0.032086  0.031049  0.032170  0.032275   \n",
      "2  0.027640  0.030539  0.000000  0.036737  0.033055  0.043447  0.028812   \n",
      "3  0.031497  0.033380  0.035224  0.000000  0.032581  0.035484  0.030339   \n",
      "4  0.031079  0.032256  0.035731  0.039084  0.000000  0.036416  0.031369   \n",
      "\n",
      "        7         8         9    ...       388       389       390       391  \\\n",
      "0  0.029117  0.040962  0.033259  ...  0.031614  0.036447  0.035548  0.028233   \n",
      "1  0.030215  0.034443  0.032998  ...  0.030594  0.035665  0.031473  0.031916   \n",
      "2  0.031807  0.039269  0.033250  ...  0.031998  0.033636  0.034646  0.032615   \n",
      "3  0.034792  0.045755  0.031810  ...  0.033623  0.031359  0.034923  0.032287   \n",
      "4  0.030388  0.033751  0.034029  ...  0.030705  0.040339  0.034328  0.029027   \n",
      "\n",
      "        392       393       394       395       396       397  \n",
      "0  0.034176  0.030057  0.033863  0.031111  0.034590  0.033850  \n",
      "1  0.037483  0.033489  0.041403  0.029784  0.035374  0.038179  \n",
      "2  0.036681  0.030642  0.038432  0.032481  0.034810  0.036695  \n",
      "3  0.032808  0.029771  0.040891  0.030802  0.032279  0.038443  \n",
      "4  0.035645  0.027703  0.037529  0.029356  0.031260  0.035501  \n",
      "\n",
      "[5 rows x 398 columns]\n"
     ]
    }
   ],
   "source": [
    "coordinates = sorted_trajectories[['x', 'y']].interleave_columns()\n",
    "spaces = cuspatial.GeoSeries.from_multipoints_xy(\n",
    "    coordinates, trajectory_offsets\n",
    ")\n",
    "hausdorff_distances = cuspatial.core.spatial.distance.directed_hausdorff_distance(\n",
    "    spaces\n",
    ")\n",
    "print(hausdorff_distances.head())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "993d7566-203e-4b87-adad-088e2fd92eed",
   "metadata": {},
   "source": [
    "### [cuspatial.haversine_distance](https://docs.rapids.ai/api/cuspatial/stable/api_docs/gis.html#cuspatial.haversine_distance)\n",
    "\n",
    "Haversine distance is the great circle distance between longitude and latitude pairs. cuSpatial  \n",
    "uses the `lon/lat` ordering to better reflect the cartesian coordinates of great circle  \n",
    "coordinates: `x/y`.\n",
    "\n",
    "<img src=\"../_static/haversine.png\" width=250>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "70f66319-c4d2-4a93-ab98-0debcce4a719",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    9959.695143\n",
       "1    9803.166859\n",
       "2    9876.857085\n",
       "3    9925.097106\n",
       "4    9927.268486\n",
       "Name: None, dtype: float64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n",
    "polygons_first = gpu_dataframe['geometry'][0:10]\n",
    "polygons_second = gpu_dataframe['geometry'][10:30]\n",
    "\n",
    "points_first = polygons_first.polygons.xy[0:1000]\n",
    "points_second = polygons_second.polygons.xy[0:1000]\n",
    "\n",
    "first = cuspatial.GeoSeries.from_points_xy(points_first)\n",
    "second = cuspatial.GeoSeries.from_points_xy(points_second)\n",
    "\n",
    "# The number of coordinates in two sets of polygons vary, so\n",
    "# we'll just compare the first set of 1000 values here.\n",
    "distances_in_meters = cuspatial.haversine_distance(\n",
    "    first, second\n",
    ")\n",
    "cudf.Series(distances_in_meters).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f21c1709-dfc1-411b-b9d5-ec5dc345367f",
   "metadata": {},
   "source": [
    "This above method reads the GeoPandas data from CPU memory into GPU memory and then cuSpatial processes it. If the data is already in a cuDF GPU dataframe, you can quickly calculate Haversine distances using the method below. This maximizes speed by keeping all the processing on the GPU and is very useful when working on large datasets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "c32741b0-c550-4d0f-b494-45191f5b60fd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   latitude  longitude  atlanta_lat  atlanta_lng  atlanta_dist\n",
      "0   17.1167   -61.7833       33.749      -84.388  11961.556540\n",
      "1   17.1333   -61.7833       33.749      -84.388  11963.392729\n",
      "2   25.3330    55.5170       33.749      -84.388  12243.126130\n",
      "3   25.2550    55.3640       33.749      -84.388  12233.867463\n",
      "4   24.4330    54.6510       33.749      -84.388  12139.822218\n"
     ]
    }
   ],
   "source": [
    "# Generate data to be used to create a cuDF dataframe.  \n",
    "# The data to be processed by Haversine MUST be a Float.\n",
    "a = {\"latitude\":[17.1167, 17.1333, 25.333, 25.255, 24.433, 24.262, 35.317, 34.21, 34.566, 31.5, 36.7167, 30.5667, 28.05, 22.8, 35.7297, 36.97, 36.78, 36.8, 36.8, 36.72],\n",
    "     \"longitude\": [-61.7833, -61.7833, 55.517, 55.364, 54.651, 55.609, 69.017, 62.228, 69.212, 65.85, 3.25, 2.8667, 9.6331, 5.4331, 0.65, 7.79, 3.07, 3.03, 3.04, 4.05]}\n",
    "df = cudf.DataFrame(data=a)\n",
    "\n",
    "# Create cuSpatial GeoSeries from cuDF Dataframe\n",
    "cuGeoSeries = cuspatial.GeoSeries.from_points_xy(df[['longitude', 'latitude']].interleave_columns())\n",
    "\n",
    "# Create Comparator cuSpatial GeoSeries from a comparator point\n",
    "df['atlanta_lat'] = 33.7490\n",
    "df['atlanta_lng'] = -84.3880\n",
    "atlGeoSeries = cuspatial.GeoSeries.from_points_xy(df[['atlanta_lat', 'atlanta_lng']].interleave_columns())\n",
    "\n",
    "# Calculate Haversine Distance of cuDF dataframe to comparator point\n",
    "df['atlanta_dist'] = cuspatial.haversine_distance(cuGeoSeries, atlGeoSeries)\n",
    "print(df.head())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7f2239c5-58d0-4912-9bd7-246cc6741c0a",
   "metadata": {},
   "source": [
    "### Pairwise distance\n",
    "\n",
    "`pairwise_linestring_distance` computes the distance between a `GeoSeries` of Linestrings of  \n",
    "length `n` and a corresponding `GeoSeries` of Linestrings of `n` length. It returns the  \n",
    "minimum distance from any point in the first linestring of the pair to the nearest segment  \n",
    "or point within the second Linestring of the pair.\n",
    "\n",
    "The input accepts a pair of geoseries as input sequences of linestring arrays.\n",
    "\n",
    "The below example uses the polygons from `naturalearth_lowres` and treats them as linestrings.  \n",
    "The first example computes the distances between all polygons and themselves, while the second  \n",
    "example computes the distance between the first 50 polygons and the second 50 polygons.\n",
    "\n",
    "### [cuspatial.pairwise_linestring_distance](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.pairwise_linestring_distance)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "35dfb7c9-1914-488a-b22e-8d0067ea7a8b",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0    0.0\n",
      "1    0.0\n",
      "2    0.0\n",
      "3    0.0\n",
      "4    0.0\n",
      "dtype: float64\n",
      "0    152.200610\n",
      "1     44.076445\n",
      "2      2.417269\n",
      "3     44.197151\n",
      "4     75.821029\n",
      "dtype: float64\n"
     ]
    }
   ],
   "source": [
    "gpu_boundaries = cuspatial.from_geopandas(host_dataframe.geometry.boundary)\n",
    "zeros = cuspatial.pairwise_linestring_distance(\n",
    "    gpu_boundaries[0:50],\n",
    "    gpu_boundaries[0:50]\n",
    ")\n",
    "print(zeros.head())\n",
    "lines1 = gpu_boundaries[0:50]\n",
    "lines2 = gpu_boundaries[50:100]\n",
    "distances = cuspatial.pairwise_linestring_distance(\n",
    "    lines1, lines2\n",
    ")\n",
    "print(distances.head())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "de6b73ac-1b48-422c-8463-37367ad73507",
   "metadata": {},
   "source": [
    "`pairwise_point_linestring_distance` computes the distance between pairs of points and  \n",
    "linestrings. It can be used with polygons treated as linestrings as well. In the following  \n",
    "example the minimum distance from a country's center to it's border is computed.\n",
    "\n",
    "### [cuspatial.pairwise_point_linestring_distance](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.pairwise_point_linestring_distance)\n",
    "\n",
    "Using WGS 84 Pseudo-Mercator, distances are in meters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "40e9a41e-21af-47cc-a142-b19a67941f7f",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "         featurecla  scalerank  LABELRANK                        SOVEREIGNT  \\\n",
      "1   Admin-0 country          1          3       United Republic of Tanzania   \n",
      "2   Admin-0 country          1          7                    Western Sahara   \n",
      "5   Admin-0 country          1          3                        Kazakhstan   \n",
      "6   Admin-0 country          1          3                        Uzbekistan   \n",
      "11  Admin-0 country          1          2  Democratic Republic of the Congo   \n",
      "\n",
      "   SOV_A3  ADM0_DIF  LEVEL               TYPE TLC  \\\n",
      "1     TZA         0      2  Sovereign country   1   \n",
      "2     SAH         0      2      Indeterminate   1   \n",
      "5     KA1         1      1        Sovereignty   1   \n",
      "6     UZB         0      2  Sovereign country   1   \n",
      "11    COD         0      2  Sovereign country   1   \n",
      "\n",
      "                               ADMIN  ...     FCLASS_ID     FCLASS_PL  \\\n",
      "1        United Republic of Tanzania  ...          None          None   \n",
      "2                     Western Sahara  ...  Unrecognized  Unrecognized   \n",
      "5                         Kazakhstan  ...          None          None   \n",
      "6                         Uzbekistan  ...          None          None   \n",
      "11  Democratic Republic of the Congo  ...          None          None   \n",
      "\n",
      "   FCLASS_GR FCLASS_IT     FCLASS_NL FCLASS_SE FCLASS_BD  FCLASS_UA  \\\n",
      "1       None      None          None      None      None       None   \n",
      "2       None      None  Unrecognized      None      None       None   \n",
      "5       None      None          None      None      None       None   \n",
      "6       None      None          None      None      None       None   \n",
      "11      None      None          None      None      None       None   \n",
      "\n",
      "                                             geometry border_distance  \n",
      "1   POLYGON ((3774143.866 -105758.362, 3792946.708...     8047.288391  \n",
      "2   POLYGON ((-964649.018 3205725.605, -964597.245...   593137.492497  \n",
      "5   POLYGON ((9724867.413 6311418.173, 9640131.701...    37091.213890  \n",
      "6   POLYGON ((6230350.563 5057973.384, 6225978.591...   278633.467299  \n",
      "11  POLYGON ((3266113.592 -501451.658, 3286149.877...    35812.988244  \n",
      "\n",
      "[5 rows x 170 columns]\n",
      "(GPU)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Convert input dataframe to Pseudo-Mercator projection.\n",
    "host_dataframe3857 = host_dataframe.to_crs(3857)\n",
    "polygons = host_dataframe3857[host_dataframe3857['geometry'].type == \"Polygon\"]\n",
    "gpu_polygons = cuspatial.from_geopandas(polygons)\n",
    "# Extract mean_x and mean_y from each country\n",
    "mean_x = [gpu_polygons['geometry'].iloc[[ix]].polygons.x.mean() for ix in range(len(gpu_polygons))]\n",
    "mean_y = [gpu_polygons['geometry'].iloc[[ix]].polygons.y.mean() for ix in range(len(gpu_polygons))]\n",
    "# Convert mean_x/mean_y values into Points for use in API.\n",
    "points = cuspatial.GeoSeries([Point(point) for point in zip(mean_x, mean_y)])\n",
    "# Convert Polygons into Linestrings for use in API.\n",
    "linestring_df = cuspatial.from_geopandas(geopandas.geoseries.GeoSeries(\n",
    "    [MultiLineString(mapping(polygons['geometry'].iloc[ix])[\"coordinates\"]) for ix in range(len(polygons))]\n",
    "))\n",
    "gpu_polygons['border_distance'] = cuspatial.pairwise_point_linestring_distance(\n",
    "    points, linestring_df\n",
    ")\n",
    "print(gpu_polygons.head())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "008d320d-ca47-459f-9fff-8769494c8a61",
   "metadata": {},
   "source": [
    "### cuspatial.pairwise_point_polygon_distance\n",
    "\n",
    "Using WGS 84 Pseudo-Mercator, distances are in meters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "258c9a8c-7fe3-4047-80b7-00878d9fb2f1",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "        featurecla  scalerank  LABELRANK                   SOVEREIGNT SOV_A3  \\\n",
      "0  Admin-0 country          1          6                         Fiji    FJI   \n",
      "1  Admin-0 country          1          3  United Republic of Tanzania    TZA   \n",
      "2  Admin-0 country          1          7               Western Sahara    SAH   \n",
      "3  Admin-0 country          1          2                       Canada    CAN   \n",
      "4  Admin-0 country          1          2     United States of America    US1   \n",
      "\n",
      "   ADM0_DIF  LEVEL               TYPE TLC                        ADMIN  ...  \\\n",
      "0         0      2  Sovereign country   1                         Fiji  ...   \n",
      "1         0      2  Sovereign country   1  United Republic of Tanzania  ...   \n",
      "2         0      2      Indeterminate   1               Western Sahara  ...   \n",
      "3         0      2  Sovereign country   1                       Canada  ...   \n",
      "4         1      2            Country   1     United States of America  ...   \n",
      "\n",
      "      FCLASS_PL  FCLASS_GR FCLASS_IT     FCLASS_NL  FCLASS_SE FCLASS_BD  \\\n",
      "0          None       None      None          None       None      None   \n",
      "1          None       None      None          None       None      None   \n",
      "2  Unrecognized       None      None  Unrecognized       None      None   \n",
      "3          None       None      None          None       None      None   \n",
      "4          None       None      None          None       None      None   \n",
      "\n",
      "  FCLASS_UA                                           geometry distance_from  \\\n",
      "0      None  MULTIPOLYGON (((180 -16.06713, 180 -16.55522, ...  Vatican City   \n",
      "1      None  POLYGON ((33.90371 -0.95, 34.07262 -1.05982, 3...    San Marino   \n",
      "2      None  POLYGON ((-8.66559 27.65643, -8.66512 27.58948...         Vaduz   \n",
      "3      None  MULTIPOLYGON (((-122.84 49, -122.97421 49.0025...       Lobamba   \n",
      "4      None  MULTIPOLYGON (((-122.84 49, -120 49, -117.0312...    Luxembourg   \n",
      "\n",
      "       distance  \n",
      "0  5.329915e+06  \n",
      "1  5.628613e+06  \n",
      "2  6.057264e+06  \n",
      "3  4.626961e+06  \n",
      "4  6.415631e+06  \n",
      "\n",
      "[5 rows x 171 columns]\n",
      "(GPU)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "countries = host_dataframe\n",
    "\n",
    "cities = geopandas.read_file('https://naturalearth.s3.amazonaws.com/110m_cultural/ne_110m_populated_places_simple.zip')\n",
    "cities = cities.to_crs(3857)\n",
    "\n",
    "gpu_cities = cuspatial.from_geopandas(cities)\n",
    "gpu_countries = cuspatial.from_geopandas(countries)\n",
    "dist = cuspatial.pairwise_point_polygon_distance(\n",
    "    gpu_cities.geometry[:len(gpu_countries)], gpu_countries.geometry\n",
    ")\n",
    "\n",
    "gpu_countries[\"distance_from\"] = cities.name\n",
    "gpu_countries[\"distance\"] = dist\n",
    "\n",
    "print(gpu_countries.head())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "e0b4d618",
   "metadata": {},
   "source": [
    "### cuspatial.pairwise_linestring_polygon_distance\n",
    "\n",
    "Using WGS 84 Pseudo-Mercator, distances are in meters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "5863871e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>index</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Columbus Avenue</td>\n",
       "      <td>LINESTRING (-8234860.077 4980333.535, -8234863...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>West 80th Street</td>\n",
       "      <td>LINESTRING (-8235173.854 4980508.442, -8235160...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Amsterdam Avenue</td>\n",
       "      <td>LINESTRING (-8235173.854 4980508.442, -8235168...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>West 80th Street</td>\n",
       "      <td>LINESTRING (-8235369.475 4980617.398, -8235347...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Broadway</td>\n",
       "      <td>LINESTRING (-8235369.475 4980617.398, -8235373...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   name                                           geometry\n",
       "index                                                                     \n",
       "0       Columbus Avenue  LINESTRING (-8234860.077 4980333.535, -8234863...\n",
       "1      West 80th Street  LINESTRING (-8235173.854 4980508.442, -8235160...\n",
       "2      Amsterdam Avenue  LINESTRING (-8235173.854 4980508.442, -8235168...\n",
       "3      West 80th Street  LINESTRING (-8235369.475 4980617.398, -8235347...\n",
       "4              Broadway  LINESTRING (-8235369.475 4980617.398, -8235373..."
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# all driveways within 2km range of central park, nyc\n",
    "\n",
    "if not os.path.exists(\"./streets_3857.csv\"):\n",
    "    import osmnx as ox\n",
    "    graph = ox.graph_from_point((40.769361, -73.977655), dist=2000, network_type=\"drive\")\n",
    "    nodes, streets = ox.graph_to_gdfs(graph)\n",
    "    streets = streets.to_crs(3857)\n",
    "    streets = streets.reset_index(drop=True)\n",
    "    streets.index.name = \"index\"\n",
    "    streets[[\"name\", \"geometry\"]].to_csv(\"streets_3857.csv\")\n",
    "\n",
    "# The data is under notebooks/streets_3857.csv\n",
    "streets = pd.read_csv(\"./streets_3857.csv\", index_col=\"index\")\n",
    "streets.geometry = streets.geometry.apply(wkt.loads)\n",
    "streets = geopandas.GeoDataFrame(streets)\n",
    "streets.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "f4c67464",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/raid/jlamb/miniforge/envs/cuspatial-dev/lib/python3.11/site-packages/osmnx/features.py:294: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.\n",
      "  polygon = gdf_place[\"geometry\"].unary_union\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>index</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>POLYGON ((-8236139.639 4975314.625, -8235990.3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>POLYGON ((-8236139.639 4975314.625, -8235990.3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>POLYGON ((-8236139.639 4975314.625, -8235990.3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>POLYGON ((-8236139.639 4975314.625, -8235990.3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>POLYGON ((-8236139.639 4975314.625, -8235990.3...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                geometry\n",
       "index                                                   \n",
       "0      POLYGON ((-8236139.639 4975314.625, -8235990.3...\n",
       "0      POLYGON ((-8236139.639 4975314.625, -8235990.3...\n",
       "0      POLYGON ((-8236139.639 4975314.625, -8235990.3...\n",
       "0      POLYGON ((-8236139.639 4975314.625, -8235990.3...\n",
       "0      POLYGON ((-8236139.639 4975314.625, -8235990.3..."
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The polygon of the Empire State Building\n",
    "\n",
    "if not os.path.exists(\"./esb_3857.csv\"):\n",
    "    import osmnx as ox\n",
    "    esb = ox.features.features_from_place('Empire State Building, New York', tags={\"building\": True})\n",
    "    esb = esb.to_crs(3857)\n",
    "    esb = esb.geometry.reset_index(drop=True)\n",
    "    esb.index.name = \"index\"\n",
    "    esb.to_csv(\"esb_3857.csv\")\n",
    "\n",
    "# The data is under notebooks/esb_3857.csv\n",
    "esb = pd.read_csv(\"./esb_3857.csv\", index_col=\"index\")\n",
    "esb.geometry = esb.geometry.apply(wkt.loads)\n",
    "esb = geopandas.GeoDataFrame(esb)\n",
    "esb = pd.concat([esb.iloc[0:1]] * len(streets))\n",
    "esb.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "d4e68e87",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>dist</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Columbus Avenue</td>\n",
       "      <td>4993.583717</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>West 80th Street</td>\n",
       "      <td>5103.472213</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Amsterdam Avenue</td>\n",
       "      <td>5208.373183</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>West 80th Street</td>\n",
       "      <td>5275.851781</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Broadway</td>\n",
       "      <td>5178.999774</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1862</th>\n",
       "      <td>West 82nd Street</td>\n",
       "      <td>5411.762092</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1863</th>\n",
       "      <td>Broadway</td>\n",
       "      <td>5476.345847</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1864</th>\n",
       "      <td>West 84th Street</td>\n",
       "      <td>5613.403002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1865</th>\n",
       "      <td>West 75th Street</td>\n",
       "      <td>4750.092380</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1866</th>\n",
       "      <td>Broadway</td>\n",
       "      <td>4638.894939</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1867 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                  name         dist\n",
       "0      Columbus Avenue  4993.583717\n",
       "1     West 80th Street  5103.472213\n",
       "2     Amsterdam Avenue  5208.373183\n",
       "3     West 80th Street  5275.851781\n",
       "4             Broadway  5178.999774\n",
       "...                ...          ...\n",
       "1862  West 82nd Street  5411.762092\n",
       "1863          Broadway  5476.345847\n",
       "1864  West 84th Street  5613.403002\n",
       "1865  West 75th Street  4750.092380\n",
       "1866          Broadway  4638.894939\n",
       "\n",
       "[1867 rows x 2 columns]"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Straight line distance between the driveways to the Empire State Building\n",
    "gpu_streets = cuspatial.from_geopandas(streets.geometry)\n",
    "gpu_esb = cuspatial.from_geopandas(esb.geometry)\n",
    "\n",
    "dist = cuspatial.pairwise_linestring_polygon_distance(gpu_streets, gpu_esb).rename(\"dist\")\n",
    "pd.concat([streets[\"name\"].reset_index(drop=True), dist.to_pandas()], axis=1)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3e1bb692",
   "metadata": {},
   "source": [
    "### cuspatial.pairwise_polygon_distance\n",
    "\n",
    "Using WGS 84 Pseudo-Mercator, distances are in meters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "951625da",
   "metadata": {},
   "outputs": [],
   "source": [
    "african_countries = gpu_countries[gpu_countries.CONTINENT == \"Africa\"].sort_values(\"POP_EST\", ascending=False)\n",
    "asian_countries = gpu_countries[gpu_countries.CONTINENT == \"Asia\"].sort_values(\"POP_EST\", ascending=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "2f0e1118",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Africa</th>\n",
       "      <th>Asia</th>\n",
       "      <th>dist</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Nigeria</td>\n",
       "      <td>China</td>\n",
       "      <td>64.966620</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Ethiopia</td>\n",
       "      <td>India</td>\n",
       "      <td>25.598868</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Egypt</td>\n",
       "      <td>Indonesia</td>\n",
       "      <td>60.717434</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Dem. Rep. Congo</td>\n",
       "      <td>Pakistan</td>\n",
       "      <td>37.489668</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>South Africa</td>\n",
       "      <td>Bangladesh</td>\n",
       "      <td>72.860545</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Tanzania</td>\n",
       "      <td>Japan</td>\n",
       "      <td>97.872886</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Kenya</td>\n",
       "      <td>Philippines</td>\n",
       "      <td>75.450451</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Uganda</td>\n",
       "      <td>Vietnam</td>\n",
       "      <td>69.827567</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Algeria</td>\n",
       "      <td>Turkey</td>\n",
       "      <td>17.927419</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Sudan</td>\n",
       "      <td>Iran</td>\n",
       "      <td>13.990335</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Africa         Asia       dist\n",
       "0          Nigeria        China  64.966620\n",
       "1         Ethiopia        India  25.598868\n",
       "2            Egypt    Indonesia  60.717434\n",
       "3  Dem. Rep. Congo     Pakistan  37.489668\n",
       "4     South Africa   Bangladesh  72.860545\n",
       "5         Tanzania        Japan  97.872886\n",
       "6            Kenya  Philippines  75.450451\n",
       "7           Uganda      Vietnam  69.827567\n",
       "8          Algeria       Turkey  17.927419\n",
       "9            Sudan         Iran  13.990335"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Straight line distance between the top 10 most populated countries in Asia and Africa\n",
    "population_top10_africa = african_countries[:10].reset_index(drop=True)\n",
    "population_top10_asia = asian_countries[:10].reset_index(drop=True)\n",
    "dist = cuspatial.pairwise_polygon_distance(\n",
    "    population_top10_africa.geometry, population_top10_asia.geometry)\n",
    "\n",
    "cudf.concat([\n",
    "    population_top10_africa[\"NAME\"].rename(\"Africa\"),\n",
    "    population_top10_asia[\"NAME\"].rename(\"Asia\"), \n",
    "    dist.rename(\"dist\")], axis=1\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f5f27dc3-46ee-4a62-82de-20f76744382f",
   "metadata": {},
   "source": [
    "## Filtering\n",
    "\n",
    "The filtering module contains `points_in_spatial_window`, which returns from a set of points only those points that fall within a spatial window defined by four bounding coordinates: `min_x`, `max_x`, `min_y`, and `max_y`. The following example finds only the points of polygons that fall within 1 standard deviation of the mean of all of the polygons.\n",
    "\n",
    "<img src=\"../_static/spatial_window.png\" width=550>\n",
    "\n",
    "### [cuspatial.points_in_spatial_window](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.points_in_spatial_window)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "d1ade9da-c9e2-45c4-9685-dffeda3fd358",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0       POINT (33.90371 -0.95)\n",
      "1    POINT (34.07262 -1.05982)\n",
      "2    POINT (37.69869 -3.09699)\n",
      "3     POINT (37.7669 -3.67712)\n",
      "4    POINT (39.20222 -4.67677)\n",
      "dtype: geometry\n"
     ]
    }
   ],
   "source": [
    "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n",
    "geometry = gpu_dataframe['geometry']\n",
    "points = cuspatial.GeoSeries.from_points_xy(geometry.polygons.xy)\n",
    "mean_x, std_x = (geometry.polygons.x.mean(), geometry.polygons.x.std())\n",
    "mean_y, std_y = (geometry.polygons.y.mean(), geometry.polygons.y.std())\n",
    "avg_points = cuspatial.points_in_spatial_window(\n",
    "    points,\n",
    "    mean_x - std_x,\n",
    "    mean_x + std_x,\n",
    "    mean_y - std_y,\n",
    "    mean_y + std_y\n",
    ")\n",
    "print(avg_points.head())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "5027a3dd-78bb-4d17-af94-506d0ed689c8",
   "metadata": {},
   "source": [
    "With some careful grouping, one can reconstruct the original complete polygons that fall within the range."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3b33ce2b-965f-42a1-a89e-66d7ca80d907",
   "metadata": {},
   "source": [
    "## Set Operations"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d73548f3-c9bb-43ff-9788-858f3b7d08e4",
   "metadata": {},
   "source": [
    "### Linestring Intersections\n",
    "\n",
    "cuSpatial provides a linestring-linestring intersection algorithm to compute the overlapping geometries between two linestrings.\n",
    "The API also returns the ids for each returned geometry to help user to trace back the source geometry."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "cc72a44d-a9bf-4432-9898-de899ac45869",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from cuspatial.core.binops.intersection import pairwise_linestring_intersection\n",
    "\n",
    "usa_boundary = cuspatial.from_geopandas(host_dataframe[host_dataframe.NAME == \"United States of America\"].geometry.boundary)\n",
    "canada_boundary = cuspatial.from_geopandas(host_dataframe[host_dataframe.NAME == \"Canada\"].geometry.boundary)\n",
    "\n",
    "list_offsets, geometries, look_back_ids = pairwise_linestring_intersection(usa_boundary, canada_boundary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "1125fd17-afe1-4b9c-b48d-8842dd3700b3",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<cudf.core.column.numerical.NumericalColumn object at 0x7f525c27a570>\n",
       "[\n",
       "  0,\n",
       "  142\n",
       "]\n",
       "dtype: int32"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The first integer series shows that the result contains 1 row (since we only have 1 pair of linestrings as input).\n",
    "# This row contains 144 geometires.\n",
    "list_offsets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "b281e3bb-42d4-4d60-9cb2-b7dcc20b4776",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0                            POINT (-130.53611 54.80275)\n",
       "1                            POINT (-130.53611 54.80278)\n",
       "2                            POINT (-130.53611 54.80275)\n",
       "3                                 POINT (-129.98 55.285)\n",
       "4                            POINT (-130.53611 54.80278)\n",
       "                             ...                        \n",
       "137                  LINESTRING (-120 49, -117.03121 49)\n",
       "138                     LINESTRING (-122.84 49, -120 49)\n",
       "139               LINESTRING (-117.03121 49, -107.05 49)\n",
       "140    LINESTRING (-83.89077 46.11693, -83.61613 46.1...\n",
       "141    LINESTRING (-82.69009 41.67511, -82.43928 41.6...\n",
       "Length: 142, dtype: geometry"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The second element is a geoseries that contains the intersecting geometries, with 144 rows, including points and linestrings.\n",
    "geometries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "e19873b9-2614-4242-ad67-caa47f807d04",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>lhs_linestring_id</th>\n",
       "      <th>lhs_segment_id</th>\n",
       "      <th>rhs_linestring_id</th>\n",
       "      <th>rhs_segment_id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, ...</td>\n",
       "      <td>[18, 16, 18, 15, 17, 137, 14, 16, 13, 15, 14, ...</td>\n",
       "      <td>[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...</td>\n",
       "      <td>[9, 10, 10, 11, 11, 28, 12, 12, 13, 13, 14, 15...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                   lhs_linestring_id  \\\n",
       "0  [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, ...   \n",
       "\n",
       "                                      lhs_segment_id  \\\n",
       "0  [18, 16, 18, 15, 17, 137, 14, 16, 13, 15, 14, ...   \n",
       "\n",
       "                                   rhs_linestring_id  \\\n",
       "0  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...   \n",
       "\n",
       "                                      rhs_segment_id  \n",
       "0  [9, 10, 10, 11, 11, 28, 12, 12, 13, 13, 14, 15...  "
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The third element is a dataframe that contains IDs to the input segments and linestrings, 4 for each result row.\n",
    "# Each represents ids to lhs, rhs linestring and segment ids.\n",
    "look_back_ids"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a17bd64a",
   "metadata": {},
   "source": [
    "## Spatial Joins\n",
    "\n",
    "cuSpatial provides a number of functions to facilitate high-performance spatial joins, \n",
    "including unindexed and quadtree-indexed point-in-polygon and quadtree-indexed point to nearest\n",
    "linestring.\n",
    "\n",
    "The API for spatial joins does not yet match GeoPandas, but with knowledge of cuSpatial data formats\n",
    "you can call  `cuspatial.point_in_polygon` for large numbers of points on 32 polygons or less, or\n",
    "call `cuspatial.quadtree_point_in_polygon` for large numbers of points and polygons.  \n",
    "\n",
    "### Unindexed Point-in-polygon Join\n",
    "\n",
    "### [cuspatial.point_in_polygon](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.point_in_polygon)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "bf7b2256",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1     11896\n",
       "2      1268\n",
       "5     50835\n",
       "6      7792\n",
       "11    29318\n",
       "dtype: int64"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "single_polygons = host_dataframe[host_dataframe['geometry'].type == \"Polygon\"]\n",
    "gpu_dataframe = cuspatial.from_geopandas(single_polygons)\n",
    "x_points = (cupy.random.random(10000000) - 0.5) * 360\n",
    "y_points = (cupy.random.random(10000000) - 0.5) * 180\n",
    "xy = cudf.DataFrame({\"x\": x_points, \"y\": y_points}).interleave_columns()\n",
    "points = cuspatial.GeoSeries.from_points_xy(xy)\n",
    "\n",
    "short_dataframe = gpu_dataframe.iloc[0:31]\n",
    "geometry = short_dataframe['geometry']\n",
    "\n",
    "points_in_polygon = cuspatial.point_in_polygon(\n",
    "    points, geometry\n",
    ")\n",
    "sum_of_points_in_polygons_0_to_31 = points_in_polygon.sum()\n",
    "sum_of_points_in_polygons_0_to_31.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "fd9c4eef",
   "metadata": {},
   "source": [
    "cuSpatial includes another join algorithm, `quadtree_point_in_polygon` that uses an indexing  \n",
    "quadtree for faster calculations. `quadtree_point_in_polygon` also supports a number of  \n",
    "polygons limited only by memory constraints."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "387565b3-75ae-4789-a950-daffc2d4da01",
   "metadata": {},
   "source": [
    "### Quadtree Indexing\n",
    "\n",
    "The indexing module is used to create a spatial quadtree. Use  \n",
    "```\n",
    "cuspatial.quadtree_on_points(\n",
    "    points,\n",
    "    x_min,\n",
    "    x_max,\n",
    "    y_min,\n",
    "    y_max,\n",
    "    scale,\n",
    "    max_depth,\n",
    "    max_size\n",
    ")\n",
    "```\n",
    "to create the quadtree object that is used by the `quadtree_point_in_polygon`  \n",
    "function in the `join` module.\n",
    "\n",
    "The function uses a set of points and a user-defined bounding box to build an  \n",
    "indexing quad tree. Be sure to adjust the parameters appropriately, with larger  \n",
    "parameter values for larger datasets.\n",
    "\n",
    "`scale`: A scaling function that increases the size of the point space from an  \n",
    "origin defined by `{x_min, y_min}`. This can increase the likelihood of generating  \n",
    "well-separated quads.\n",
    "\n",
    "`max_depth`: In order for a quadtree to index points effectively, it must have a\n",
    "depth that is log-scaled with the size of the number of points. Each level of the  \n",
    "quad tree contains 4 quads. The number of available quads $q$ for indexing is then  \n",
    "equal to $q = 4^{d}$ where $d$ is the `max_depth` parameter. With an input size  \n",
    "of `10m` points and `max_depth = 7`, $\\frac{10^6}{4^7}$ points will be most  \n",
    "efficiently packed into the leaves of the quad tree.\n",
    "\n",
    "`max_size`: The maximum number of points allowed in an internal node before it is\n",
    "split into four leaf notes. As the  quadtree is generated, a leaf node containing\n",
    "usable index points will be created as points are added. If the number of points\n",
    "in this leaf exceeds `max_size`, the leaf will be subdivided, with four new\n",
    "leaves added and the original node removed from the set of leaves. This number is\n",
    "probably optimized in most datasets by making it a significant fraction of the\n",
    "optimal leaf size computation from above. Consider $10,000,000 / 4^7 / 4 = 153$.\n",
    "\n",
    "### [cuspatial.quadtree_on_points](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.quadtree_on_points)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "e3a0a9a3-0bdd-4f05-bcb5-7db4b99a44a3",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0     1507\n",
      "1     1726\n",
      "2     4242\n",
      "3     7371\n",
      "4    11341\n",
      "dtype: uint32\n",
      "   key  level  is_internal_node  length  offset\n",
      "0    0      0              True       4       2\n",
      "1    1      0              True       2       6\n",
      "2    0      1              True       4       8\n",
      "3    1      1              True       4      12\n",
      "4    2      1              True       2      16\n"
     ]
    }
   ],
   "source": [
    "x_points = (cupy.random.random(10000000) - 0.5) * 360\n",
    "y_points = (cupy.random.random(10000000) - 0.5) * 180\n",
    "xy = cudf.DataFrame({\"x\": x_points, \"y\": y_points}).interleave_columns()\n",
    "points = cuspatial.GeoSeries.from_points_xy(xy)\n",
    "\n",
    "scale = 5\n",
    "max_depth = 7\n",
    "max_size = 125\n",
    "point_indices, quadtree = cuspatial.quadtree_on_points(points,\n",
    "                                                       x_points.min(),\n",
    "                                                       x_points.max(),\n",
    "                                                       y_points.min(),\n",
    "                                                       y_points.max(),\n",
    "                                                       scale,\n",
    "                                                       max_depth,\n",
    "                                                       max_size)\n",
    "print(point_indices.head())\n",
    "print(quadtree.head())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0ab7e64d-1199-498c-b020-b3e7393337a5",
   "metadata": {},
   "source": [
    "### Indexed Spatial Joins\n",
    "\n",
    "The quadtree spatial index (`point_indices` and `quadtree`) is used by `quadtree_point_in_polygon`\n",
    "and `quadtree_point_to_nearest_linestring` to accelerate larger spatial joins. \n",
    "`quadtree_point_in_polygon` depends on a number of intermediate products calculated here using the\n",
    "following functions.\n",
    "\n",
    "### [cuspatial.join_quadtree_and_bounding_boxes](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.join_quadtree_and_bounding_boxes)\n",
    "### [cuspatial.quadtree_point_in_polygon](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.quadtree_point_in_polygon)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "023bd25a-35be-435d-ab0b-ecbd7a47e147",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Empty DataFrame\n",
      "Columns: [polygon_index, point_index]\n",
      "Index: []\n"
     ]
    }
   ],
   "source": [
    "polygons = gpu_dataframe['geometry']\n",
    "\n",
    "poly_bboxes = cuspatial.polygon_bounding_boxes(\n",
    "    polygons\n",
    ")\n",
    "intersections = cuspatial.join_quadtree_and_bounding_boxes(\n",
    "    quadtree,\n",
    "    poly_bboxes,\n",
    "    polygons.polygons.x.min(),\n",
    "    polygons.polygons.x.max(),\n",
    "    polygons.polygons.y.min(),\n",
    "    polygons.polygons.y.max(),\n",
    "    scale,\n",
    "    max_depth\n",
    ")\n",
    "polygons_and_points = cuspatial.quadtree_point_in_polygon(\n",
    "    intersections,\n",
    "    quadtree,\n",
    "    point_indices,\n",
    "    points,\n",
    "    polygons\n",
    ")\n",
    "print(polygons_and_points.head())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "aa1552a1-fe4b-4d30-b76f-054a060593ae",
   "metadata": {},
   "source": [
    "You can see above that polygon 270 maps to the first 5 points. In order to bring this back to  \n",
    "a specific row of the original dataframe, the individual polygons must be mapped back to their  \n",
    "original MultiPolygon row. This is left an an exercise.\n",
    "\n",
    "### [cuspatial.quadtree_point_to_nearest_linestring](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.quadtree_point_to_nearest_linestring)\n",
    "\n",
    "`cuspatial.quadtree_point_to_nearest_linestring` can be used to find the Polygon or Linestring  \n",
    "nearest to a set of points from another set of mixed geometries. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "784aff8e-c9ed-4a81-aa87-bf301b3b90af",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "gpu_countries = cuspatial.from_geopandas(countries[countries['geometry'].type == \"Polygon\"])\n",
    "gpu_cities = cuspatial.from_geopandas(cities[cities['geometry'].type == 'Point'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "fea24c78-cf5c-45c6-b860-338238e61323",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Empty DataFrame\n",
      "Columns: [point_index, linestring_index, distance]\n",
      "Index: []\n"
     ]
    }
   ],
   "source": [
    "polygons = gpu_countries['geometry'].polygons\n",
    "\n",
    "boundaries = cuspatial.GeoSeries.from_linestrings_xy(\n",
    "    cudf.DataFrame({\"x\": polygons.x, \"y\": polygons.y}).interleave_columns(),\n",
    "    polygons.ring_offset,\n",
    "    cupy.arange(len(polygons.ring_offset))\n",
    ")\n",
    "\n",
    "point_indices, quadtree = cuspatial.quadtree_on_points(gpu_cities['geometry'],\n",
    "                                                       polygons.x.min(),\n",
    "                                                       polygons.x.max(),\n",
    "                                                       polygons.y.min(),\n",
    "                                                       polygons.y.max(),\n",
    "                                                       scale,\n",
    "                                                       max_depth,\n",
    "                                                       max_size)\n",
    "poly_bboxes = cuspatial.linestring_bounding_boxes(\n",
    "    boundaries,\n",
    "    2.0\n",
    ")\n",
    "intersections = cuspatial.join_quadtree_and_bounding_boxes(\n",
    "    quadtree,\n",
    "    poly_bboxes,\n",
    "    polygons.x.min(),\n",
    "    polygons.x.max(),\n",
    "    polygons.y.min(),\n",
    "    polygons.y.max(),\n",
    "    scale,\n",
    "    max_depth\n",
    ")\n",
    "result = cuspatial.quadtree_point_to_nearest_linestring(\n",
    "    intersections,\n",
    "    quadtree,\n",
    "    point_indices,\n",
    "    gpu_cities['geometry'],\n",
    "    boundaries\n",
    ")\n",
    "print(result.head())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3e4e07f6",
   "metadata": {},
   "source": [
    "_Images used with permission from Wikipedia Creative Commons_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3c504d6-a908-455d-9d35-58521f70c5e2",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  },
  "vscode": {
   "interpreter": {
    "hash": "ef2a625a21f49284d4111fd61c77079c8ec37c2ac9f170a08eb051e93ed3e888"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}