{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "c5fdf490-fa77-4e56-92d1-53101fff75ba", "metadata": {}, "source": [ "# cuSpatial Python User's Guide\n", "\n", "cuSpatial is a GPU-accelerated Python library for spatial data analysis including distance and \n", "trajectory computations, spatial data indexing and spatial join operations. cuSpatial's \n", "Python API provides an accessible interface to high-performance spatial algorithms accelerated\n", "by CUDA-enabled GPUs." ] }, { "attachments": {}, "cell_type": "markdown", "id": "caadf3ca-be3c-4523-877c-4c35dd25093a", "metadata": {}, "source": [ "## Contents\n", "\n", "This guide provides a working example for all of the python API components of cuSpatial. \n", "The following list links to each subsection.\n", "\n", "* [Installing cuSpatial](#Installing-cuSpatial)\n", "* [GPU accelerated memory layout](#GPU-accelerated-memory-layout)\n", "* [Input / Output](#Input-/-Output)\n", "* [Geopandas and cuDF integration](#Geopandas-and-cuDF-integration)\n", "* [Trajectories](#Trajectories)\n", "* [Bounding](#Bounding)\n", "* [Projection](#Projection)\n", "* [Distance](#Distance)\n", "* [Filtering](#Filtering)\n", "* [Spatial joins](#Spatial-joins)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "115c8382-f83f-476f-9a26-a64a45b3a8da", "metadata": {}, "source": [ "## Installing cuSpatial\n", "Read the [RAPIDS Quickstart Guide]( https://rapids.ai/start.html ) to learn more about installing all RAPIDS libraries, including cuSpatial.\n", "\n", "If you are working on a system with a CUDA-enabled GPU and have CUDA installed, uncomment the \n", "following cell and install cuSpatial:" ] }, { "cell_type": "code", "execution_count": null, "id": "7265f9d2-9203-4da2-bbb2-b35c7f933641", "metadata": {}, "outputs": [], "source": [ "# !conda create -n rapids-24.08 -c rapidsai -c conda-forge -c nvidia \\ \n", "# cuspatial=24.08 python=3.9 cudatoolkit=11.5 " ] }, { "attachments": {}, "cell_type": "markdown", "id": "051b6e68-9ffd-473a-89e2-313fe1c59d18", "metadata": {}, "source": [ "For other options to create a RAPIDS environment, such as docker or build from source, see \n", "[RAPIDS Release Selector]( https://rapids.ai/start.html#get-rapids). \n", "\n", "If you wish to contribute to cuSpatial, you should create a source build using the excellent [rapids-compose](https://github.com/trxcllnt/rapids-compose)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7b770cb4-793e-467a-a306-2d3409545748", "metadata": {}, "source": [ "## GPU accelerated memory layout\n", "\n", "cuSpatial uses `GeoArrow` buffers, a GPU-friendly data format for geometric data that is well \n", "suited for massively parallel programming. See [I/O]((#Input-/-Output) on the fastest methods to get your \n", "data into cuSpatial. GeoArrow extends [PyArrow](\n", "https://arrow.apache.org/docs/python/index.html ) bindings and introduces several new types suited \n", "for geometry applications. GeoArrow supports [ListArrays](\n", "https://arrow.apache.org/docs/python/data.html#arrays) for `Points`, `MultiPoints`, \n", "`LineStrings`, `MultiLineStrings`, `Polygons`, and `MultiPolygons`. Using an Arrow [DenseArray](\n", "https://arrow.apache.org/docs/python/data.html#union-arrays), \n", "GeoArrow stores heterogeneous types of Features. DataFrames of geometry objects and their \n", "metadata can be loaded and transformed in a method similar to those in [GeoPandas.GeoSeries](\n", "https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.html)." ] }, { "cell_type": "code", "execution_count": 1, "id": "88d05bb9-c924-4d0b-8736-cd5183602d76", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Imports used throughout this notebook.\n", "import cuspatial\n", "import cudf\n", "import cupy\n", "import geopandas\n", "import os\n", "import pandas as pd\n", "import numpy as np\n", "from shapely.geometry import *\n", "from shapely import wkt" ] }, { "cell_type": "code", "execution_count": 2, "id": "4937d4aa-4e32-49ab-a22e-96dfb0098d07", "metadata": { "tags": [] }, "outputs": [], "source": [ "# For deterministic result\n", "np.random.seed(0)\n", "cupy.random.seed(0)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4b1251d1-558a-4899-8e7a-8066db0ad091", "metadata": {}, "source": [ "## Input / Output\n", "\n", "The primary method of loading features into cuSpatial is using [cuspatial.from_geopandas](\n", "https://docs.rapids.ai/api/cuspatial/stable/api_docs/io.html?highlight=from_geopandas#cuspatial.from_geopandas).\n", "\n", "One can also create feature geometries directly using any Python buffer that supports \n", "`__array_interface__` for coordinates and their feature offsets." ] }, { "attachments": {}, "cell_type": "markdown", "id": "11b973bd-87e1-4b67-ab8c-23c3b8291335", "metadata": {}, "source": [ "### [cuspatial.from_geopandas](https://docs.rapids.ai/api/cuspatial/stable/api_docs/io.html?highlight=from_geopandas#cuspatial.from_geopandas)\n", "\n", "The easiest way to get data into cuSpatial is via `cuspatial.from_geopandas`." ] }, { "cell_type": "code", "execution_count": 3, "id": "255fbfbe-8be1-498c-9a26-f4a3f31bdded", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " featurecla scalerank LABELRANK SOVEREIGNT SOV_A3 \\\n", "0 Admin-0 country 1 6 Fiji FJI \n", "1 Admin-0 country 1 3 United Republic of Tanzania TZA \n", "2 Admin-0 country 1 7 Western Sahara SAH \n", "3 Admin-0 country 1 2 Canada CAN \n", "4 Admin-0 country 1 2 United States of America US1 \n", "\n", " ADM0_DIF LEVEL TYPE TLC ADMIN ... \\\n", "0 0 2 Sovereign country 1 Fiji ... \n", "1 0 2 Sovereign country 1 United Republic of Tanzania ... \n", "2 0 2 Indeterminate 1 Western Sahara ... \n", "3 0 2 Sovereign country 1 Canada ... \n", "4 1 2 Country 1 United States of America ... \n", "\n", " FCLASS_TR FCLASS_ID FCLASS_PL FCLASS_GR FCLASS_IT \\\n", "0 None None None None None \n", "1 None None None None None \n", "2 Unrecognized Unrecognized Unrecognized None None \n", "3 None None None None None \n", "4 None None None None None \n", "\n", " FCLASS_NL FCLASS_SE FCLASS_BD FCLASS_UA \\\n", "0 None None None None \n", "1 None None None None \n", "2 Unrecognized None None None \n", "3 None None None None \n", "4 None None None None \n", "\n", " geometry \n", "0 MULTIPOLYGON (((180 -16.06713, 180 -16.55522, ... \n", "1 POLYGON ((33.90371 -0.95, 34.07262 -1.05982, 3... \n", "2 POLYGON ((-8.66559 27.65643, -8.66512 27.58948... \n", "3 MULTIPOLYGON (((-122.84 49, -122.97421 49.0025... \n", "4 MULTIPOLYGON (((-122.84 49, -120 49, -117.0312... \n", "\n", "[5 rows x 169 columns]\n", "(GPU)\n", "\n" ] } ], "source": [ "host_dataframe = geopandas.read_file('https://naturalearth.s3.amazonaws.com/110m_cultural/ne_110m_admin_0_countries.zip')\n", "host_dataframe = host_dataframe.set_crs(4326)\n", "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n", "print(gpu_dataframe.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "da5c775b-7458-4e3c-a573-e1bd060e3365", "metadata": {}, "source": [ "## Geopandas and cuDF integration\n", "\n", "A cuSpatial [GeoDataFrame](\n", "https://docs.rapids.ai/api/cuspatial/stable/api_docs/geopandas_compatibility.html#cuspatial.GeoDataFrame ) is a collection of [cudf](\n", "https://docs.rapids.ai/api/cudf/stable/ ) [Series](\n", "https://docs.rapids.ai/api/cudf/stable/api_docs/series.html ) and\n", "[cuspatial.GeoSeries](\n", "https://docs.rapids.ai/api/cuspatial/stable/api_docs/geopandas_compatibility.html#cuspatial.GeoSeries ) `\"geometry\"` objects. \n", "Both types of series are stored on the GPU, and\n", "`GeoSeries` is represented internally using `GeoArrow` data layout.\n", "\n", "One of the most important features of cuSpatial is that it is highly integrated with `cuDF`. \n", "You can use any `cuDF` operation on cuSpatial non-feature columns, and most operations will work \n", "with a `geometry` column. Operations that reduce or collate the number of rows in your DataFrame, \n", "for example `groupby`, are not supported at this time." ] }, { "cell_type": "code", "execution_count": 4, "id": "956451e2-a520-441d-a939-575ed179917b", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " featurecla scalerank LABELRANK SOVEREIGNT SOV_A3 ADM0_DIF \\\n", "103 Admin-0 country 1 3 Afghanistan AFG 0 \n", "125 Admin-0 country 1 6 Albania ALB 0 \n", "82 Admin-0 country 1 3 Algeria DZA 0 \n", "74 Admin-0 country 1 3 Angola AGO 0 \n", "159 Admin-0 country 1 4 Antarctica ATA 0 \n", "\n", " LEVEL TYPE TLC ADMIN ... FCLASS_TR FCLASS_ID \\\n", "103 2 Sovereign country 1 Afghanistan ... None None \n", "125 2 Sovereign country 1 Albania ... None None \n", "82 2 Sovereign country 1 Algeria ... None None \n", "74 2 Sovereign country 1 Angola ... None None \n", "159 2 Indeterminate 1 Antarctica ... None None \n", "\n", " FCLASS_PL FCLASS_GR FCLASS_IT FCLASS_NL FCLASS_SE FCLASS_BD FCLASS_UA \\\n", "103 None None None None None None None \n", "125 None None None None None None None \n", "82 None None None None None None None \n", "74 None None None None None None None \n", "159 None None None None None None None \n", "\n", " geometry \n", "103 POLYGON ((66.51861 37.36278, 67.07578 37.35614... \n", "125 POLYGON ((21.02004 40.84273, 20.99999 40.58, 2... \n", "82 POLYGON ((-8.6844 27.39574, -8.66512 27.58948,... \n", "74 MULTIPOLYGON (((12.99552 -4.7811, 12.63161 -4.... \n", "159 MULTIPOLYGON (((-48.66062 -78.04702, -48.1514 ... \n", "\n", "[5 rows x 169 columns]\n", "(GPU)\n", "\n" ] } ], "source": [ "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n", "continents_dataframe = gpu_dataframe.sort_values(\"NAME\")\n", "print(continents_dataframe.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "5453f308-317f-4775-ba3c-ff0633755bc4", "metadata": {}, "source": [ "You can also convert between GPU-backed `cuspatial.GeoDataFrame` and CPU-backed \n", "`geopandas.GeoDataFrame` with `from_geopandas` and `to_geopandas`, enabling you to \n", "take advantage of any native GeoPandas operation. Note, however, that GeoPandas runs on \n", " the CPU and therefore will not have as high performance as cuSpatial operations. The following \n", "example displays the `Polygon` associated with the first item in the dataframe sorted \n", "alphabetically by name." ] }, { "cell_type": "code", "execution_count": 5, "id": "cd8d1c39-b44f-4d06-9e11-04c2dca4cf15", "metadata": { "tags": [] }, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n", "sorted_dataframe = gpu_dataframe.sort_values(\"NAME\")\n", "sorted_dataframe = sorted_dataframe.to_geopandas()\n", "sorted_dataframe['geometry'].iloc[0]" ] }, { "attachments": {}, "cell_type": "markdown", "id": "56b80722-38b6-457c-a7f0-591af6efd3ff", "metadata": {}, "source": [ "## Trajectories\n", "\n", "A trajectory is a `LineString` coupled with a time sample for each point in the `LineString`. \n", "Use `cuspatial.trajectory.derive_trajectories` to group trajectory datasets and sort by time.\n", "\n", "\n", "\n", "### [cuspatial.derive_trajectories](https://docs.rapids.ai/api/cuspatial/stable/api_docs/trajectory.html#cuspatial.derive_trajectories)" ] }, { "cell_type": "code", "execution_count": 6, "id": "cb5acdad-53aa-418f-9948-8445515bd2b2", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " object_id x y timestamp\n", "0 1 0.680146 0.874341 1970-01-01 00:00:00.125\n", "1 1 0.843522 0.044402 1970-01-01 00:00:00.834\n", "2 1 0.837039 0.351025 1970-01-01 00:00:01.335\n", "3 1 0.946184 0.479038 1970-01-01 00:00:01.791\n", "4 1 0.117322 0.182117 1970-01-01 00:00:02.474\n", "0 0\n", "1 2455\n", "2 4899\n", "3 7422\n", "4 9924\n", " ... \n", "394 987408\n", "395 989891\n", "396 992428\n", "397 994975\n", "398 997448\n", "Length: 399, dtype: int32\n" ] } ], "source": [ "# 1m random trajectory samples\n", "ids = cupy.random.randint(1, 400, 1000000)\n", "timestamps = cupy.random.random(1000000)*1000000\n", "xy= cupy.random.random(2000000)\n", "trajs = cuspatial.GeoSeries.from_points_xy(xy)\n", "sorted_trajectories, trajectory_offsets = \\\n", " cuspatial.core.trajectory.derive_trajectories(ids, trajs, timestamps)\n", "# sorted_trajectories is a DataFrame containing all trajectory samples\n", "# sorted first by `object_id` and then by `timestamp`.\n", "print(sorted_trajectories.head())\n", "# trajectory_offsets is a Series containing the start position of each\n", "# trajectory in sorted_trajectories.\n", "print(trajectory_offsets)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3c4a90f9-8661-4fda-9026-473e6ce87bd2", "metadata": {}, "source": [ "`derive_trajectories` sorts the trajectories by `object_id`, then `timestamp`, and returns a \n", "tuple containing the sorted trajectory data frame in the first index position and the offsets \n", "buffer defining the start and stop of each trajectory in the second index position. \n", "\n", "### [cuspatial.trajectory_distances_and_speeds](https://docs.rapids.ai/api/cuspatial/stable/api_docs/trajectory.html#cuspatial.trajectory_distances_and_speeds)\n", "\n", "Use `trajectory_distance_and_speed` to calculate the overall distance travelled in meters and \n", "the speed of a set of trajectories with the same format as the result returned by `derive_trajectories`." ] }, { "cell_type": "code", "execution_count": 7, "id": "03b75847-090d-40f8-8147-cb10b900d6ec", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " distance speed\n", "trajectory_id \n", "0 1.278996e+06 1280.320089\n", "1 1.267179e+06 1268.370390\n", "2 1.294437e+06 1295.905261\n", "3 1.323413e+06 1323.956714\n", "4 1.309590e+06 1311.561012\n" ] } ], "source": [ "trajs = cuspatial.GeoSeries.from_points_xy(\n", " sorted_trajectories[[\"x\", \"y\"]].interleave_columns()\n", ")\n", "d_and_s = cuspatial.core.trajectory.trajectory_distances_and_speeds(\n", " len(cudf.Series(ids).unique()),\n", " sorted_trajectories['object_id'],\n", " trajs,\n", " sorted_trajectories['timestamp']\n", ")\n", "print(d_and_s.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "eeb0da7c-0bdf-49ec-8244-4165acc96074", "metadata": {}, "source": [ "Finally, compute the bounding boxes of trajectories that follow the format of the above two \n", "examples:" ] }, { "attachments": {}, "cell_type": "markdown", "id": "12a3ce7e-82b7-4b51-8227-5e157a48701c", "metadata": { "tags": [] }, "source": [ "## Bounding\n", "\n", "Compute the bounding boxes of `n` polygons or linestrings:\n", "\n", "\n", "\n", "### [cuspatial.trajectory_bounding_boxes](https://docs.rapids.ai/api/cuspatial/stable/api_docs/trajectory.html#cuspatial.trajectory_bounding_boxes)\n", "\n", "`trajectory_bounding_boxes` works out of the box with the values returned by `derive_trajectories`. \n", "Its arguments are the number of incoming objects, the offsets of those objects, and x and y point buffers." ] }, { "cell_type": "code", "execution_count": 7, "id": "452f60cb-28cc-4ad8-8aa2-9d73e3d56ec6", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " x_min y_min x_max y_max\n", "0 0.000361 0.000170 0.999582 0.999485\n", "1 0.000184 0.000647 0.999939 0.999884\n", "2 0.000461 0.001395 0.999938 0.999297\n", "3 0.000093 0.000073 0.999819 0.999544\n", "4 0.000105 0.000112 0.999952 0.999013\n" ] } ], "source": [ "bounding_boxes = cuspatial.core.trajectory.trajectory_bounding_boxes(\n", " len(cudf.Series(ids, dtype=\"int32\").unique()),\n", " sorted_trajectories['object_id'],\n", " trajs\n", ")\n", "print(bounding_boxes.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a56dfe17-1739-4b20-85c9-fcb5902c1585", "metadata": {}, "source": [ "### [cuspatial.polygon_bounding_boxes](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.polygon_bounding_boxes)\n", "\n", "`polygon_bounding_boxes` supports more complex geometry objects such as `Polygon`s with multiple \n", "rings. The combination of `part_offset` and `ring_offset` allows the function to use only the \n", "exterior ring for computing the bounding box." ] }, { "cell_type": "code", "execution_count": 8, "id": "9266aac4-f925-4fb7-b287-5f0b795d5756", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " minx miny maxx maxy\n", "0 29.339998 -11.720938 40.316590 -0.950000\n", "1 -17.063423 20.999752 -8.665124 27.656426\n", "2 46.466446 40.662325 87.359970 55.385250\n", "3 55.928917 37.144994 73.055417 45.586804\n", "4 12.182337 -13.257227 31.174149 5.256088\n" ] } ], "source": [ "single_polygons = cuspatial.from_geopandas(\n", " host_dataframe['geometry'][host_dataframe['geometry'].type == \"Polygon\"]\n", ")\n", "bounding_box_polygons = cuspatial.core.spatial.bounding.polygon_bounding_boxes(\n", " single_polygons\n", ")\n", "print(bounding_box_polygons.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "85197478-801c-4d2d-8b10-c1136d7bb15c", "metadata": {}, "source": [ "### [cuspatial.linestring_bounding_boxes](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.linestring_bounding_boxes)\n", "\n", "Equivalently, we can treat trajectories as Linestrings and compute the same bounding boxes from \n", "the above trajectory calculation more generally:" ] }, { "cell_type": "code", "execution_count": 9, "id": "15b5bb38-702f-4360-b48c-2e49ffd650d7", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " minx miny maxx maxy\n", "0 0.000261 0.000070 0.999682 0.999585\n", "1 0.000084 0.000547 1.000039 0.999984\n", "2 0.000361 0.001295 1.000038 0.999397\n", "3 -0.000007 -0.000027 0.999919 0.999644\n", "4 0.000005 0.000012 1.000052 0.999113\n" ] } ], "source": [ "lines = cuspatial.GeoSeries.from_linestrings_xy(\n", " trajs.points.xy, trajectory_offsets, cupy.arange(len(trajectory_offsets))\n", ")\n", "trajectory_bounding_boxes = cuspatial.core.spatial.bounding.linestring_bounding_boxes(\n", " lines, 0.0001\n", ")\n", "print(trajectory_bounding_boxes.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "81c4d3ca-5d3f-4ae1-ae8e-ac1e252f3e17", "metadata": {}, "source": [ "## Projection\n", "\n", "cuSpatial provides a simple sinusoidal longitude / latitude to Cartesian coordinate transform. \n", "This function requires an origin point to determine the scaling parameters for the lonlat inputs. \n", "\n", "### [cuspatial.sinusoidal_projection](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.sinusoidal_projection)\n", "\n", "The following cell converts the lonlat coordinates of the country of Afghanistan to Cartesian \n", "coordinates in km, centered around the center of the country, suitable for graphing and display." ] }, { "cell_type": "code", "execution_count": 10, "id": "a7a870dd-c0ae-41c1-a66c-cff4bd2db0ec", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 POINT (112.174 -281.59)\n", "1 POINT (62.152 -280.852)\n", "2 POINT (-5.573 -257.391)\n", "3 POINT (-33.071 -243.849)\n", "4 POINT (-98.002 -279.54)\n", "dtype: geometry\n" ] } ], "source": [ "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n", "afghanistan = gpu_dataframe['geometry'][gpu_dataframe['NAME'] == 'Afghanistan']\n", "points = cuspatial.GeoSeries.from_points_xy(afghanistan.polygons.xy)\n", "projected = cuspatial.sinusoidal_projection(\n", " afghanistan.polygons.x.mean(),\n", " afghanistan.polygons.y.mean(),\n", " points\n", ")\n", "print(projected.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c9085e80-e7ba-48e9-8c50-12e544e3af46", "metadata": {}, "source": [ "## Distance\n", "cuSpatial provides a growing suite of distance computation functions. Parallel distance functions \n", "come in two main forms: pairwise, which computes a distance for each corresponding pair of input \n", "geometries; and all-pairs, which computes a distance for the each element of the Cartesian product \n", "of input geometries (for each input geometry in A, compute the distance from A to each input\n", "geometry in B).\"\n", " \n", "Two pairwise distance functions are included in cuSpatial: `haversine` and `pairwise_linestring`. \n", "The `hausdorff` clustering distances algorithm is also available, computing the hausdorff \n", "distance across the cartesian product of its single input.\n", "\n", "### [cuspatial.directed_hausdorff_distance](https://docs.rapids.ai/api/cuspatial/stable/api_docs/trajectory.html#cuspatial.directed_hausdorff_distance)\n", "\n", "The directed Hausdorff distance from one space to another is the greatest of all the distances \n", "between any point in the first space to the closet point in the second. This is especially useful \n", "as a similarity metric between trajectories.\n", "\n", "\n", "\n", "[Hausdorff distance](https://en.wikipedia.org/wiki/Hausdorff_distance)" ] }, { "cell_type": "code", "execution_count": 11, "id": "e75b0352-0f80-404d-a113-f301601cd5a3", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 1 2 3 4 5 6 \\\n", "0 0.000000 0.034755 0.031989 0.031959 0.031873 0.038674 0.029961 \n", "1 0.030328 0.000000 0.038672 0.032086 0.031049 0.032170 0.032275 \n", "2 0.027640 0.030539 0.000000 0.036737 0.033055 0.043447 0.028812 \n", "3 0.031497 0.033380 0.035224 0.000000 0.032581 0.035484 0.030339 \n", "4 0.031079 0.032256 0.035731 0.039084 0.000000 0.036416 0.031369 \n", "\n", " 7 8 9 ... 388 389 390 391 \\\n", "0 0.029117 0.040962 0.033259 ... 0.031614 0.036447 0.035548 0.028233 \n", "1 0.030215 0.034443 0.032998 ... 0.030594 0.035665 0.031473 0.031916 \n", "2 0.031807 0.039269 0.033250 ... 0.031998 0.033636 0.034646 0.032615 \n", "3 0.034792 0.045755 0.031810 ... 0.033623 0.031359 0.034923 0.032287 \n", "4 0.030388 0.033751 0.034029 ... 0.030705 0.040339 0.034328 0.029027 \n", "\n", " 392 393 394 395 396 397 \n", "0 0.034176 0.030057 0.033863 0.031111 0.034590 0.033850 \n", "1 0.037483 0.033489 0.041403 0.029784 0.035374 0.038179 \n", "2 0.036681 0.030642 0.038432 0.032481 0.034810 0.036695 \n", "3 0.032808 0.029771 0.040891 0.030802 0.032279 0.038443 \n", "4 0.035645 0.027703 0.037529 0.029356 0.031260 0.035501 \n", "\n", "[5 rows x 398 columns]\n" ] } ], "source": [ "coordinates = sorted_trajectories[['x', 'y']].interleave_columns()\n", "spaces = cuspatial.GeoSeries.from_multipoints_xy(\n", " coordinates, trajectory_offsets\n", ")\n", "hausdorff_distances = cuspatial.core.spatial.distance.directed_hausdorff_distance(\n", " spaces\n", ")\n", "print(hausdorff_distances.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "993d7566-203e-4b87-adad-088e2fd92eed", "metadata": {}, "source": [ "### [cuspatial.haversine_distance](https://docs.rapids.ai/api/cuspatial/stable/api_docs/gis.html#cuspatial.haversine_distance)\n", "\n", "Haversine distance is the great circle distance between longitude and latitude pairs. cuSpatial \n", "uses the `lon/lat` ordering to better reflect the cartesian coordinates of great circle \n", "coordinates: `x/y`.\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 12, "id": "70f66319-c4d2-4a93-ab98-0debcce4a719", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 9959.695143\n", "1 9803.166859\n", "2 9876.857085\n", "3 9925.097106\n", "4 9927.268486\n", "Name: None, dtype: float64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n", "polygons_first = gpu_dataframe['geometry'][0:10]\n", "polygons_second = gpu_dataframe['geometry'][10:30]\n", "\n", "points_first = polygons_first.polygons.xy[0:1000]\n", "points_second = polygons_second.polygons.xy[0:1000]\n", "\n", "first = cuspatial.GeoSeries.from_points_xy(points_first)\n", "second = cuspatial.GeoSeries.from_points_xy(points_second)\n", "\n", "# The number of coordinates in two sets of polygons vary, so\n", "# we'll just compare the first set of 1000 values here.\n", "distances_in_meters = cuspatial.haversine_distance(\n", " first, second\n", ")\n", "cudf.Series(distances_in_meters).head()" ] }, { "cell_type": "markdown", "id": "f21c1709-dfc1-411b-b9d5-ec5dc345367f", "metadata": {}, "source": [ "This above method reads the GeoPandas data from CPU memory into GPU memory and then cuSpatial processes it. If the data is already in a cuDF GPU dataframe, you can quickly calculate Haversine distances using the method below. This maximizes speed by keeping all the processing on the GPU and is very useful when working on large datasets." ] }, { "cell_type": "code", "execution_count": 13, "id": "c32741b0-c550-4d0f-b494-45191f5b60fd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " latitude longitude atlanta_lat atlanta_lng atlanta_dist\n", "0 17.1167 -61.7833 33.749 -84.388 11961.556540\n", "1 17.1333 -61.7833 33.749 -84.388 11963.392729\n", "2 25.3330 55.5170 33.749 -84.388 12243.126130\n", "3 25.2550 55.3640 33.749 -84.388 12233.867463\n", "4 24.4330 54.6510 33.749 -84.388 12139.822218\n" ] } ], "source": [ "# Generate data to be used to create a cuDF dataframe. \n", "# The data to be processed by Haversine MUST be a Float.\n", "a = {\"latitude\":[17.1167, 17.1333, 25.333, 25.255, 24.433, 24.262, 35.317, 34.21, 34.566, 31.5, 36.7167, 30.5667, 28.05, 22.8, 35.7297, 36.97, 36.78, 36.8, 36.8, 36.72],\n", " \"longitude\": [-61.7833, -61.7833, 55.517, 55.364, 54.651, 55.609, 69.017, 62.228, 69.212, 65.85, 3.25, 2.8667, 9.6331, 5.4331, 0.65, 7.79, 3.07, 3.03, 3.04, 4.05]}\n", "df = cudf.DataFrame(data=a)\n", "\n", "# Create cuSpatial GeoSeries from cuDF Dataframe\n", "cuGeoSeries = cuspatial.GeoSeries.from_points_xy(df[['longitude', 'latitude']].interleave_columns())\n", "\n", "# Create Comparator cuSpatial GeoSeries from a comparator point\n", "df['atlanta_lat'] = 33.7490\n", "df['atlanta_lng'] = -84.3880\n", "atlGeoSeries = cuspatial.GeoSeries.from_points_xy(df[['atlanta_lat', 'atlanta_lng']].interleave_columns())\n", "\n", "# Calculate Haversine Distance of cuDF dataframe to comparator point\n", "df['atlanta_dist'] = cuspatial.haversine_distance(cuGeoSeries, atlGeoSeries)\n", "print(df.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7f2239c5-58d0-4912-9bd7-246cc6741c0a", "metadata": {}, "source": [ "### Pairwise distance\n", "\n", "`pairwise_linestring_distance` computes the distance between a `GeoSeries` of Linestrings of \n", "length `n` and a corresponding `GeoSeries` of Linestrings of `n` length. It returns the \n", "minimum distance from any point in the first linestring of the pair to the nearest segment \n", "or point within the second Linestring of the pair.\n", "\n", "The input accepts a pair of geoseries as input sequences of linestring arrays.\n", "\n", "The below example uses the polygons from `naturalearth_lowres` and treats them as linestrings. \n", "The first example computes the distances between all polygons and themselves, while the second \n", "example computes the distance between the first 50 polygons and the second 50 polygons.\n", "\n", "### [cuspatial.pairwise_linestring_distance](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.pairwise_linestring_distance)" ] }, { "cell_type": "code", "execution_count": 14, "id": "35dfb7c9-1914-488a-b22e-8d0067ea7a8b", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 0.0\n", "1 0.0\n", "2 0.0\n", "3 0.0\n", "4 0.0\n", "dtype: float64\n", "0 152.200610\n", "1 44.076445\n", "2 2.417269\n", "3 44.197151\n", "4 75.821029\n", "dtype: float64\n" ] } ], "source": [ "gpu_boundaries = cuspatial.from_geopandas(host_dataframe.geometry.boundary)\n", "zeros = cuspatial.pairwise_linestring_distance(\n", " gpu_boundaries[0:50],\n", " gpu_boundaries[0:50]\n", ")\n", "print(zeros.head())\n", "lines1 = gpu_boundaries[0:50]\n", "lines2 = gpu_boundaries[50:100]\n", "distances = cuspatial.pairwise_linestring_distance(\n", " lines1, lines2\n", ")\n", "print(distances.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "de6b73ac-1b48-422c-8463-37367ad73507", "metadata": {}, "source": [ "`pairwise_point_linestring_distance` computes the distance between pairs of points and \n", "linestrings. It can be used with polygons treated as linestrings as well. In the following \n", "example the minimum distance from a country's center to it's border is computed.\n", "\n", "### [cuspatial.pairwise_point_linestring_distance](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.pairwise_point_linestring_distance)\n", "\n", "Using WGS 84 Pseudo-Mercator, distances are in meters." ] }, { "cell_type": "code", "execution_count": 15, "id": "40e9a41e-21af-47cc-a142-b19a67941f7f", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " featurecla scalerank LABELRANK SOVEREIGNT \\\n", "1 Admin-0 country 1 3 United Republic of Tanzania \n", "2 Admin-0 country 1 7 Western Sahara \n", "5 Admin-0 country 1 3 Kazakhstan \n", "6 Admin-0 country 1 3 Uzbekistan \n", "11 Admin-0 country 1 2 Democratic Republic of the Congo \n", "\n", " SOV_A3 ADM0_DIF LEVEL TYPE TLC \\\n", "1 TZA 0 2 Sovereign country 1 \n", "2 SAH 0 2 Indeterminate 1 \n", "5 KA1 1 1 Sovereignty 1 \n", "6 UZB 0 2 Sovereign country 1 \n", "11 COD 0 2 Sovereign country 1 \n", "\n", " ADMIN ... FCLASS_ID FCLASS_PL \\\n", "1 United Republic of Tanzania ... None None \n", "2 Western Sahara ... Unrecognized Unrecognized \n", "5 Kazakhstan ... None None \n", "6 Uzbekistan ... None None \n", "11 Democratic Republic of the Congo ... None None \n", "\n", " FCLASS_GR FCLASS_IT FCLASS_NL FCLASS_SE FCLASS_BD FCLASS_UA \\\n", "1 None None None None None None \n", "2 None None Unrecognized None None None \n", "5 None None None None None None \n", "6 None None None None None None \n", "11 None None None None None None \n", "\n", " geometry border_distance \n", "1 POLYGON ((3774143.866 -105758.362, 3792946.708... 8047.288391 \n", "2 POLYGON ((-964649.018 3205725.605, -964597.245... 593137.492497 \n", "5 POLYGON ((9724867.413 6311418.173, 9640131.701... 37091.213890 \n", "6 POLYGON ((6230350.563 5057973.384, 6225978.591... 278633.467299 \n", "11 POLYGON ((3266113.592 -501451.658, 3286149.877... 35812.988244 \n", "\n", "[5 rows x 170 columns]\n", "(GPU)\n", "\n" ] } ], "source": [ "# Convert input dataframe to Pseudo-Mercator projection.\n", "host_dataframe3857 = host_dataframe.to_crs(3857)\n", "polygons = host_dataframe3857[host_dataframe3857['geometry'].type == \"Polygon\"]\n", "gpu_polygons = cuspatial.from_geopandas(polygons)\n", "# Extract mean_x and mean_y from each country\n", "mean_x = [gpu_polygons['geometry'].iloc[[ix]].polygons.x.mean() for ix in range(len(gpu_polygons))]\n", "mean_y = [gpu_polygons['geometry'].iloc[[ix]].polygons.y.mean() for ix in range(len(gpu_polygons))]\n", "# Convert mean_x/mean_y values into Points for use in API.\n", "points = cuspatial.GeoSeries([Point(point) for point in zip(mean_x, mean_y)])\n", "# Convert Polygons into Linestrings for use in API.\n", "linestring_df = cuspatial.from_geopandas(geopandas.geoseries.GeoSeries(\n", " [MultiLineString(mapping(polygons['geometry'].iloc[ix])[\"coordinates\"]) for ix in range(len(polygons))]\n", "))\n", "gpu_polygons['border_distance'] = cuspatial.pairwise_point_linestring_distance(\n", " points, linestring_df\n", ")\n", "print(gpu_polygons.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "008d320d-ca47-459f-9fff-8769494c8a61", "metadata": {}, "source": [ "### cuspatial.pairwise_point_polygon_distance\n", "\n", "Using WGS 84 Pseudo-Mercator, distances are in meters." ] }, { "cell_type": "code", "execution_count": 16, "id": "258c9a8c-7fe3-4047-80b7-00878d9fb2f1", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " featurecla scalerank LABELRANK SOVEREIGNT SOV_A3 \\\n", "0 Admin-0 country 1 6 Fiji FJI \n", "1 Admin-0 country 1 3 United Republic of Tanzania TZA \n", "2 Admin-0 country 1 7 Western Sahara SAH \n", "3 Admin-0 country 1 2 Canada CAN \n", "4 Admin-0 country 1 2 United States of America US1 \n", "\n", " ADM0_DIF LEVEL TYPE TLC ADMIN ... \\\n", "0 0 2 Sovereign country 1 Fiji ... \n", "1 0 2 Sovereign country 1 United Republic of Tanzania ... \n", "2 0 2 Indeterminate 1 Western Sahara ... \n", "3 0 2 Sovereign country 1 Canada ... \n", "4 1 2 Country 1 United States of America ... \n", "\n", " FCLASS_PL FCLASS_GR FCLASS_IT FCLASS_NL FCLASS_SE FCLASS_BD \\\n", "0 None None None None None None \n", "1 None None None None None None \n", "2 Unrecognized None None Unrecognized None None \n", "3 None None None None None None \n", "4 None None None None None None \n", "\n", " FCLASS_UA geometry distance_from \\\n", "0 None MULTIPOLYGON (((180 -16.06713, 180 -16.55522, ... Vatican City \n", "1 None POLYGON ((33.90371 -0.95, 34.07262 -1.05982, 3... San Marino \n", "2 None POLYGON ((-8.66559 27.65643, -8.66512 27.58948... Vaduz \n", "3 None MULTIPOLYGON (((-122.84 49, -122.97421 49.0025... Lobamba \n", "4 None MULTIPOLYGON (((-122.84 49, -120 49, -117.0312... Luxembourg \n", "\n", " distance \n", "0 5.329915e+06 \n", "1 5.628613e+06 \n", "2 6.057264e+06 \n", "3 4.626961e+06 \n", "4 6.415631e+06 \n", "\n", "[5 rows x 171 columns]\n", "(GPU)\n", "\n" ] } ], "source": [ "countries = host_dataframe\n", "\n", "cities = geopandas.read_file('https://naturalearth.s3.amazonaws.com/110m_cultural/ne_110m_populated_places_simple.zip')\n", "cities = cities.to_crs(3857)\n", "\n", "gpu_cities = cuspatial.from_geopandas(cities)\n", "gpu_countries = cuspatial.from_geopandas(countries)\n", "dist = cuspatial.pairwise_point_polygon_distance(\n", " gpu_cities.geometry[:len(gpu_countries)], gpu_countries.geometry\n", ")\n", "\n", "gpu_countries[\"distance_from\"] = cities.name\n", "gpu_countries[\"distance\"] = dist\n", "\n", "print(gpu_countries.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e0b4d618", "metadata": {}, "source": [ "### cuspatial.pairwise_linestring_polygon_distance\n", "\n", "Using WGS 84 Pseudo-Mercator, distances are in meters." ] }, { "cell_type": "code", "execution_count": 17, "id": "5863871e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namegeometry
index
0Columbus AvenueLINESTRING (-8234860.077 4980333.535, -8234863...
1West 80th StreetLINESTRING (-8235173.854 4980508.442, -8235160...
2Amsterdam AvenueLINESTRING (-8235173.854 4980508.442, -8235168...
3West 80th StreetLINESTRING (-8235369.475 4980617.398, -8235347...
4BroadwayLINESTRING (-8235369.475 4980617.398, -8235373...
\n", "
" ], "text/plain": [ " name geometry\n", "index \n", "0 Columbus Avenue LINESTRING (-8234860.077 4980333.535, -8234863...\n", "1 West 80th Street LINESTRING (-8235173.854 4980508.442, -8235160...\n", "2 Amsterdam Avenue LINESTRING (-8235173.854 4980508.442, -8235168...\n", "3 West 80th Street LINESTRING (-8235369.475 4980617.398, -8235347...\n", "4 Broadway LINESTRING (-8235369.475 4980617.398, -8235373..." ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# all driveways within 2km range of central park, nyc\n", "\n", "if not os.path.exists(\"./streets_3857.csv\"):\n", " import osmnx as ox\n", " graph = ox.graph_from_point((40.769361, -73.977655), dist=2000, network_type=\"drive\")\n", " nodes, streets = ox.graph_to_gdfs(graph)\n", " streets = streets.to_crs(3857)\n", " streets = streets.reset_index(drop=True)\n", " streets.index.name = \"index\"\n", " streets[[\"name\", \"geometry\"]].to_csv(\"streets_3857.csv\")\n", "\n", "# The data is under notebooks/streets_3857.csv\n", "streets = pd.read_csv(\"./streets_3857.csv\", index_col=\"index\")\n", "streets.geometry = streets.geometry.apply(wkt.loads)\n", "streets = geopandas.GeoDataFrame(streets)\n", "streets.head()" ] }, { "cell_type": "code", "execution_count": 19, "id": "f4c67464", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/raid/jlamb/miniforge/envs/cuspatial-dev/lib/python3.11/site-packages/osmnx/features.py:294: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.\n", " polygon = gdf_place[\"geometry\"].unary_union\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
geometry
index
0POLYGON ((-8236139.639 4975314.625, -8235990.3...
0POLYGON ((-8236139.639 4975314.625, -8235990.3...
0POLYGON ((-8236139.639 4975314.625, -8235990.3...
0POLYGON ((-8236139.639 4975314.625, -8235990.3...
0POLYGON ((-8236139.639 4975314.625, -8235990.3...
\n", "
" ], "text/plain": [ " geometry\n", "index \n", "0 POLYGON ((-8236139.639 4975314.625, -8235990.3...\n", "0 POLYGON ((-8236139.639 4975314.625, -8235990.3...\n", "0 POLYGON ((-8236139.639 4975314.625, -8235990.3...\n", "0 POLYGON ((-8236139.639 4975314.625, -8235990.3...\n", "0 POLYGON ((-8236139.639 4975314.625, -8235990.3..." ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The polygon of the Empire State Building\n", "\n", "if not os.path.exists(\"./esb_3857.csv\"):\n", " import osmnx as ox\n", " esb = ox.features.features_from_place('Empire State Building, New York', tags={\"building\": True})\n", " esb = esb.to_crs(3857)\n", " esb = esb.geometry.reset_index(drop=True)\n", " esb.index.name = \"index\"\n", " esb.to_csv(\"esb_3857.csv\")\n", "\n", "# The data is under notebooks/esb_3857.csv\n", "esb = pd.read_csv(\"./esb_3857.csv\", index_col=\"index\")\n", "esb.geometry = esb.geometry.apply(wkt.loads)\n", "esb = geopandas.GeoDataFrame(esb)\n", "esb = pd.concat([esb.iloc[0:1]] * len(streets))\n", "esb.head()" ] }, { "cell_type": "code", "execution_count": 20, "id": "d4e68e87", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namedist
0Columbus Avenue4993.583717
1West 80th Street5103.472213
2Amsterdam Avenue5208.373183
3West 80th Street5275.851781
4Broadway5178.999774
.........
1862West 82nd Street5411.762092
1863Broadway5476.345847
1864West 84th Street5613.403002
1865West 75th Street4750.092380
1866Broadway4638.894939
\n", "

1867 rows × 2 columns

\n", "
" ], "text/plain": [ " name dist\n", "0 Columbus Avenue 4993.583717\n", "1 West 80th Street 5103.472213\n", "2 Amsterdam Avenue 5208.373183\n", "3 West 80th Street 5275.851781\n", "4 Broadway 5178.999774\n", "... ... ...\n", "1862 West 82nd Street 5411.762092\n", "1863 Broadway 5476.345847\n", "1864 West 84th Street 5613.403002\n", "1865 West 75th Street 4750.092380\n", "1866 Broadway 4638.894939\n", "\n", "[1867 rows x 2 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Straight line distance between the driveways to the Empire State Building\n", "gpu_streets = cuspatial.from_geopandas(streets.geometry)\n", "gpu_esb = cuspatial.from_geopandas(esb.geometry)\n", "\n", "dist = cuspatial.pairwise_linestring_polygon_distance(gpu_streets, gpu_esb).rename(\"dist\")\n", "pd.concat([streets[\"name\"].reset_index(drop=True), dist.to_pandas()], axis=1)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3e1bb692", "metadata": {}, "source": [ "### cuspatial.pairwise_polygon_distance\n", "\n", "Using WGS 84 Pseudo-Mercator, distances are in meters." ] }, { "cell_type": "code", "execution_count": 21, "id": "951625da", "metadata": {}, "outputs": [], "source": [ "african_countries = gpu_countries[gpu_countries.CONTINENT == \"Africa\"].sort_values(\"POP_EST\", ascending=False)\n", "asian_countries = gpu_countries[gpu_countries.CONTINENT == \"Asia\"].sort_values(\"POP_EST\", ascending=False)" ] }, { "cell_type": "code", "execution_count": 22, "id": "2f0e1118", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AfricaAsiadist
0NigeriaChina64.966620
1EthiopiaIndia25.598868
2EgyptIndonesia60.717434
3Dem. Rep. CongoPakistan37.489668
4South AfricaBangladesh72.860545
5TanzaniaJapan97.872886
6KenyaPhilippines75.450451
7UgandaVietnam69.827567
8AlgeriaTurkey17.927419
9SudanIran13.990335
\n", "
" ], "text/plain": [ " Africa Asia dist\n", "0 Nigeria China 64.966620\n", "1 Ethiopia India 25.598868\n", "2 Egypt Indonesia 60.717434\n", "3 Dem. Rep. Congo Pakistan 37.489668\n", "4 South Africa Bangladesh 72.860545\n", "5 Tanzania Japan 97.872886\n", "6 Kenya Philippines 75.450451\n", "7 Uganda Vietnam 69.827567\n", "8 Algeria Turkey 17.927419\n", "9 Sudan Iran 13.990335" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Straight line distance between the top 10 most populated countries in Asia and Africa\n", "population_top10_africa = african_countries[:10].reset_index(drop=True)\n", "population_top10_asia = asian_countries[:10].reset_index(drop=True)\n", "dist = cuspatial.pairwise_polygon_distance(\n", " population_top10_africa.geometry, population_top10_asia.geometry)\n", "\n", "cudf.concat([\n", " population_top10_africa[\"NAME\"].rename(\"Africa\"),\n", " population_top10_asia[\"NAME\"].rename(\"Asia\"), \n", " dist.rename(\"dist\")], axis=1\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f5f27dc3-46ee-4a62-82de-20f76744382f", "metadata": {}, "source": [ "## Filtering\n", "\n", "The filtering module contains `points_in_spatial_window`, which returns from a set of points only those points that fall within a spatial window defined by four bounding coordinates: `min_x`, `max_x`, `min_y`, and `max_y`. The following example finds only the points of polygons that fall within 1 standard deviation of the mean of all of the polygons.\n", "\n", "\n", "\n", "### [cuspatial.points_in_spatial_window](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.points_in_spatial_window)" ] }, { "cell_type": "code", "execution_count": 23, "id": "d1ade9da-c9e2-45c4-9685-dffeda3fd358", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 POINT (33.90371 -0.95)\n", "1 POINT (34.07262 -1.05982)\n", "2 POINT (37.69869 -3.09699)\n", "3 POINT (37.7669 -3.67712)\n", "4 POINT (39.20222 -4.67677)\n", "dtype: geometry\n" ] } ], "source": [ "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n", "geometry = gpu_dataframe['geometry']\n", "points = cuspatial.GeoSeries.from_points_xy(geometry.polygons.xy)\n", "mean_x, std_x = (geometry.polygons.x.mean(), geometry.polygons.x.std())\n", "mean_y, std_y = (geometry.polygons.y.mean(), geometry.polygons.y.std())\n", "avg_points = cuspatial.points_in_spatial_window(\n", " points,\n", " mean_x - std_x,\n", " mean_x + std_x,\n", " mean_y - std_y,\n", " mean_y + std_y\n", ")\n", "print(avg_points.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "5027a3dd-78bb-4d17-af94-506d0ed689c8", "metadata": {}, "source": [ "With some careful grouping, one can reconstruct the original complete polygons that fall within the range." ] }, { "attachments": {}, "cell_type": "markdown", "id": "3b33ce2b-965f-42a1-a89e-66d7ca80d907", "metadata": {}, "source": [ "## Set Operations" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d73548f3-c9bb-43ff-9788-858f3b7d08e4", "metadata": {}, "source": [ "### Linestring Intersections\n", "\n", "cuSpatial provides a linestring-linestring intersection algorithm to compute the overlapping geometries between two linestrings.\n", "The API also returns the ids for each returned geometry to help user to trace back the source geometry." ] }, { "cell_type": "code", "execution_count": 24, "id": "cc72a44d-a9bf-4432-9898-de899ac45869", "metadata": { "tags": [] }, "outputs": [], "source": [ "from cuspatial.core.binops.intersection import pairwise_linestring_intersection\n", "\n", "usa_boundary = cuspatial.from_geopandas(host_dataframe[host_dataframe.NAME == \"United States of America\"].geometry.boundary)\n", "canada_boundary = cuspatial.from_geopandas(host_dataframe[host_dataframe.NAME == \"Canada\"].geometry.boundary)\n", "\n", "list_offsets, geometries, look_back_ids = pairwise_linestring_intersection(usa_boundary, canada_boundary)" ] }, { "cell_type": "code", "execution_count": 25, "id": "1125fd17-afe1-4b9c-b48d-8842dd3700b3", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "\n", "[\n", " 0,\n", " 142\n", "]\n", "dtype: int32" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The first integer series shows that the result contains 1 row (since we only have 1 pair of linestrings as input).\n", "# This row contains 144 geometires.\n", "list_offsets" ] }, { "cell_type": "code", "execution_count": 31, "id": "b281e3bb-42d4-4d60-9cb2-b7dcc20b4776", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 POINT (-130.53611 54.80275)\n", "1 POINT (-130.53611 54.80278)\n", "2 POINT (-130.53611 54.80275)\n", "3 POINT (-129.98 55.285)\n", "4 POINT (-130.53611 54.80278)\n", " ... \n", "137 LINESTRING (-120 49, -117.03121 49)\n", "138 LINESTRING (-122.84 49, -120 49)\n", "139 LINESTRING (-117.03121 49, -107.05 49)\n", "140 LINESTRING (-83.89077 46.11693, -83.61613 46.1...\n", "141 LINESTRING (-82.69009 41.67511, -82.43928 41.6...\n", "Length: 142, dtype: geometry" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The second element is a geoseries that contains the intersecting geometries, with 144 rows, including points and linestrings.\n", "geometries" ] }, { "cell_type": "code", "execution_count": 26, "id": "e19873b9-2614-4242-ad67-caa47f807d04", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
lhs_linestring_idlhs_segment_idrhs_linestring_idrhs_segment_id
0[8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, ...[18, 16, 18, 15, 17, 137, 14, 16, 13, 15, 14, ...[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...[9, 10, 10, 11, 11, 28, 12, 12, 13, 13, 14, 15...
\n", "
" ], "text/plain": [ " lhs_linestring_id \\\n", "0 [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, ... \n", "\n", " lhs_segment_id \\\n", "0 [18, 16, 18, 15, 17, 137, 14, 16, 13, 15, 14, ... \n", "\n", " rhs_linestring_id \\\n", "0 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... \n", "\n", " rhs_segment_id \n", "0 [9, 10, 10, 11, 11, 28, 12, 12, 13, 13, 14, 15... " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The third element is a dataframe that contains IDs to the input segments and linestrings, 4 for each result row.\n", "# Each represents ids to lhs, rhs linestring and segment ids.\n", "look_back_ids" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a17bd64a", "metadata": {}, "source": [ "## Spatial Joins\n", "\n", "cuSpatial provides a number of functions to facilitate high-performance spatial joins, \n", "including unindexed and quadtree-indexed point-in-polygon and quadtree-indexed point to nearest\n", "linestring.\n", "\n", "The API for spatial joins does not yet match GeoPandas, but with knowledge of cuSpatial data formats\n", "you can call `cuspatial.point_in_polygon` for large numbers of points on 32 polygons or less, or\n", "call `cuspatial.quadtree_point_in_polygon` for large numbers of points and polygons. \n", "\n", "### Unindexed Point-in-polygon Join\n", "\n", "### [cuspatial.point_in_polygon](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.point_in_polygon)" ] }, { "cell_type": "code", "execution_count": 27, "id": "bf7b2256", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "1 11896\n", "2 1268\n", "5 50835\n", "6 7792\n", "11 29318\n", "dtype: int64" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "single_polygons = host_dataframe[host_dataframe['geometry'].type == \"Polygon\"]\n", "gpu_dataframe = cuspatial.from_geopandas(single_polygons)\n", "x_points = (cupy.random.random(10000000) - 0.5) * 360\n", "y_points = (cupy.random.random(10000000) - 0.5) * 180\n", "xy = cudf.DataFrame({\"x\": x_points, \"y\": y_points}).interleave_columns()\n", "points = cuspatial.GeoSeries.from_points_xy(xy)\n", "\n", "short_dataframe = gpu_dataframe.iloc[0:31]\n", "geometry = short_dataframe['geometry']\n", "\n", "points_in_polygon = cuspatial.point_in_polygon(\n", " points, geometry\n", ")\n", "sum_of_points_in_polygons_0_to_31 = points_in_polygon.sum()\n", "sum_of_points_in_polygons_0_to_31.head()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "fd9c4eef", "metadata": {}, "source": [ "cuSpatial includes another join algorithm, `quadtree_point_in_polygon` that uses an indexing \n", "quadtree for faster calculations. `quadtree_point_in_polygon` also supports a number of \n", "polygons limited only by memory constraints." ] }, { "attachments": {}, "cell_type": "markdown", "id": "387565b3-75ae-4789-a950-daffc2d4da01", "metadata": {}, "source": [ "### Quadtree Indexing\n", "\n", "The indexing module is used to create a spatial quadtree. Use \n", "```\n", "cuspatial.quadtree_on_points(\n", " points,\n", " x_min,\n", " x_max,\n", " y_min,\n", " y_max,\n", " scale,\n", " max_depth,\n", " max_size\n", ")\n", "```\n", "to create the quadtree object that is used by the `quadtree_point_in_polygon` \n", "function in the `join` module.\n", "\n", "The function uses a set of points and a user-defined bounding box to build an \n", "indexing quad tree. Be sure to adjust the parameters appropriately, with larger \n", "parameter values for larger datasets.\n", "\n", "`scale`: A scaling function that increases the size of the point space from an \n", "origin defined by `{x_min, y_min}`. This can increase the likelihood of generating \n", "well-separated quads.\n", "\n", "`max_depth`: In order for a quadtree to index points effectively, it must have a\n", "depth that is log-scaled with the size of the number of points. Each level of the \n", "quad tree contains 4 quads. The number of available quads $q$ for indexing is then \n", "equal to $q = 4^{d}$ where $d$ is the `max_depth` parameter. With an input size \n", "of `10m` points and `max_depth = 7`, $\\frac{10^6}{4^7}$ points will be most \n", "efficiently packed into the leaves of the quad tree.\n", "\n", "`max_size`: The maximum number of points allowed in an internal node before it is\n", "split into four leaf notes. As the quadtree is generated, a leaf node containing\n", "usable index points will be created as points are added. If the number of points\n", "in this leaf exceeds `max_size`, the leaf will be subdivided, with four new\n", "leaves added and the original node removed from the set of leaves. This number is\n", "probably optimized in most datasets by making it a significant fraction of the\n", "optimal leaf size computation from above. Consider $10,000,000 / 4^7 / 4 = 153$.\n", "\n", "### [cuspatial.quadtree_on_points](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.quadtree_on_points)" ] }, { "cell_type": "code", "execution_count": 28, "id": "e3a0a9a3-0bdd-4f05-bcb5-7db4b99a44a3", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 1507\n", "1 1726\n", "2 4242\n", "3 7371\n", "4 11341\n", "dtype: uint32\n", " key level is_internal_node length offset\n", "0 0 0 True 4 2\n", "1 1 0 True 2 6\n", "2 0 1 True 4 8\n", "3 1 1 True 4 12\n", "4 2 1 True 2 16\n" ] } ], "source": [ "x_points = (cupy.random.random(10000000) - 0.5) * 360\n", "y_points = (cupy.random.random(10000000) - 0.5) * 180\n", "xy = cudf.DataFrame({\"x\": x_points, \"y\": y_points}).interleave_columns()\n", "points = cuspatial.GeoSeries.from_points_xy(xy)\n", "\n", "scale = 5\n", "max_depth = 7\n", "max_size = 125\n", "point_indices, quadtree = cuspatial.quadtree_on_points(points,\n", " x_points.min(),\n", " x_points.max(),\n", " y_points.min(),\n", " y_points.max(),\n", " scale,\n", " max_depth,\n", " max_size)\n", "print(point_indices.head())\n", "print(quadtree.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "0ab7e64d-1199-498c-b020-b3e7393337a5", "metadata": {}, "source": [ "### Indexed Spatial Joins\n", "\n", "The quadtree spatial index (`point_indices` and `quadtree`) is used by `quadtree_point_in_polygon`\n", "and `quadtree_point_to_nearest_linestring` to accelerate larger spatial joins. \n", "`quadtree_point_in_polygon` depends on a number of intermediate products calculated here using the\n", "following functions.\n", "\n", "### [cuspatial.join_quadtree_and_bounding_boxes](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.join_quadtree_and_bounding_boxes)\n", "### [cuspatial.quadtree_point_in_polygon](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.quadtree_point_in_polygon)" ] }, { "cell_type": "code", "execution_count": 29, "id": "023bd25a-35be-435d-ab0b-ecbd7a47e147", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Empty DataFrame\n", "Columns: [polygon_index, point_index]\n", "Index: []\n" ] } ], "source": [ "polygons = gpu_dataframe['geometry']\n", "\n", "poly_bboxes = cuspatial.polygon_bounding_boxes(\n", " polygons\n", ")\n", "intersections = cuspatial.join_quadtree_and_bounding_boxes(\n", " quadtree,\n", " poly_bboxes,\n", " polygons.polygons.x.min(),\n", " polygons.polygons.x.max(),\n", " polygons.polygons.y.min(),\n", " polygons.polygons.y.max(),\n", " scale,\n", " max_depth\n", ")\n", "polygons_and_points = cuspatial.quadtree_point_in_polygon(\n", " intersections,\n", " quadtree,\n", " point_indices,\n", " points,\n", " polygons\n", ")\n", "print(polygons_and_points.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "aa1552a1-fe4b-4d30-b76f-054a060593ae", "metadata": {}, "source": [ "You can see above that polygon 270 maps to the first 5 points. In order to bring this back to \n", "a specific row of the original dataframe, the individual polygons must be mapped back to their \n", "original MultiPolygon row. This is left an an exercise.\n", "\n", "### [cuspatial.quadtree_point_to_nearest_linestring](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.quadtree_point_to_nearest_linestring)\n", "\n", "`cuspatial.quadtree_point_to_nearest_linestring` can be used to find the Polygon or Linestring \n", "nearest to a set of points from another set of mixed geometries. " ] }, { "cell_type": "code", "execution_count": 30, "id": "784aff8e-c9ed-4a81-aa87-bf301b3b90af", "metadata": { "tags": [] }, "outputs": [], "source": [ "gpu_countries = cuspatial.from_geopandas(countries[countries['geometry'].type == \"Polygon\"])\n", "gpu_cities = cuspatial.from_geopandas(cities[cities['geometry'].type == 'Point'])" ] }, { "cell_type": "code", "execution_count": 31, "id": "fea24c78-cf5c-45c6-b860-338238e61323", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Empty DataFrame\n", "Columns: [point_index, linestring_index, distance]\n", "Index: []\n" ] } ], "source": [ "polygons = gpu_countries['geometry'].polygons\n", "\n", "boundaries = cuspatial.GeoSeries.from_linestrings_xy(\n", " cudf.DataFrame({\"x\": polygons.x, \"y\": polygons.y}).interleave_columns(),\n", " polygons.ring_offset,\n", " cupy.arange(len(polygons.ring_offset))\n", ")\n", "\n", "point_indices, quadtree = cuspatial.quadtree_on_points(gpu_cities['geometry'],\n", " polygons.x.min(),\n", " polygons.x.max(),\n", " polygons.y.min(),\n", " polygons.y.max(),\n", " scale,\n", " max_depth,\n", " max_size)\n", "poly_bboxes = cuspatial.linestring_bounding_boxes(\n", " boundaries,\n", " 2.0\n", ")\n", "intersections = cuspatial.join_quadtree_and_bounding_boxes(\n", " quadtree,\n", " poly_bboxes,\n", " polygons.x.min(),\n", " polygons.x.max(),\n", " polygons.y.min(),\n", " polygons.y.max(),\n", " scale,\n", " max_depth\n", ")\n", "result = cuspatial.quadtree_point_to_nearest_linestring(\n", " intersections,\n", " quadtree,\n", " point_indices,\n", " gpu_cities['geometry'],\n", " boundaries\n", ")\n", "print(result.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3e4e07f6", "metadata": {}, "source": [ "_Images used with permission from Wikipedia Creative Commons_" ] }, { "cell_type": "code", "execution_count": null, "id": "c3c504d6-a908-455d-9d35-58521f70c5e2", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" }, "vscode": { "interpreter": { "hash": "ef2a625a21f49284d4111fd61c77079c8ec37c2ac9f170a08eb051e93ed3e888" } } }, "nbformat": 4, "nbformat_minor": 5 }