Spatial#

Functions that operate on spatial data.

Spatial Indexing Functions#

cuspatial.quadtree_on_points(points: GeoSeries, x_min, x_max, y_min, y_max, scale, max_depth, max_size)#

Construct a quadtree from a set of points for a given area-of-interest bounding box.

Parameters
points

Series of points.

x_min

The lower-left x-coordinate of the area of interest bounding box.

x_max

The upper-right x-coordinate of the area of interest bounding box.

y_min

The lower-left y-coordinate of the area of interest bounding box.

y_max

The upper-right y-coordinate of the area of interest bounding box.

scale

Scale to apply to each point’s distance from (x_min, y_min).

max_depth

Maximum quadtree depth.

max_size

Maximum number of points allowed in a node before it’s split into 4 leaf nodes.

Returns
resulttuple (cudf.Series, cudf.DataFrame)
keys_to_pointscudf.Series(dtype=np.int32)

A column of sorted keys to original point indices

quadtreecudf.DataFrame

A complete quadtree for the set of input points

keycudf.Series(dtype=np.int32)

An int32 column of quadrant keys

levelcudf.Series(dtype=np.int8)

An int8 column of quadtree levels

is_internal_nodecudf.Series(dtype=np.bool_)

A boolean column indicating whether the node is a quad or leaf

lengthcudf.Series(dtype=np.int32)

If this is a non-leaf quadrant (i.e. is_internal_node is True), this column’s value is the number of children in the non-leaf quadrant.

Otherwise this column’s value is the number of points contained in the leaf quadrant.

offsetcudf.Series(dtype=np.int32)

If this is a non-leaf quadrant (i.e. is_internal_node is True), this column’s value is the position of the non-leaf quadrant’s first child.

Otherwise this column’s value is the position of the leaf quadrant’s first point.

Notes

  • Swaps min_x and max_x if min_x > max_x

  • Swaps min_y and max_y if min_y > max_y

  • 2D coordinates are converted into a 1D Morton code by dividing each x/y by the scale: ((x - min_x) / scale and (y - min_y) / scale).

  • max_depth should be less than 16, since Morton codes are represented as uint32_t. The eventual number of levels may be less than max_depth if the number of points is small or max_size is large.

  • All intermediate quadtree nodes will have fewer than max_size number of points. Leaf nodes are permitted (but not guaranteed) to have >= max_size number of points.

Examples

An example of selecting the max_size and scale based on input:

>>> np.random.seed(0)
>>> points = cudf.DataFrame({
        "x": cudf.Series(np.random.normal(size=120)) * 500,
        "y": cudf.Series(np.random.normal(size=120)) * 500,
    })

>>> max_depth = 3
>>> max_size = 50
>>> min_x, min_y, max_x, max_y = (points["x"].min(),
                                  points["y"].min(),
                                  points["x"].max(),
                                  points["y"].max())
>>> scale = max(max_x - min_x, max_y - min_y) // (1 << max_depth)
>>> print(
        "max_size:   " + str(max_size) + "\n"
        "num_points: " + str(len(points)) + "\n"
        "min_x:      " + str(min_x) + "\n"
        "max_x:      " + str(max_x) + "\n"
        "min_y:      " + str(min_y) + "\n"
        "max_y:      " + str(max_y) + "\n"
        "scale:      " + str(scale) + "\n"
    )
max_size:   50
num_points: 120
min_x:      -1577.4949079170394
max_x:      1435.877311993804
min_y:      -1412.7015761122134
max_y:      1492.572387431971
scale:      301.0

>>> key_to_point, quadtree = cuspatial.quadtree_on_points(
        points["x"],
        points["y"],
        min_x,
        max_x,
        min_y,
        max_y,
        scale, max_depth, max_size
    )

>>> print(quadtree)
    key  level  is_internal_node  length  offset
0     0      0             False      15       0
1     1      0             False      27      15
2     2      0             False      12      42
3     3      0              True       4       8
4     4      0             False       5     106
5     6      0             False       6     111
6     9      0             False       2     117
7    12      0             False       1     119
8    12      1             False      22      54
9    13      1             False      18      76
10   14      1             False       9      94
11   15      1             False       3     103

>>> print(key_to_point)
0       63
1       20
2       33
3       66
4       19
    ...
115    113
116      3
117     78
118     98
119     24
Length: 120, dtype: int32

Spatial Join Functions#

cuspatial.point_in_polygon(points: GeoSeries, polygons: GeoSeries)#

Compute from a set of points and a set of polygons which points fall within which polygons. Note that polygons_(x,y) must be specified as closed polygons: the first and last coordinate of each polygon must be the same.

Parameters
pointsGeoSeries

A Series of points to test

polygons: GeoSeries

A Series of polygons to test

Returns
resultcudf.DataFrame

A DataFrame of boolean values indicating whether each point falls within each polygon.

Examples

Test whether 3 points fall within either of two polygons

>>> result = cuspatial.point_in_polygon(
    GeoSeries([Point(0, 0), Point(-8, -8), Point(6, 6)]),
    GeoSeries([
        Polygon([(-10, -10), (5, -10), (5, 5), (-10, 5), (-10, -10)]),
        Polygon([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)])
    ], index=['nyc', 'hudson river'])
)
# The result of point_in_polygon is a DataFrame of Boolean
# values indicating whether each point (rows) falls within
# each polygon (columns).
>>> print(result)
            nyc            hudson river
0          True          True
1          True         False
2         False          True
# Point 0: (0, 0) falls in both polygons
# Point 1: (-8, -8) falls in the first polygon
# Point 2: (6.0, 6.0) falls in the second polygon
cuspatial.quadtree_point_in_polygon(poly_quad_pairs, quadtree, point_indices, points: GeoSeries, polygons: GeoSeries)#

Test whether the specified points are inside any of the specified polygons.

Uses the table of (polygon, quadrant) pairs returned by cuspatial.join_quadtree_and_bounding_boxes to ensure only the points in the same quadrant as each polygon are tested for intersection.

This pre-filtering can dramatically reduce number of points tested per polygon, enabling faster intersection-testing at the expense of extra memory allocated to store the quadtree and sorted point_indices.

Parameters
poly_quad_pairs: cudf.DataFrame

Table of (polygon, quadrant) index pairs returned by cuspatial.join_quadtree_and_bounding_boxes.

quadtreecudf.DataFrame

A complete quadtree for a given area-of-interest bounding box.

point_indicescudf.Series

Sorted point indices returned by cuspatial.quadtree_on_points

points: GeoSeries

Points used to build the quadtree

polygons: GeoSeries

Polygons to test against

Returns
resultcudf.DataFrame

Indices for each intersecting point and polygon pair.

polygon_indexcudf.Series

Index of containing polygon.

point_indexcudf.Series

Index of contained point. This index refers to point_indices, so it is an index to an index.

cuspatial.quadtree_point_to_nearest_linestring(linestring_quad_pairs, quadtree, point_indices, points: GeoSeries, linestrings: GeoSeries)#

Finds the nearest linestring to each point in a quadrant, and computes the distances between each point and linestring.

Uses the table of (linestring, quadrant) pairs returned by cuspatial.join_quadtree_and_bounding_boxes to ensure distances are computed only for the points in the same quadrant as each linestring.

Parameters
linestring_quad_pairs: cudf.DataFrame

Table of (linestring, quadrant) index pairs returned by cuspatial.join_quadtree_and_bounding_boxes.

quadtreecudf.DataFrame

A complete quadtree for a given area-of-interest bounding box.

point_indicescudf.Series

Sorted point indices returned by cuspatial.quadtree_on_points

points: GeoSeries

Points to find nearest linestring for

linestrings: GeoSeries

Linestrings to test for

Returns
resultcudf.DataFrame

Indices for each point and its nearest linestring, and the distance between the two.

point_indexcudf.Series

Index of point. This index refers to point_indices, so it is an index to an index.

linestring_indexcudf.Series

Index of the nearest linestring to the point.

distancecudf.Series

Distance between point and its nearest linestring.

cuspatial.join_quadtree_and_bounding_boxes(quadtree, bounding_boxes, x_min, x_max, y_min, y_max, scale, max_depth)#

Search a quadtree for polygon or linestring bounding box intersections.

Parameters
quadtreecudf.DataFrame

A complete quadtree for a given area-of-interest bounding box.

bounding_boxescudf.DataFrame

Minimum bounding boxes for a set of polygons or linestrings

x_min

The lower-left x-coordinate of the area of interest bounding box

x_max

The upper-right x-coordinate of the area of interest bounding box

min_y

The lower-left y-coordinate of the area of interest bounding box

max_y

The upper-right y-coordinate of the area of interest bounding box

scale

Scale to apply to each point’s distance from (x_min, y_min)

max_depth

Maximum quadtree depth at which to stop testing for intersections

Returns
resultcudf.DataFrame

Indices for each intersecting bounding box and leaf quadrant.

bbox_offsetcudf.Series

Indices for each bbox that intersects with the quadtree.

quad_offsetcudf.Series

Indices for each leaf quadrant intersecting with a poly bbox.

Notes

  • Swaps min_x and max_x if min_x > max_x

  • Swaps min_y and max_y if min_y > max_y

Measurement Functions#

cuspatial.directed_hausdorff_distance(multipoints: GeoSeries)#

Compute the directed Hausdorff distances between all pairs of spaces.

Parameters
multipoints: GeoSeries

A column of multipoint, where each multipoint indicates an input space to compute its hausdorff distance to the rest of input spaces.

Returns
resultcudf.DataFrame

result[i, j] indicates the hausdorff distance between multipoints[i] and multipoint[j].

Examples

The directed Hausdorff distance from one space to another is the greatest of all the distances between any point in the first space to the closest point in the second.

Wikipedia

Consider a pair of lines on a grid:

     :
     x
-----xyy---
     :
     :

x0 = (0, 0), x1 = (0, 1)

y0 = (1, 0), y1 = (2, 0)

x0 is the closest point in x to y. The distance from x0 to the farthest point in y is 2.

y0 is the closest point in y to x. The distance from y0 to the farthest point in x is 1.414.

Compute the directed hausdorff distances between a set of spaces

>>> pts = cuspatial.GeoSeries([
...     MultiPoint([(0, 0), (1, 0)]),
...     MultiPoint([(0, 1), (0, 2)])
... ])
>>> cuspatial.directed_hausdorff_distance(pts)
    0         1
0  0.0  1.414214
1  2.0  0.000000
cuspatial.haversine_distance(p1: GeoSeries, p2: GeoSeries)#

Compute the haversine distances in kilometers between an arbitrary list of lon/lat pairs

Parameters
p1: GeoSeries

Series of points as floats

p2: GeoSeries

Series of points as floats

Returns
resultcudf.Series

The distance between pairs of points between p1 and p2

>>> import cudf
    ..
>>> import cuspatial
    ..
>>> a = {"latitude":[0.0,0.0,1.0,1.0],
    ..
… “longitude”: [0.0,1.0,0.0,1.0]}
>>> df = cudf.DataFrame(data=a)
    ..
>>> # Create cuSpatial GeoSeries from cuDF Dataframe
    ..
>>> gs = cuspatial.GeoSeries.from_points_xy(
    ..
… df[[‘longitude’, ‘latitude’]].interleave_columns()
… )
>>> # Create Comparator cuSpatial GeoSeries from a comparator point
    ..
>>> df['compare_lat'] = 2.0 # this will broadcast the value to all rows
    ..
>>> df['compare_lng'] = 2.0
    ..
>>> cmp_gs = cuspatial.GeoSeries.from_points_xy(
    ..
… df[[‘compare_lat’, ‘compare_lng’]].interleave_columns()
… )
>>> # Calculate Haversine Distance of cuDF dataframe to comparator point
    ..
>>> df['compare_dist'] = cuspatial.haversine_distance(gs, cmp_gs)
    ..
>>> df.head()
    latitude  longitude  compare_lat  compare_lng  compare_dist
0 0.0 0.0 2.0 2.0 314.474805
1 0.0 1.0 2.0 2.0 248.629315
2 1.0 0.0 2.0 2.0 248.568719
3 1.0 1.0 2.0 2.0 157.225432
cuspatial.pairwise_point_distance(points1: GeoSeries, points2: GeoSeries)#

Compute distance between (multi)points-(multi)points pairs

Currently points1 and points2 must contain either only points or multipoints. Mixing points and multipoints in the same series is unsupported.

Parameters
points1GeoSeries

A GeoSeries of (multi)points

points2GeoSeries

A GeoSeries of (multi)points

Returns
distancecudf.Series

the distance between each pair of (multi)points

Examples

>>> from shapely.geometry import Point, MultiPoint
>>> p1 = cuspatial.GeoSeries([
...     MultiPoint([(0.0, 0.0), (1.0, 0.0)]),
...     MultiPoint([(0.0, 1.0), (1.0, 0.0)])
... ])
>>> p2 = cuspatial.GeoSeries([
...     Point(2.0, 2.0), Point(0.0, 0.5)
... ])
>>> cuspatial.pairwise_point_distance(p1, p2)
0    2.236068
1    0.500000
dtype: float64
cuspatial.pairwise_linestring_distance(multilinestrings1: GeoSeries, multilinestrings2: GeoSeries)#

Compute distance between (multi)linestring-(multi)linestring pairs

The shortest distance between two linestrings is defined as the shortest distance between all pairs of segments of the two linestrings. If any of the segments intersect, the distance is 0.

Parameters
multilinestrings1GeoSeries

A GeoSeries of (multi)linestrings

multilinestrings2GeoSeries

A GeoSeries of (multi)linestrings

Returns
distancecudf.Series

the distance between each pair of linestrings

Examples

>>> from shapely.geometry import LineString, MultiLineString
>>> ls1 = cuspatial.GeoSeries([
...     LineString([(0, 0), (1, 1)]),
...     LineString([(1, 0), (2, 1)])
... ])
>>> ls2 = cuspatial.GeoSeries([
...     MultiLineString([
...         LineString([(-1, 0), (-2, -1)]),
...         LineString([(-2, -1), (-3, -2)])
...     ]),
...     MultiLineString([
...         LineString([(0, -1), (0, -2), (0, -3)]),
...         LineString([(0, -3), (-1, -3), (-2, -3)])
...     ])
... ])
>>> cuspatial.pairwise_linestring_distance(ls1, ls2)
0    1.000000
1    1.414214
dtype: float64
cuspatial.pairwise_point_linestring_distance(points: GeoSeries, linestrings: GeoSeries)#

Compute distance between (multi)points-(multi)linestrings pairs

The distance between a (multi)point and a (multi)linestring is defined as the shortest distance between every point in the multipoint and every line segment in the multilinestring.

This algorithm computes distance pairwise. The ith row in the result is the distance between the ith (multi)point in points and the ith (multi)linestring in linestrings.

Parameters
pointsGeoSeries

The (multi)points to compute the distance from.

linestringsGeoSeries

The (multi)linestrings to compute the distance from.

Returns
distancecudf.Series

Notes

The input GeoSeries must contain a single type geometry. For example, points series cannot contain both points and polygons. Currently, it is unsupported that points contains both points and multipoints.

Examples

Compute distances between point array to linestring arrays

>>> from shapely.geometry import (
...     Point, MultiPoint, LineString, MultiLineString
... )
>>> import geopandas as gpd, cuspatial
>>> pts = cuspatial.from_geopandas(gpd.GeoSeries([
...     Point(0.0, 0.0), Point(1.0, 1.0), Point(2.0, 2.0)
... ]))
>>> mlines = cuspatial.from_geopandas(gpd.GeoSeries(
...     [
...        LineString([Point(-1.0, 0.0),
...                    Point(-0.5, -0.5),
...                    Point(-1.0, -0.5),
...                    Point(-0.5, -1.0)]),
...        LineString([Point(8.0, 10.0),
...                    Point(11.21, 9.48),
...                    Point(7.1, 12.5)]),
...        LineString([Point(1.0, 0.0), Point(0.0, 1.0)]),
...     ]))
>>> cuspatial.pairwise_point_linestring_distance(pts, mlines)
0     0.707107
1    11.401754
2     2.121320
dtype: float64

Compute distances between multipoint to multilinestring arrays >>> # 3 pairs of multi points containing 3 points each >>> ptsdata = [ … [[9, 7], [0, 6], [7, 2]], … [[5, 8], [5, 7], [6, 0]], … [[8, 8], [6, 7], [4, 1]], … ] >>> # 3 pairs of multi linestrings containing 2 linestrings each >>> linesdata = [ … [ … [[86, 47], [31, 17], [84, 16], [14, 63]], … [[14, 36], [90, 73], [72, 66], [0, 5]], … ], … [ … [[36, 90], [29, 31], [91, 70], [25, 78]], … [[61, 64], [89, 20], [94, 46], [37, 44]], … ], … [ … [[60, 76], [29, 60], [53, 87], [8, 18]], … [[0, 16], [79, 14], [3, 6], [98, 89]], … ], … ] >>> pts = cuspatial.from_geopandas( … gpd.GeoSeries(map(MultiPoint, ptsdata)) … ) >>> lines = cuspatial.from_geopandas( … gpd.GeoSeries(map(MultiLineString, linesdata)) … ) >>> cuspatial.pairwise_point_linestring_distance(pts, lines) 0 0.762984 1 33.241540 2 0.680451 dtype: float64

Nearest Points Function#

cuspatial.pairwise_point_linestring_nearest_points(points: GeoSeries, linestrings: GeoSeries) GeoDataFrame#

Returns the nearest points between two GeoSeries of points and linestrings.

Multipoints and Multilinestrings are also supported. With restriction that the points series must contain either only points or only multipoints.

Parameters
pointsGeoSeries

A GeoSeries of points. Currently either only a series of points or a series of multipoints is supported.

linestringsGeoSeries

A GeoSeries of linestrings or multilinestrings.

Returns
GeoDataFrame

A GeoDataFrame with four columns.

  • “point_geometry_id” contains index of the nearest point in the row. If points consists of single-points, it is always 0.

  • “linestring_geometry_id” contains the index of the linestring in the multilinestring that contains the nearest point.

  • “segment_id” contains the index of the segment in the linestring that contains the nearest point.

  • “geometry” contains the points of the nearest point on the linestring.

Bounding Boxes#

cuspatial.polygon_bounding_boxes(polygons: GeoSeries)#

Compute the minimum bounding-boxes for a set of polygons.

Parameters
polygons: GeoSeries

A series of polygons

Returns
resultcudf.DataFrame

minimum bounding boxes for each polygon

minxcudf.Series

the minimum x-coordinate of each bounding box

minycudf.Series

the minimum y-coordinate of each bounding box

maxxcudf.Series

the maximum x-coordinate of each bounding box

maxycudf.Series

the maximum y-coordinate of each bounding box

Notes

Has no notion of multipolygons. If a multipolygon is passed, the bounding boxes for each polygon will be computed and returned. The user is responsible for handling the multipolygon case.

cuspatial.linestring_bounding_boxes(linestrings: GeoSeries, expansion_radius: float)#

Compute the minimum bounding boxes for a set of linestrings.

Parameters
linestrings: GeoSeries

A series of linestrings

expansion_radius

radius of each linestring point

Returns
resultcudf.DataFrame

minimum bounding boxes for each linestring

minxcudf.Series

the minimum x-coordinate of each bounding box

minycudf.Series

the minimum y-coordinate of each bounding box

maxxcudf.Series

the maximum x-coordinate of each bounding box

maxycudf.Series

the maximum y-coordinate of each bounding box

Projection Functions#

cuspatial.sinusoidal_projection(origin_lon, origin_lat, lonlat: GeoSeries)#

Sinusoidal projection of longitude/latitude relative to origin to Cartesian (x/y) coordinates in km.

Can be used to approximately convert longitude/latitude coordinates to Cartesian coordinates given that all points are near the origin. Error increases with distance from the origin. Results are scaled relative to the size of the Earth in kilometers. See https://en.wikipedia.org/wiki/Sinusoidal_projection for more detail.

Parameters
origin_lonnumber

longitude offset (this is subtracted from each input before converting to x,y)

origin_latnumber

latitude offset (this is subtracted from each input before converting to x,y)

lonlat: GeoSeries

A GeoSeries of Points that contains the longitude and latitude to transform

Returns
resultGeoSeries

A GeoSeries that contains the transformed coordinates.

Spatial Filtering Functions#

cuspatial.points_in_spatial_window(points: GeoSeries, min_x, max_x, min_y, max_y)#

Return only the subset of coordinates that fall within a rectangular window.

A point (x, y) is inside the query window if and only if min_x < x < max_x AND min_y < y < max_y

The window is specified by minimum and maximum x and y coordinates.

Parameters
points: GeoSeries

A geoseries of points

min_x: float

lower x-coordinate of the query window

max_x: float

upper x-coordinate of the query window

min_y: float

lower y-coordinate of the query window

max_y: float

upper y-coordinate of the query window

Returns
resultGeoSeries

subset of points above that fall within the window

Notes

  • Swaps min_x and max_x if min_x > max_x

  • Swaps min_y and max_y if min_y > max_y