cuSpatial API Reference

GIS

gis.directed_hausdorff_distance(ys, points_per_space)

Compute the directed Hausdorff distances between all pairs of spaces.

Parameters
xs

column of x-coordinates

ys

column of y-coordinates

points_per_space

number of points in each space

Returns
resultcudf.DataFrame

The pairwise directed distance matrix with one row and one column per input space; the value at row i, column j represents the hausdorff distance from space i to space j.

Examples

The directed Hausdorff distance from one space to another is the greatest of all the distances between any point in the first space to the closest point in the second.

Wikipedia

Consider a pair of lines on a grid:

     :
     x
-----xyy---
     :
     :

x0 = (0, 0), x1 = (0, 1)

y0 = (1, 0), y1 = (2, 0)

x0 is the closest point in x to y. The distance from x0 to the farthest point in y is 2.

y0 is the closest point in y to x. The distance from y0 to the farthest point in x is 1.414.

Compute the directed hausdorff distances between a set of spaces

>>> result = cuspatial.directed_hausdorff_distance(
        [0, 1, 0, 0], # xs
        [0, 0, 1, 2], # ys
        [2,    2],    # points_per_space
    )
>>> print(result)
         0         1
    0  0.0  1.414214
    1  2.0  0.000000
gis.haversine_distance(p1_lat, p2_lon, p2_lat)

Compute the haversine distances between an arbitrary list of lon/lat pairs

Parameters
p1_lon

longitude of first set of coords

p1_lat

latitude of first set of coords

p2_lon

longitude of second set of coords

p2_lat

latitude of second set of coords

Returns
resultcudf.Series

The distance between all pairs of lon/lat coordinates

gis.lonlat_to_cartesian(origin_lat, input_lon, input_lat)

Convert lon/lat to x,y coordinates with respect to an origin lon/lat point. Results are scaled relative to the size of the Earth in kilometers.

Parameters
origin_lonnumber

longitude offset (this is subtracted from each input before converting to x,y)

origin_latnumber

latitude offset (this is subtracted from each input before converting to x,y)

input_lonSeries or list

longitude coordinates to convert to x

input_latSeries or list

latitude coordinates to convert to y

Returns
resultcudf.DataFrame
xcudf.Series

x-coordinate of the input relative to the size of the Earth in kilometers.

ycudf.Series

y-coordinate of the input relative to the size of the Earth in kilometers.

gis.point_in_polygon(test_points_y, poly_offsets, poly_ring_offsets, poly_points_x, poly_points_y)

Compute from a set of points and a set of polygons which points fall within which polygons. Note that polygons_(x,y) must be specified as closed polygons: the first and last coordinate of each polygon must be the same.

Parameters
test_points_x

x-coordinate of test points

test_points_y

y-coordinate of test points

poly_offsets

beginning index of the first ring in each polygon

poly_ring_offsets

beginning index of the first point in each ring

poly_points_x

x closed-coordinate of polygon points

poly_points_y

y closed-coordinate of polygon points

Returns
resultcudf.DataFrame

A DataFrame of boolean values indicating whether each point falls within each polygon.

Examples

Test whether 3 points fall within either of two polygons

>>> result = cuspatial.point_in_polygon(
    [0, -8, 6.0],                             # test_points_x
    [0, -8, 6.0],                             # test_points_y
    cudf.Series([0, 1], index=['nyc', 'hudson river']), # poly_offsets
    [0, 3],                                   # ring_offsets
    [-10, 5, 5, -10, 0, 10, 10, 0],           # poly_points_x
    [-10, -10, 5, 5, 0, 0, 10, 10],           # poly_points_y
)
# The result of point_in_polygon is a DataFrame of Boolean
# values indicating whether each point (rows) falls within
# each polygon (columns).
>>> print(result)
            nyc            hudson river
0          True          True
1          True         False
2         False          True
# Point 0: (0, 0) falls in both polygons
# Point 1: (-8, -8) falls in the first polygon
# Point 2: (6.0, 6.0) falls in the second polygon

note input Series x and y will not be index aligned, but computed as sequential arrays.

gis.polygon_bounding_boxes(ring_offsets, xs, ys)

Compute the minimum bounding-boxes for a set of polygons.

Parameters
poly_offsets

Begin indices of the first ring in each polygon (i.e. prefix-sum)

ring_offsets

Begin indices of the first point in each ring (i.e. prefix-sum)

xs

Polygon point x-coordinates

ys

Polygon point y-coordinates

Returns
resultcudf.DataFrame

minimum bounding boxes for each polygon

x_mincudf.Series

the minimum x-coordinate of each bounding box

y_mincudf.Series

the minimum y-coordinate of each bounding box

x_maxcudf.Series

the maximum x-coordinate of each bounding box

y_maxcudf.Series

the maximum y-coordinate of each bounding box

gis.polyline_bounding_boxes(xs, ys, expansion_radius)

Compute the minimum bounding-boxes for a set of polylines.

Parameters
poly_offsets

Begin indices of the first ring in each polyline (i.e. prefix-sum)

xs

Polyline point x-coordinates

ys

Polyline point y-coordinates

expansion_radius

radius of each polyline point

Returns
resultcudf.DataFrame

minimum bounding boxes for each polyline

x_mincudf.Series

the minimum x-coordinate of each bounding box

y_mincudf.Series

the minimum y-coordinate of each bounding box

x_maxcudf.Series

the maximum x-coordinate of each bounding box

y_maxcudf.Series

the maximum y-coordinate of each bounding box

Indexing

indexing.quadtree_on_points(ys, x_min, x_max, y_min, y_max, scale, max_depth, min_size)
Construct a quadtree from a set of points for a given area-of-interest

bounding box.

Parameters
xs

Column of x-coordinates for each point

ys

Column of y-coordinates for each point

x_min

The lower-left x-coordinate of the area of interest bounding box

x_max

The upper-right x-coordinate of the area of interest bounding box

y_min

The lower-left y-coordinate of the area of interest bounding box

y_max

The upper-right y-coordinate of the area of interest bounding box

scale

Scale to apply to each point’s distance from (x_min, y_min)

max_depth

Maximum quadtree depth

min_size

Minimum number of points for a non-leaf quadtree node

Returns
resulttuple (cudf.Series, cudf.DataFrame)
keys_to_pointscudf.Series(dtype=np.int32)

A column of sorted keys to original point indices

quadtreecudf.DataFrame

A complete quadtree for the set of input points

keycudf.Series(dtype=np.int32)

An int32 column of quadrant keys

levelcudf.Series(dtype=np.int8)

An int8 column of quadtree levels

is_quadcudf.Series(dtype=np.bool_)

A boolean column indicating whether the node is a quad or leaf

lengthcudf.Series(dtype=np.int32)

If this is a non-leaf quadrant (i.e. is_quad is True), this column’s value is the number of children in the non-leaf quadrant.

Otherwise this column’s value is the number of points contained in the leaf quadrant.

offsetcudf.Series(dtype=np.int32)

If this is a non-leaf quadrant (i.e. is_quad is True), this column’s value is the position of the non-leaf quadrant’s first child.

Otherwise this column’s value is the position of the leaf quadrant’s first point.

Notes

  • Swaps min_x and max_x if min_x > max_x

  • Swaps min_y and max_y if min_y > max_y

Examples

An example of selecting the min_size and scale based on input:

>>> np.random.seed(0)
>>> points = cudf.DataFrame({
        "x": cudf.Series(np.random.normal(size=120)) * 500,
        "y": cudf.Series(np.random.normal(size=120)) * 500,
    })

>>> max_depth = 3
>>> min_size = 50
>>> min_x, min_y, max_x, max_y = (points["x"].min(),
                                  points["y"].min(),
                                  points["x"].max(),
                                  points["y"].max())
>>> scale = max(max_x - min_x, max_y - min_y) // (1 << max_depth)
>>> print(
        "min_size:   " + str(min_size) + "\n"
        "num_points: " + str(len(points)) + "\n"
        "min_x:      " + str(min_x) + "\n"
        "max_x:      " + str(max_x) + "\n"
        "min_y:      " + str(min_y) + "\n"
        "max_y:      " + str(max_y) + "\n"
        "scale:      " + str(scale) + "\n"
    )
min_size:   50
num_points: 120
min_x:      -1577.4949079170394
max_x:      1435.877311993804
min_y:      -1412.7015761122134
max_y:      1492.572387431971
scale:      301.0

>>> key_to_point, quadtree = cuspatial.quadtree_on_points(
        points["x"],
        points["y"],
        min_x,
        max_x,
        min_y,
        max_y,
        scale, max_depth, min_size
    )

>>> print(quadtree)
    key  level  is_quad  length  offset
0     0      0    False      15       0
1     1      0    False      27      15
2     2      0    False      12      42
3     3      0     True       4       8
4     4      0    False       5     106
5     6      0    False       6     111
6     9      0    False       2     117
7    12      0    False       1     119
8    12      1    False      22      54
9    13      1    False      18      76
10   14      1    False       9      94
11   15      1    False       3     103

>>> print(key_to_point)
0       63
1       20
2       33
3       66
4       19
    ...
115    113
116      3
117     78
118     98
119     24
Length: 120, dtype: int32

Interpolation

class cuspatial.core.interpolate.CubicSpline(t, y, ids=None, size=None, prefixes=None)

Fits each column of the input Series y to a hermetic cubic spline.

cuspatial.CubicSpline supports two usage patterns: The first is identical to scipy.interpolate.CubicSpline:

curve = cuspatial.CubicSpline(t, y)
new_points = curve(np.linspace(t.min, t.max, 50))

This allows API parity with scipy. This isn’t recommended, as scipy host based interpolation performance is likely to exceed GPU performance for a single curve.

However, cuSpatial massively outperforms scipy when many splines are fit simultaneously. Data must be arranged in a SoA format, and the exclusive prefix_sum of the separate curves must also be passed to the function.:

t = cudf.Series(np.repeat(cp.arange(100), 1000)).astype('float32')
y = cudf.Series(np.random.random(100*1000)).astype('float32')
prefix_sum = cudf.Series(cp.arange(1000)*100).astype('int32')
new_samples = cudf.Series(
    np.repeat(np.linspace(0, 100, 1000), 1000)
).astype('float32')

curve = cuspatial.CubicSpline(t, y, prefixes=prefix_sum)
new_points = curve(new_samples, prefix_sum*10)

Methods

__call__(coordinates[, groups])

Interpolates new input values coordinates using the .c DataFrame or map of DataFrames.

CubicSpline.__init__(t, y, ids=None, size=None, prefixes=None)

Computes various error preconditions on the input data, then uses CUDA to compute cubic splines for each set of input coordinates on the GPU in parallel.

Parameters
tcudf.Series

time sample values. Must be monotonically increasing.

ycudf.Series

columns to have curves fit to according to x

ids (Optional)cudf.Series

ids of each spline

size (Optional)cudf.Series

fixed size of each spline

prefixes (Optional)cudf.Series

alternative to size, allows splines of varying length. Not yet fully supported.

Returns
CubicSplinecallable o

o.c contains the coefficients that can be used to compute new points along the spline fitting the original t data. o(n) interpolates the spline coordinates along new input values n.

CubicSpline.__call__(coordinates, groups=None)

Interpolates new input values coordinates using the .c DataFrame or map of DataFrames.

Spatial Querying

spatial_window.points_in_spatial_window(max_x, min_y, max_y, xs, ys)

Return only the subset of coordinates that fall within a rectangular window.

A point (x, y) is inside the query window if and only if min_x < x < max_x AND min_y < y < max_y

The window is specified by minimum and maximum x and y coordinates.

Parameters
min_x

lower x-coordinate of the query window

max_x

upper x-coordinate of the query window

min_y

lower y-coordinate of the query window

max_y

upper y-coordinate of the query window

xs

column of x-coordinates that may fall within the window

ys

column of y-coordinates that may fall within the window

Returns
resultcudf.DataFrame

subset of (x, y) pairs above that fall within the window

Notes

  • Swaps min_x and max_x if min_x > max_x

  • Swaps min_y and max_y if min_y > max_y

Trajectory

trajectory.derive_trajectories(xs, ys, timestamps)

Derive trajectories from object ids, points, and timestamps.

Parameters
object_ids

column of object (e.g., vehicle) ids

xs

column of x-coordinates (in kilometers)

ys

column of y-coordinates (in kilometers)

timestamps

column of timestamps in any resolution

Returns
resulttuple (objects, traj_offsets)
objectscudf.DataFrame

object_ids, xs, ys, and timestamps sorted by (object_id, timestamp), used by trajectory_bounding_boxes and trajectory_distances_and_speeds

traj_offsetscudf.Series

offsets of discovered trajectories

Examples

Compute sorted objects and discovered trajectories

>>> objects, traj_offsets = cuspatial.derive_trajectories(
        [0, 1, 2, 3],  # object_id
        [0, 0, 1, 1],  # x
        [0, 0, 1, 1],  # y
        [0, 10, 0, 10] # timestamp
    )
>>> print(traj_offsets)
    0  0
    1  2
>>> print(objects)
       object_id       x       y  timestamp
    0          0       1       0          0
    1          0       0       0         10
    2          1       3       1          0
    3          1       2       1         10
trajectory.trajectory_bounding_boxes(object_ids, xs, ys)

Compute the bounding boxes of sets of trajectories.

Parameters
num_trajectories

number of trajectories (unique object ids)

object_ids

column of object (e.g., vehicle) ids

xs

column of x-coordinates (in kilometers)

ys

column of y-coordinates (in kilometers)

Returns
resultcudf.DataFrame

minimum bounding boxes (in kilometers) for each trajectory

x_mincudf.Series

the minimum x-coordinate of each bounding box

y_mincudf.Series

the minimum y-coordinate of each bounding box

x_maxcudf.Series

the maximum x-coordinate of each bounding box

y_maxcudf.Series

the maximum y-coordinate of each bounding box

Examples

Compute the minimum bounding boxes of derived trajectories

>>> objects, traj_offsets = trajectory.derive_trajectories(
        [0, 0, 1, 1],  # object_id
        [0, 1, 2, 3],  # x
        [0, 0, 1, 1],  # y
        [0, 10, 0, 10] # timestamp
    )
>>> traj_bounding_boxes = cuspatial.trajectory_bounding_boxes(
        len(traj_offsets),
        objects['object_id'],
        objects['x'],
        objects['y']
    )
>>> print(traj_bounding_boxes)
    x_min   y_min   x_max   y_max
0     0.0     0.0     2.0     2.0
1     1.0     1.0     3.0     3.0
trajectory.trajectory_distances_and_speeds(object_ids, xs, ys, timestamps)

Compute the distance traveled and speed of sets of trajectories

Parameters
num_trajectories

number of trajectories (unique object ids)

object_ids

column of object (e.g., vehicle) ids

xs

column of x-coordinates (in kilometers)

ys

column of y-coordinates (in kilometers)

timestamps

column of timestamps in any resolution

Returns
resultcudf.DataFrame
meterscudf.Series

trajectory distance (in kilometers)

speedcudf.Series

trajectory speed (in meters/second)

Examples

Compute the distances and speeds of derived trajectories

>>> objects, traj_offsets = cuspatial.derive_trajectories(...)
>>> dists_and_speeds = cuspatial.trajectory_distances_and_speeds(
        len(traj_offsets)
        objects['object_id'],
        objects['x'],
        objects['y'],
        objects['timestamp']
    )
>>> print(dists_and_speeds)
                   distance          speed
    trajectory_id
    0                1000.0  100000.000000
    1                1000.0  111111.109375

IO

shapefile.read_polygon_shapefile()

Reads polygon geometry from an ESRI shapefile into GPU memory.

Parameters
filenamestr, pathlike

ESRI Shapefile file path (usually ends in .shp)

Returns
resulttuple (cudf.Series, cudf.Series, cudf.DataFrame)
poly_offsetscudf.Series(dtype=np.int32)

Offsets of the first ring in each polygon

ring_offsetscudf.Series(dtype=np.int32)

Offsets of the first point in each ring

pointscudf.DataFrame
DataFrame of all points in the shapefile
xcudf.Series(dtype=np.float64)

x-components of each polygon’s points

ycudf.Series(dtype=np.float64)

y-components of each polygon’s points