GeoPandas Compatibility¶
We support any geometry format supported by GeoPandas. Load geometry information from a GeoPandas.GeoSeries or GeoPandas.GeoDataFrame.
>>> gpdf = geopandas.read_file('arbitrary.txt')
cugpdf = cuspatial.from_geopandas(gpdf)
or
>>> cugpdf = cuspatial.GeoDataFrame(gpdf)
- class cuspatial.GeoDataFrame(data: Optional[geopandas.geodataframe.GeoDataFrame] = None)¶
Bases:
cudf.core.dataframe.DataFrame
A GPU GeoDataFrame object.
Methods
groupby
(*args, **kwargs)Group DataFrame using a mapper or by a Series of columns.
to_geopandas
([nullable])Returns a new GeoPandas GeoDataFrame object from the coordinates in the cuspatial GeoDataFrame.
to_pandas
([nullable])Calls self.to_geopandas, converting GeoSeries columns into GeoPandas columns and cudf.Series columns into pandas.Series columns, and returning a pandas.DataFrame.
- groupby(*args, **kwargs)¶
Group DataFrame using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
- Parameters
- bymapping, function, label, or list of labels
Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If a cupy array is passed, the values are used as-is determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
- levelint, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels.
- as_indexbool, default True
For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.
- sortbool, default False
Sort result by group key. Differ from Pandas, cudf defaults to
False
for better performance. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.- dropnabool, optional
If True (default), do not include the “null” group.
- Returns
- DataFrameGroupBy
Returns a groupby object that contains information about the groups.
Examples
>>> import cudf >>> import pandas as pd >>> df = cudf.DataFrame({'Animal': ['Falcon', 'Falcon', ... 'Parrot', 'Parrot'], ... 'Max Speed': [380., 370., 24., 26.]}) >>> df Animal Max Speed 0 Falcon 380.0 1 Falcon 370.0 2 Parrot 24.0 3 Parrot 26.0 >>> df.groupby(['Animal']).mean() Max Speed Animal Falcon 375.0 Parrot 25.0
>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'], ... ['Captive', 'Wild', 'Captive', 'Wild']] >>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type')) >>> df = cudf.DataFrame({'Max Speed': [390., 350., 30., 20.]}, index=index) >>> df Max Speed Animal Type Falcon Captive 390.0 Wild 350.0 Parrot Captive 30.0 Wild 20.0 >>> df.groupby(level=0).mean() Max Speed Animal Falcon 370.0 Parrot 25.0 >>> df.groupby(level="Type").mean() Max Speed Type Wild 185.0 Captive 210.0
- to_geopandas(nullable=False)¶
Returns a new GeoPandas GeoDataFrame object from the coordinates in the cuspatial GeoDataFrame.
- Parameters
- nullable: matches the cudf `to_pandas` signature, not yet supported.
- to_pandas(nullable=False)¶
Calls self.to_geopandas, converting GeoSeries columns into GeoPandas columns and cudf.Series columns into pandas.Series columns, and returning a pandas.DataFrame.
- Parameters
- nullable: matches the cudf `to_pandas` signature, not yet supported.
- class cuspatial.GeoSeries(data: geopandas.geoseries.GeoSeries, index: Optional[Union[cudf.core.index.Index, pandas.core.indexes.base.Index]] = None, dtype=None, name=None, nan_as_null=True)¶
Bases:
cudf.core.series.Series
cuspatial.GeoSeries enables GPU-backed storage and computation of shapely-like objects. Our goal is to give feature parity with GeoPandas. At this time, only from_geopandas and to_geopandas are directly supported. cuspatial GIS, indexing, and trajectory functions depend on the arrays stored in the GeoArrowBuffers object, accessible with the points, multipoints, lines, and polygons accessors.
>>> cuseries.points xy: 0 -1.0 1 0.0 dtype: float64
- Attributes
lines
Access the LineArray of the underlying GeoArrowBuffers.
multipoints
Access the MultiPointArray of the underlying GeoArrowBuffers.
points
Access the PointsArray of the underlying GeoArrowBuffers.
polygons
Access the PolygonArray of the underlying GeoArrowBuffers.
Methods
to_geopandas
([nullable])Returns a new GeoPandas GeoSeries object from the coordinates in the cuspatial GeoSeries.
Treats to_pandas and to_geopandas as the same call, which improves compatibility with pandas.
- property lines¶
Access the LineArray of the underlying GeoArrowBuffers.
- property multipoints¶
Access the MultiPointArray of the underlying GeoArrowBuffers.
- property points¶
Access the PointsArray of the underlying GeoArrowBuffers.
- property polygons¶
Access the PolygonArray of the underlying GeoArrowBuffers.
- to_geopandas(nullable=False)¶
Returns a new GeoPandas GeoSeries object from the coordinates in the cuspatial GeoSeries.
- to_pandas()¶
Treats to_pandas and to_geopandas as the same call, which improves compatibility with pandas.
- class cuspatial.geometry.geocolumn.GeoColumn(data: cuspatial.geometry.geoarrowbuffers.GeoArrowBuffers, meta: Optional[cuspatial.geometry.geocolumn.GeoMeta] = None, shuffle_order: Optional[cudf.core.index.Index] = None)¶
Bases:
cudf.core.column.numerical.NumericalColumn
- Parameters
- dataA GeoArrowBuffers object
- metaA GeoMeta object (optional)
Notes
The GeoColumn class subclasses NumericalColumn. Combined with _copy_type_metadata, this assures support for existing cudf algorithms.
- Attributes
Methods
copy
([deep])Create a copy of all of the GPU-backed data structures in this GeoColumn.
to_host
- copy(deep=True)¶
Create a copy of all of the GPU-backed data structures in this GeoColumn.
- property iloc¶
Return the i-th row of the GeoSeries.
- property loc¶
Not currently supported.
- class cuspatial.GeoArrowBuffers(data: Union[dict, cuspatial.geometry.geoarrowbuffers.T], data_locale: object = <module 'cudf' from '/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/__init__.py'>)¶
Bases:
object
A GPU GeoArrowBuffers object.
- Parameters
- dataA dict or a GeoArrowBuffers object.
- The GeoArrow format specifies a tabular data format for geometry
- information. Supported types include `Point`, `MultiPoint`, `LineString`,
- `MultiLineString`, `Polygon`, and `MultiPolygon`. In order to store
- these coordinate types in a strictly tabular fashion, columns are
- created for Points, MultiPoints, LineStrings, and Polygons.
- MultiLines and MultiPolygons are stored in the same data structure
- as LineStrings and Polygons. GeoArrowBuffers are constructed from a dict
- of host buffers with accepted keys:
- * points_xy
- * points_z
- * multipoints_xy
- * multipoints_z
- * multipoints_offsets
- * lines_xy
- * lines_z
- * lines_offsets
- * mlines
- * polygons_xy
- * polygons_z
- * polygons_polygons
- * polygons_rings
- * mpolygons
- There are no correlations in length between any of the above columns.
- Accepted host buffer object types include python list and any type that
- implements numpy’s `__array__interface__` protocol.
- GeoArrow Format
- GeoArrow format packs complex geometry types into 14 single-column Arrow
- tables. This description is included for better understanding GeoArrow
- format. Interacting with the GeoArrowBuffers is only required if you want
- to convert cudf data to GeoPandas objects without starting from GeoPandas.
- The points geometry is the simplest: N points are stored in a length 2*N
- buffer with interleaved x,y coordinates. An optional z buffer of length N
- can be used.
- The multipoints geometry is the second simplest - identical to points,
- with the addition of a multipoints_offsets buffer. The offsets buffer
- stores N+1 indexes. The first multipoint is specified by 0, which is always
- stored in offsets[0], and offsets[1], which is the length in points of
- the first multipoint geometry. Subsequent multipoints are the prefix-sum of
- the lengths of previous multipoints.
- Consider::
- buffers = GeoArrowBuffers({
- “multipoints_xy”:
[0, 0, 0, 1, 0, 2, 1, 0, 1, 1, 1, 2, 2, 0, 2, 1, 2, 2],
- “multipoints_offsets”:
[0, 6, 12, 18]
})
- which encodes the following GeoPandas Series::
- series = geopandas.Series([
MultiPoint((0, 0), (0, 1), (0, 2)), MultiPoint((1, 0), (1, 1), (1, 2)), MultiPoint((2, 0), (2, 1), (2, 2)),
])
- LineString geometry is more complicated than multipoints because the
- format allows for the use of LineStrings and MultiLineStrings in the same
- buffer, via the mlines key::
- buffers = GeoArrowBuffers({
- “lines_xy”:
- [0, 0, 0, 1, 0, 2, 1, 0, 1, 1, 1, 2, 2, 0, 2, 1, 2, 2, 3, 0,
3, 1, 3, 2, 4, 0, 4, 1, 4, 2],
- “lines_offsets”:
[0, 6, 12, 18, 24, 30],
- “mlines”:
[1, 3]
})
- Which encodes a GeoPandas Series::
- series = geopandas.Series([
LineString((0, 0), (0, 1), (0, 2)), MultiLineString([(1, 0), (1, 1), (1, 2)],
[(2, 0), (2, 1), (2, 2)],
) LineString((3, 0), (3, 1), (3, 2)), LineString((4, 0), (4, 1), (4, 2)),
])
- Polygon geometry includes `mpolygons` for MultiPolygons similar to the
- LineString geometry. Polygons are encoded using the same format as
- Shapefiles, with left-wound external rings and right-wound internal rings.
- An exact example of `GeoArrowBuffers` to `geopandas.Series` is left to the
- reader as an exercise. Convert any GeoPandas `Series` or `DataFrame` with
- `cuspatial.from_geopandas(geopandas_object)`.
Notes
Legacy cuspatial algorithms depend on separated x and y columns. Access them with the .x and .y properties.
Examples
GeoArrowBuffers accept a dict as argument. Valid keys are in the bullet list above. Valid values are any datatype that implements numpy’s __array_interface__. Any or all of the four basic geometry types is supported as argument:
buffers = GeoArrowBuffers({ "points_xy": [0, 0, 0, 1, 0, 2, 1, 0, 1, 1, 1, 2, 2, 0, 2, 1, 2, 2], "multipoints_xy": [0, 0, 0, 1, 0, 2, 1, 0, 1, 1, 1, 2, 2, 0, 2, 1, 2, 2], "multipoints_offsets": [0, 6, 12, 18] "lines_xy": [0, 0, 0, 1, 0, 2, 1, 0, 1, 1, 1, 2, 2, 0, 2, 1, 2, 2, 3, 0, 3, 1, 3, 2, 4, 0, 4, 1, 4, 2], "lines_offsets": [0, 6, 12, 18, 24, 30], "mlines": [1, 3] "polygons_xy": [0, 0, 0, 1, 0, 2, 1, 0, 1, 1, 1, 2, 2, 0, 2, 1, 2, 2, 3, 0, 3, 1, 3, 2, 4, 0, 4, 1, 4, 2], "polygons_polygons": [0, 1, 2], "polygons_rings": [0, 1, 2], "mpolygons": [1, 3], })
or another GeoArrowBuffers:
buffers2 = GeoArrowBuffers(buffers)
- Attributes
lines
Contains the coordinates column, an offsets column, and a mlines column.
multipoints
Similar to the Points column with the addition of an offsets column.
points
A simple numeric column.
polygons
Contains the coordinates column, a rings column specifying the beginning and end of every polygon, a polygons column specifying the beginning, or exterior, ring of each polygon and the end ring.
Methods
copy
([deep])Create a copy of all of the GPU-backed data structures in this GeoArrowBuffers.
to_host
- copy(deep=True)¶
Create a copy of all of the GPU-backed data structures in this GeoArrowBuffers.
- property lines¶
Contains the coordinates column, an offsets column, and a mlines column. The mlines column is optional. The mlines column stores the indices of the offsets that indicate the beginning and end of each MultiLineString segment. The absence of an mlines column indicates there are no MultiLineStrings in the data source, only `LineString`s.
- property multipoints¶
Similar to the Points column with the addition of an offsets column. The offsets column stores the comparable sizes and coordinates of each MultiPoint in the GeoArrowBuffers.
- property points¶
A simple numeric column. x and y coordinates are interleaved such that even coordinates are x axis and odd coordinates are y axis.
- property polygons¶
Contains the coordinates column, a rings column specifying the beginning and end of every polygon, a polygons column specifying the beginning, or exterior, ring of each polygon and the end ring. All rings after the first ring are interior rings. Finally a mpolygons column stores the offsets of the polygons that should be grouped into MultiPolygons.
Spatial Indexing¶
Spatial indexing functions provide blisteringly-fast on-GPU point-in-polygon operations.
- cuspatial.quadtree_point_in_polygon(poly_quad_pairs, quadtree, point_indices, points_x, points_y, poly_offsets, ring_offsets, poly_points_x, poly_points_y)¶
Test whether the specified points are inside any of the specified polygons.
Uses the table of (polygon, quadrant) pairs returned by
cuspatial.join_quadtree_and_bounding_boxes
to ensure only the points in the same quadrant as each polygon are tested for intersection.This pre-filtering can dramatically reduce number of points tested per polygon, enabling faster intersection-testing at the expense of extra memory allocated to store the quadtree and sorted point_indices.
- Parameters
- poly_quad_pairs: cudf.DataFrame
Table of (polygon, quadrant) index pairs returned by
cuspatial.join_quadtree_and_bounding_boxes
.- quadtreecudf.DataFrame
A complete quadtree for a given area-of-interest bounding box.
- point_indicescudf.Series
Sorted point indices returned by
cuspatial.quadtree_on_points
- points_xcudf.Series
x-coordinates of points used to construct the quadtree.
- points_ycudf.Series
y-coordinates of points used to construct the quadtree.
- poly_offsetscudf.Series
Begin index of the first ring in each polygon.
- ring_offsetscudf.Series
Begin index of the first point in each ring.
- poly_points_xcudf.Series
Polygon point x-coodinates.
- poly_points_ycudf.Series
Polygon point y-coodinates.
- Returns
- resultcudf.DataFrame
Indices for each intersecting point and polygon pair.
- polygon_indexcudf.Series
Indices of each polygon with which a point intersected.
- point_indexcudf.Series
Indices of each point that intersects with a polygon.
- cuspatial.quadtree_point_to_nearest_polyline(poly_quad_pairs, quadtree, point_indices, points_x, points_y, poly_offsets, poly_points_x, poly_points_y)¶
Finds the nearest polyline to each point in a quadrant, and computes the distances between each point and polyline.
Uses the table of (polyline, quadrant) pairs returned by
cuspatial.join_quadtree_and_bounding_boxes
to ensure distances are computed only for the points in the same quadrant as each polyline.- Parameters
- poly_quad_pairs: cudf.DataFrame
Table of (polyline, quadrant) index pairs returned by
cuspatial.join_quadtree_and_bounding_boxes
.- quadtreecudf.DataFrame
A complete quadtree for a given area-of-interest bounding box.
- point_indicescudf.Series
Sorted point indices returned by
cuspatial.quadtree_on_points
- points_xcudf.Series
x-coordinates of points used to construct the quadtree.
- points_ycudf.Series
y-coordinates of points used to construct the quadtree.
- poly_offsetscudf.Series
Begin index of the first point in each polyline.
- poly_points_xcudf.Series
Polyline point x-coodinates.
- poly_points_ycudf.Series
Polyline point y-coodinates.
- Returns
- resultcudf.DataFrame
Indices for each point and its nearest polyline, and the distance between the two.
- point_indexcudf.Series
Indices of each point that intersects with a polyline.
- polyline_indexcudf.Series
Indices of each polyline with which a point intersected.
- distancecudf.Series
Distances between each point and its nearest polyline.
- cuspatial.point_in_polygon(test_points_x, test_points_y, poly_offsets, poly_ring_offsets, poly_points_x, poly_points_y)¶
Compute from a set of points and a set of polygons which points fall within which polygons. Note that polygons_(x,y) must be specified as closed polygons: the first and last coordinate of each polygon must be the same.
- Parameters
- test_points_x
x-coordinate of test points
- test_points_y
y-coordinate of test points
- poly_offsets
beginning index of the first ring in each polygon
- poly_ring_offsets
beginning index of the first point in each ring
- poly_points_x
x closed-coordinate of polygon points
- poly_points_y
y closed-coordinate of polygon points
- Returns
- resultcudf.DataFrame
A DataFrame of boolean values indicating whether each point falls within each polygon.
Examples
Test whether 3 points fall within either of two polygons
>>> result = cuspatial.point_in_polygon( [0, -8, 6.0], # test_points_x [0, -8, 6.0], # test_points_y cudf.Series([0, 1], index=['nyc', 'hudson river']), # poly_offsets [0, 3], # ring_offsets [-10, 5, 5, -10, 0, 10, 10, 0], # poly_points_x [-10, -10, 5, 5, 0, 0, 10, 10], # poly_points_y ) # The result of point_in_polygon is a DataFrame of Boolean # values indicating whether each point (rows) falls within # each polygon (columns). >>> print(result) nyc hudson river 0 True True 1 True False 2 False True # Point 0: (0, 0) falls in both polygons # Point 1: (-8, -8) falls in the first polygon # Point 2: (6.0, 6.0) falls in the second polygon
note input Series x and y will not be index aligned, but computed as sequential arrays.
- cuspatial.polygon_bounding_boxes(poly_offsets, ring_offsets, xs, ys)¶
Compute the minimum bounding-boxes for a set of polygons.
- Parameters
- poly_offsets
Begin indices of the first ring in each polygon (i.e. prefix-sum)
- ring_offsets
Begin indices of the first point in each ring (i.e. prefix-sum)
- xs
Polygon point x-coordinates
- ys
Polygon point y-coordinates
- Returns
- resultcudf.DataFrame
minimum bounding boxes for each polygon
- x_mincudf.Series
the minimum x-coordinate of each bounding box
- y_mincudf.Series
the minimum y-coordinate of each bounding box
- x_maxcudf.Series
the maximum x-coordinate of each bounding box
- y_maxcudf.Series
the maximum y-coordinate of each bounding box
- cuspatial.polyline_bounding_boxes(poly_offsets, xs, ys, expansion_radius)¶
Compute the minimum bounding-boxes for a set of polylines.
- Parameters
- poly_offsets
Begin indices of the first ring in each polyline (i.e. prefix-sum)
- xs
Polyline point x-coordinates
- ys
Polyline point y-coordinates
- expansion_radius
radius of each polyline point
- Returns
- resultcudf.DataFrame
minimum bounding boxes for each polyline
- x_mincudf.Series
the minimum x-coordinate of each bounding box
- y_mincudf.Series
the minimum y-coordinate of each bounding box
- x_maxcudf.Series
the maximum x-coordinate of each bounding box
- y_maxcudf.Series
the maximum y-coordinate of each bounding box
- cuspatial.quadtree_on_points(xs, ys, x_min, x_max, y_min, y_max, scale, max_depth, min_size)¶
- Construct a quadtree from a set of points for a given area-of-interest
bounding box.
- Parameters
- xs
Column of x-coordinates for each point
- ys
Column of y-coordinates for each point
- x_min
The lower-left x-coordinate of the area of interest bounding box
- x_max
The upper-right x-coordinate of the area of interest bounding box
- y_min
The lower-left y-coordinate of the area of interest bounding box
- y_max
The upper-right y-coordinate of the area of interest bounding box
- scale
Scale to apply to each point’s distance from
(x_min, y_min)
- max_depth
Maximum quadtree depth
- min_size
Minimum number of points for a non-leaf quadtree node
- Returns
- resulttuple (cudf.Series, cudf.DataFrame)
- keys_to_pointscudf.Series(dtype=np.int32)
A column of sorted keys to original point indices
- quadtreecudf.DataFrame
A complete quadtree for the set of input points
- keycudf.Series(dtype=np.int32)
An int32 column of quadrant keys
- levelcudf.Series(dtype=np.int8)
An int8 column of quadtree levels
- is_quadcudf.Series(dtype=np.bool_)
A boolean column indicating whether the node is a quad or leaf
- lengthcudf.Series(dtype=np.int32)
If this is a non-leaf quadrant (i.e.
is_quad
isTrue
), this column’s value is the number of children in the non-leaf quadrant.Otherwise this column’s value is the number of points contained in the leaf quadrant.
- offsetcudf.Series(dtype=np.int32)
If this is a non-leaf quadrant (i.e.
is_quad
isTrue
), this column’s value is the position of the non-leaf quadrant’s first child.Otherwise this column’s value is the position of the leaf quadrant’s first point.
Notes
Swaps
min_x
andmax_x
ifmin_x > max_x
Swaps
min_y
andmax_y
ifmin_y > max_y
Examples
An example of selecting the
min_size
andscale
based on input:>>> np.random.seed(0) >>> points = cudf.DataFrame({ "x": cudf.Series(np.random.normal(size=120)) * 500, "y": cudf.Series(np.random.normal(size=120)) * 500, }) >>> max_depth = 3 >>> min_size = 50 >>> min_x, min_y, max_x, max_y = (points["x"].min(), points["y"].min(), points["x"].max(), points["y"].max()) >>> scale = max(max_x - min_x, max_y - min_y) // (1 << max_depth) >>> print( "min_size: " + str(min_size) + "\n" "num_points: " + str(len(points)) + "\n" "min_x: " + str(min_x) + "\n" "max_x: " + str(max_x) + "\n" "min_y: " + str(min_y) + "\n" "max_y: " + str(max_y) + "\n" "scale: " + str(scale) + "\n" ) min_size: 50 num_points: 120 min_x: -1577.4949079170394 max_x: 1435.877311993804 min_y: -1412.7015761122134 max_y: 1492.572387431971 scale: 301.0 >>> key_to_point, quadtree = cuspatial.quadtree_on_points( points["x"], points["y"], min_x, max_x, min_y, max_y, scale, max_depth, min_size ) >>> print(quadtree) key level is_quad length offset 0 0 0 False 15 0 1 1 0 False 27 15 2 2 0 False 12 42 3 3 0 True 4 8 4 4 0 False 5 106 5 6 0 False 6 111 6 9 0 False 2 117 7 12 0 False 1 119 8 12 1 False 22 54 9 13 1 False 18 76 10 14 1 False 9 94 11 15 1 False 3 103 >>> print(key_to_point) 0 63 1 20 2 33 3 66 4 19 ... 115 113 116 3 117 78 118 98 119 24 Length: 120, dtype: int32
- cuspatial.join_quadtree_and_bounding_boxes(quadtree, poly_bounding_boxes, x_min, x_max, y_min, y_max, scale, max_depth)¶
Search a quadtree for polygon or polyline bounding box intersections.
- Parameters
- quadtreecudf.DataFrame
A complete quadtree for a given area-of-interest bounding box.
- poly_bounding_boxescudf.DataFrame
Minimum bounding boxes for a set of polygons or polylines
- x_min
The lower-left x-coordinate of the area of interest bounding box
- x_max
The upper-right x-coordinate of the area of interest bounding box
- min_y
The lower-left y-coordinate of the area of interest bounding box
- max_y
The upper-right y-coordinate of the area of interest bounding box
- scale
Scale to apply to each point’s distance from
(x_min, y_min)
- max_depth
Maximum quadtree depth at which to stop testing for intersections
- Returns
- resultcudf.DataFrame
Indices for each intersecting bounding box and leaf quadrant.
- poly_offsetcudf.Series
Indices for each poly bbox that intersects with the quadtree.
- quad_offsetcudf.Series
Indices for each leaf quadrant intersecting with a poly bbox.
Notes
Swaps
min_x
andmax_x
ifmin_x > max_x
Swaps
min_y
andmax_y
ifmin_y > max_y
- cuspatial.points_in_spatial_window(min_x, max_x, min_y, max_y, xs, ys)¶
Return only the subset of coordinates that fall within a rectangular window.
A point (x, y) is inside the query window if and only if
min_x < x < max_x AND min_y < y < max_y
The window is specified by minimum and maximum x and y coordinates.
- Parameters
- min_x
lower x-coordinate of the query window
- max_x
upper x-coordinate of the query window
- min_y
lower y-coordinate of the query window
- max_y
upper y-coordinate of the query window
- xs
column of x-coordinates that may fall within the window
- ys
column of y-coordinates that may fall within the window
- Returns
- resultcudf.DataFrame
subset of (x, y) pairs above that fall within the window
Notes
Swaps
min_x
andmax_x
ifmin_x > max_x
Swaps
min_y
andmax_y
ifmin_y > max_y
GIS¶
Two GIS functions make it easier to compute distances with geographic coordinates.
- cuspatial.haversine_distance(p1_lon, p1_lat, p2_lon, p2_lat)¶
Compute the haversine distances between an arbitrary list of lon/lat pairs
- Parameters
- p1_lon
longitude of first set of coords
- p1_lat
latitude of first set of coords
- p2_lon
longitude of second set of coords
- p2_lat
latitude of second set of coords
- Returns
- resultcudf.Series
The distance between all pairs of lon/lat coordinates
- cuspatial.lonlat_to_cartesian(origin_lon, origin_lat, input_lon, input_lat)¶
Convert lon/lat to
x,y
coordinates with respect to an origin lon/lat point. Results are scaled relative to the size of the Earth in kilometers.- Parameters
- origin_lon
number
longitude offset (this is subtracted from each input before converting to x,y)
- origin_lat
number
latitude offset (this is subtracted from each input before converting to x,y)
- input_lon
Series
orlist
longitude coordinates to convert to x
- input_lat
Series
orlist
latitude coordinates to convert to y
- origin_lon
- Returns
- resultcudf.DataFrame
- xcudf.Series
x-coordinate of the input relative to the size of the Earth in kilometers.
- ycudf.Series
y-coordinate of the input relative to the size of the Earth in kilometers.
Trajectory¶
Trajectory functions make it easy to identify and group trajectories from point data.
- cuspatial.derive_trajectories(object_ids, xs, ys, timestamps)¶
Derive trajectories from object ids, points, and timestamps.
- Parameters
- object_ids
column of object (e.g., vehicle) ids
- xs
column of x-coordinates (in kilometers)
- ys
column of y-coordinates (in kilometers)
- timestamps
column of timestamps in any resolution
- Returns
- resulttuple (objects, traj_offsets)
- objectscudf.DataFrame
object_ids, xs, ys, and timestamps sorted by
(object_id, timestamp)
, used bytrajectory_bounding_boxes
andtrajectory_distances_and_speeds
- traj_offsetscudf.Series
offsets of discovered trajectories
Examples
Compute sorted objects and discovered trajectories
>>> objects, traj_offsets = cuspatial.derive_trajectories( [0, 1, 2, 3], # object_id [0, 0, 1, 1], # x [0, 0, 1, 1], # y [0, 10, 0, 10] # timestamp ) >>> print(traj_offsets) 0 0 1 2 >>> print(objects) object_id x y timestamp 0 0 1 0 0 1 0 0 0 10 2 1 3 1 0 3 1 2 1 10
- cuspatial.trajectory_distances_and_speeds(num_trajectories, object_ids, xs, ys, timestamps)¶
Compute the distance traveled and speed of sets of trajectories
- Parameters
- num_trajectories
number of trajectories (unique object ids)
- object_ids
column of object (e.g., vehicle) ids
- xs
column of x-coordinates (in kilometers)
- ys
column of y-coordinates (in kilometers)
- timestamps
column of timestamps in any resolution
- Returns
- resultcudf.DataFrame
- meterscudf.Series
trajectory distance (in kilometers)
- speedcudf.Series
trajectory speed (in meters/second)
Examples
Compute the distances and speeds of derived trajectories
>>> objects, traj_offsets = cuspatial.derive_trajectories(...) >>> dists_and_speeds = cuspatial.trajectory_distances_and_speeds( len(traj_offsets) objects['object_id'], objects['x'], objects['y'], objects['timestamp'] ) >>> print(dists_and_speeds) distance speed trajectory_id 0 1000.0 100000.000000 1 1000.0 111111.109375
- cuspatial.directed_hausdorff_distance(xs, ys, space_offsets)¶
Compute the directed Hausdorff distances between all pairs of spaces.
- Parameters
- xs
column of x-coordinates
- ys
column of y-coordinates
- space_offsets
beginning index of each space, plus the last space’s end offset.
- Returns
- resultcudf.DataFrame
The pairwise directed distance matrix with one row and one column per input space; the value at row i, column j represents the hausdorff distance from space i to space j.
Examples
The directed Hausdorff distance from one space to another is the greatest of all the distances between any point in the first space to the closest point in the second.
Consider a pair of lines on a grid:
: x -----xyy--- : :
x_{0} = (0, 0), x_{1} = (0, 1)
y_{0} = (1, 0), y_{1} = (2, 0)
x_{0} is the closest point in
x
toy
. The distance from x_{0} to the farthest point iny
is 2.y_{0} is the closest point in
y
tox
. The distance from y_{0} to the farthest point inx
is 1.414.Compute the directed hausdorff distances between a set of spaces
>>> result = cuspatial.directed_hausdorff_distance( [0, 1, 0, 0], # xs [0, 0, 1, 2], # ys [0, 2, 4], # space_offsets ) >>> print(result) 0 1 0 0.0 1.414214 1 2.0 0.000000
- cuspatial.trajectory_bounding_boxes(num_trajectories, object_ids, xs, ys)¶
Compute the bounding boxes of sets of trajectories.
- Parameters
- num_trajectories
number of trajectories (unique object ids)
- object_ids
column of object (e.g., vehicle) ids
- xs
column of x-coordinates (in kilometers)
- ys
column of y-coordinates (in kilometers)
- Returns
- resultcudf.DataFrame
minimum bounding boxes (in kilometers) for each trajectory
- x_mincudf.Series
the minimum x-coordinate of each bounding box
- y_mincudf.Series
the minimum y-coordinate of each bounding box
- x_maxcudf.Series
the maximum x-coordinate of each bounding box
- y_maxcudf.Series
the maximum y-coordinate of each bounding box
Examples
Compute the minimum bounding boxes of derived trajectories
>>> objects, traj_offsets = trajectory.derive_trajectories( [0, 0, 1, 1], # object_id [0, 1, 2, 3], # x [0, 0, 1, 1], # y [0, 10, 0, 10] # timestamp ) >>> traj_bounding_boxes = cuspatial.trajectory_bounding_boxes( len(traj_offsets), objects['object_id'], objects['x'], objects['y'] ) >>> print(traj_bounding_boxes) x_min y_min x_max y_max 0 0.0 0.0 2.0 2.0 1 1.0 1.0 3.0 3.0
- class cuspatial.CubicSpline(t, y, ids=None, size=None, prefixes=None)¶
Fits each column of the input Series y to a hermetic cubic spline.
cuspatial.CubicSpline
supports two usage patterns: The first is identical to scipy.interpolate.CubicSpline:curve = cuspatial.CubicSpline(t, y) new_points = curve(np.linspace(t.min, t.max, 50))
This allows API parity with scipy. This isn’t recommended, as scipy host based interpolation performance is likely to exceed GPU performance for a single curve.
However, cuSpatial massively outperforms scipy when many splines are fit simultaneously. Data must be arranged in a SoA format, and the exclusive prefix_sum of the separate curves must also be passed to the function.:
NUM_SPLINES = 100000 SPLINE_LENGTH = 101 t = cudf.Series( np.hstack((np.arange(SPLINE_LENGTH),) * NUM_SPLINES) ).astype('float32') y = cudf.Series( np.random.random(SPLINE_LENGTH*NUM_SPLINES) ).astype('float32') prefix_sum = cudf.Series( cp.arange(NUM_SPLINES + 1)*SPLINE_LENGTH ).astype('int32') curve = cuspatial.CubicSpline(t, y, prefixes=prefix_sum) new_samples = cudf.Series( np.hstack((np.linspace( 0, (SPLINE_LENGTH - 1), (SPLINE_LENGTH - 1) * 2 + 1 ),) * NUM_SPLINES) ).astype('float32') curve_ids = cudf.Series(np.repeat( np.arange(0, NUM_SPLINES), SPLINE_LENGTH * 2 - 1 ), dtype="int32") new_points = curve(new_samples, curve_ids)
Methods
__call__
(coordinates[, groups])Interpolates new input values coordinates using the .c DataFrame or map of DataFrames.
- CubicSpline.__init__(t, y, ids=None, size=None, prefixes=None)¶
Computes various error preconditions on the input data, then uses CUDA to compute cubic splines for each set of input coordinates on the GPU in parallel.
- Parameters
- tcudf.Series
time sample values. Must be monotonically increasing.
- ycudf.Series
columns to have curves fit to according to x
- ids (Optional)cudf.Series
ids of each spline
- size (Optional)cudf.Series
fixed size of each spline
- prefixes (Optional)cudf.Series
alternative to size, allows splines of varying length. Not yet fully supported.
- Returns
- CubicSplinecallable o
o.c
contains the coefficients that can be used to compute new points along the spline fitting the originalt
data.o(n)
interpolates the spline coordinates along new input valuesn
.
- CubicSpline.__call__(coordinates, groups=None)¶
Interpolates new input values coordinates using the .c DataFrame or map of DataFrames.
IO¶
cuSpatial offers native GPU-accelerated shapefile reading. In addition, any host-side GeoPandas DataFrame can be copied into GPU memory for use with cuSpatial algorithms.
- cuspatial.read_polygon_shapefile(filename)¶
Reads polygon geometry from an ESRI shapefile into GPU memory.
- Parameters
- filenamestr, pathlike
ESRI Shapefile file path (usually ends in
.shp
)
- Returns
- resulttuple (cudf.Series, cudf.Series, cudf.DataFrame)
- poly_offsetscudf.Series(dtype=np.int32)
Offsets of the first ring in each polygon
- ring_offsetscudf.Series(dtype=np.int32)
Offsets of the first point in each ring
- pointscudf.DataFrame
- DataFrame of all points in the shapefile
- xcudf.Series(dtype=np.float64)
x-components of each polygon’s points
- ycudf.Series(dtype=np.float64)
y-components of each polygon’s points
- cuspatial.from_geopandas(gpdf)¶
Converts a geopandas mixed geometry dataframe into a cuspatial geometry dataframe.
Possible inputs:
geopandas.geoseries.GeoSeries geopandas.geodataframe.GeoDataFrame