The original cuSpatial C++ API (libcuspatial) was designed to depend on RAPIDS libcudf and use its core data types, especially cudf::column. For users who do not also use libcudf or other RAPIDS APIs, depending on libcudf could be a big barrier to adoption of libcuspatial: libcudf is a very large library, and building it takes a lot of time.
Therefore, the core of cuSpatial is now implemented in a standalone C++ API that does not depend on libcudf. This is a header-only template API with an iterator- and range-based interface. This has a number of advantages.
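The main disadvantages of this type of API are:
1. Header-only APIs can increase compilation time for code that depends on them.
2. Some users may prefer a column-based (libcudf-based) API.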
The good news is that maintaining the existing libcudf-based C++ API as a layer above the header-only libcuspatial API avoids problem 1 and problem 2 for users of the column-based API.
Following is an example iterator-based API for cuspatial::haversine_distance. (See below for discussion of API documentation.)
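As a rough illustration of this style of interface, the following host-only sketch follows the parameter list described in the Doxygen comment further down, using std::transform in place of a device algorithm. The vec_2d struct, the name haversine_distance_sketch, and the omission of the CUDA stream parameter are assumptions made to keep the example self-contained; it is not the libcuspatial declaration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <iterator>
#include <vector>

// Hypothetical host-only stand-in for cuspatial::vec_2d<T>.
template <typename T>
struct vec_2d {
  using value_type = T;
  T x;  // longitude, in degrees
  T y;  // latitude, in degrees
};

// Illustrative analogue of an iterator-based haversine API (assumed name).
// The A and B inputs use distinct iterator template parameters, only the A
// range has an explicit end, and the past-the-end output iterator is returned.
template <class LonLatItA,
          class LonLatItB,
          class OutputIt,
          class Location = typename std::iterator_traits<LonLatItA>::value_type,
          class T        = typename Location::value_type>
OutputIt haversine_distance_sketch(LonLatItA a_lonlat_first,
                                   LonLatItA a_lonlat_last,
                                   LonLatItB b_lonlat_first,
                                   OutputIt distance_first,
                                   T radius = static_cast<T>(6371.0))
{
  return std::transform(
    a_lonlat_first, a_lonlat_last, b_lonlat_first, distance_first,
    [radius](Location const& a, Location const& b) {
      T const deg_to_rad = static_cast<T>(3.14159265358979323846 / 180.0);
      T const sin_dlat   = std::sin((b.y - a.y) * deg_to_rad / 2);
      T const sin_dlon   = std::sin((b.x - a.x) * deg_to_rad / 2);
      T const h          = sin_dlat * sin_dlat + std::cos(a.y * deg_to_rad) *
                                                   std::cos(b.y * deg_to_rad) *
                                                   sin_dlon * sin_dlon;
      return 2 * radius * std::asin(std::sqrt(h));
    });
}

// Usage: distances on a unit sphere (radius 1) between two pairs of points.
inline std::vector<double> example_distances()
{
  std::vector<vec_2d<double>> a{{0.0, 0.0}, {10.0, 20.0}};
  std::vector<vec_2d<double>> b{{90.0, 0.0}, {10.0, 20.0}};
  std::vector<double> out(a.size());
  auto last = haversine_distance_sketch(a.begin(), a.end(), b.begin(), out.begin(), 1.0);
  assert(last == out.end());  // the returned iterator is past the last distance
  return out;
}
```

The first pair of points lies a quarter of a great circle apart, so its distance on a unit sphere is pi/2; the second pair is identical, so its distance is zero.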
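There are a few key points to notice.
1. The API closely follows STL algorithms such as std::transform.
2. Longitude/latitude pairs are passed as structures of the cuspatial::vec_2d type (include/cuspatial/vec_2d.hpp). This is enforced using a static_assert in the function body (discussed later).
3. The Location type is a template that is by default equal to the value_type of the input iterators.
4. The floating-point type is a template (T) that is by default equal to the value_type of Location.
5. Because the input and output ranges are the same size, only the A range is given both a start and an end (a_lonlat_first and a_lonlat_last). This mirrors STL APIs.
6. The API returns an iterator to the element past the last distance written, following std::transform, even though, as with transform, many uses of haversine_distance will not need this returned iterator.
Following is the (Doxygen) documentation for the above cuspatial::haversine_distance.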
/**
 * @brief Compute haversine distances between points in set A and the corresponding points in set B.
 *
 * Computes N haversine distances, where N is `std::distance(a_lonlat_first, a_lonlat_last)`.
 * The distance for each `a_lonlat[i]` and `b_lonlat[i]` point pair is assigned to
 * `distance_first[i]`. `distance_first` must be an iterator to output storage allocated for N
 * distances.
 *
 * Computed distances will have the same units as `radius`.
 *
 * https://en.wikipedia.org/wiki/Haversine_formula
 *
 * @param[in]  a_lonlat_first: beginning of range of (longitude, latitude) locations in set A
 * @param[in]  a_lonlat_last: end of range of (longitude, latitude) locations in set A
 * @param[in]  b_lonlat_first: beginning of range of (longitude, latitude) locations in set B
 * @param[out] distance_first: beginning of output range of haversine distances
 * @param[in]  radius: radius of the sphere on which the points reside. default: 6371.0
 *             (approximate radius of Earth in km)
 * @param[in]  stream: The CUDA stream on which to perform computations and allocate memory.
 *
 * @tparam LonLatItA Iterator to input location set A. Must meet the requirements of
 * [LegacyRandomAccessIterator][LinkLRAI] and be device-accessible.
 * @tparam LonLatItB Iterator to input location set B. Must meet the requirements of
 * [LegacyRandomAccessIterator][LinkLRAI] and be device-accessible.
 * @tparam OutputIt Output iterator. Must meet the requirements of
 * [LegacyRandomAccessIterator][LinkLRAI] and be device-accessible.
 * @tparam Location The `value_type` of `LonLatItA` and `LonLatItB`. Must be `cuspatial::vec_2d<T>`.
 * @tparam T The underlying coordinate type. Must be a floating-point type.
 *
 * @pre `a_lonlat_first` may equal `distance_first`, but the range `[a_lonlat_first, a_lonlat_last)`
 * shall not overlap the range `[distance_first, distance_first + (a_lonlat_last - a_lonlat_first))`
 * otherwise.
 * @pre `b_lonlat_first` may equal `distance_first`, but the range
 * `[b_lonlat_first, b_lonlat_first + (a_lonlat_last - a_lonlat_first))` shall not overlap the range
 * `[distance_first, distance_first + (a_lonlat_last - a_lonlat_first))` otherwise.
 * @pre All iterators must have the same `Location` type, with the same underlying floating-point
 * coordinate type (e.g. `cuspatial::vec_2d<float>`).
 *
 * @return Output iterator to the element past the last distance computed.
 *
 * [LinkLRAI]: https://en.cppreference.com/w/cpp/named_req/RandomAccessIterator
 * "LegacyRandomAccessIterator"
 */
Key points:
1. Document preconditions using @pre.
The libcudf-based API is the existing API, unchanged by refactoring. Here is the existing cuspatial::haversine_distance:
Key points:
1. Input data are passed as cudf::column_view. This is a type-erased container, so determining the type of data must be done at run time.
2. The output is returned as a unique_ptr<cudf::column>.
3. There is a detail version of the API that takes a stream. This follows libcudf, and may change in the future.
libcuspatial APIs should be defined in a header file in the cpp/include/cuspatial/ directory. The API header should be named after the API. In the example, haversine.hpp defines the cuspatial::haversine_distance API.
The implementation must also be in a header, but should be in the cuspatial/detail directory. The implementation should be included from the API definition file, at the end of the file. Example:
Public APIs are in the cuspatial namespace. Note that both the header-only API and the libcudf-based API can live in the same namespace, because they are non-ambiguous (very different parameters).
Implementation of the header-only API should be in a cuspatial::detail namespace.
The main implementation should be in detail headers.
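The file and namespace layout described above might be sketched as follows. This is a single-file illustration under stated assumptions: the cuspatial_sketch namespace, the negate_all API and the negate_functor helper are hypothetical, and the two marked sections stand in for what would be two separate headers, with the public one ending in an #include of the detail one.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// ---- Public API header (would be a file under cpp/include/cuspatial/) ----
namespace cuspatial_sketch {

// Public declaration only; the real header would carry the Doxygen comment.
template <class InputIt, class OutputIt>
OutputIt negate_all(InputIt first, InputIt last, OutputIt out_first);

}  // namespace cuspatial_sketch

// In a real header pair, the public header would end here with an #include
// of the detail header below, so that users only include the public one.

// ---- Detail header (would be a file under the cuspatial/detail directory) ----
namespace cuspatial_sketch {
namespace detail {

// Implementation helpers are kept out of the public namespace.
struct negate_functor {
  template <typename T>
  T operator()(T const& value) const
  {
    return -value;
  }
};

}  // namespace detail

// Definition of the public API, built from the detail helpers.
template <class InputIt, class OutputIt>
OutputIt negate_all(InputIt first, InputIt last, OutputIt out_first)
{
  return std::transform(first, last, out_first, detail::negate_functor{});
}

}  // namespace cuspatial_sketch

// Usage: callers see only the public namespace.
inline void example_negate()
{
  std::vector<int> in{1, -2, 3};
  std::vector<int> out(in.size());
  auto last = cuspatial_sketch::negate_all(in.begin(), in.end(), out.begin());
  assert(last == out.end());
}
```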
Because it is a statically typed API, the header-only implementation can be much simpler than the libcudf-based API, which requires run-time type dispatching. In the case of haversine_distance, it is a simple matter of a few static asserts and dynamic expectation checks, followed by a call to thrust::transform with a custom transform functor.
Note that we static_assert that the types of the iterator inputs match documented expectations. We also do a runtime check that the radius is positive. Finally we just call thrust::transform, passing it an instance of haversine_distance_functor, which is a function of two vec_2d<T> inputs that implements the Haversine distance formula.
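The structure just described (compile-time type checks, a run-time check on the radius, then a transform with a functor) could look roughly like the host-only sketch below. It substitutes std::transform for thrust::transform and std::invalid_argument for the library's own expectation check, and the names vec_2d, haversine_functor and haversine_distance_impl_sketch are assumptions for illustration rather than the libcuspatial implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <iterator>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical host-only stand-in for cuspatial::vec_2d<T>.
template <typename T>
struct vec_2d {
  using value_type = T;
  T x;  // longitude, in degrees
  T y;  // latitude, in degrees
};

// Binary functor: haversine distance between two (lon, lat) points.
template <typename T>
struct haversine_functor {
  T radius;
  T operator()(vec_2d<T> const& a, vec_2d<T> const& b) const
  {
    T const deg_to_rad = static_cast<T>(3.14159265358979323846 / 180.0);
    T const sin_dlat   = std::sin((b.y - a.y) * deg_to_rad / 2);
    T const sin_dlon   = std::sin((b.x - a.x) * deg_to_rad / 2);
    T const h          = sin_dlat * sin_dlat + std::cos(a.y * deg_to_rad) *
                                                 std::cos(b.y * deg_to_rad) *
                                                 sin_dlon * sin_dlon;
    return 2 * radius * std::asin(std::sqrt(h));
  }
};

template <class ItA,
          class ItB,
          class OutputIt,
          class Location = typename std::iterator_traits<ItA>::value_type,
          class T        = typename Location::value_type>
OutputIt haversine_distance_impl_sketch(
  ItA a_first, ItA a_last, ItB b_first, OutputIt out_first, T radius = static_cast<T>(6371.0))
{
  using LocationB = typename std::iterator_traits<ItB>::value_type;

  // Compile-time checks that the iterator inputs match the documented types.
  static_assert(std::is_floating_point<T>::value, "Coordinates must be floating point.");
  static_assert(std::is_same<Location, vec_2d<T>>::value, "Inputs must be vec_2d<T>.");
  static_assert(std::is_same<Location, LocationB>::value, "Inputs must share one type.");

  // Run-time check (stand-in for the library's expectation check).
  if (!(radius > 0)) { throw std::invalid_argument("radius must be positive"); }

  return std::transform(a_first, a_last, b_first, out_first, haversine_functor<T>{radius});
}

// Usage: a non-positive radius is rejected at run time.
inline bool rejects_nonpositive_radius()
{
  std::vector<vec_2d<float>> pts(1, vec_2d<float>{0.0f, 0.0f});
  std::vector<float> out(1);
  try {
    haversine_distance_impl_sketch(pts.begin(), pts.end(), pts.begin(), out.begin(), -1.0f);
  } catch (std::invalid_argument const&) {
    return true;
  }
  return false;
}
```

Passing iterators whose value type is not vec_2d<T>, or mixing float and double locations, fails to compile in this sketch, which is the point of the static checks: type errors surface at build time with no run-time dispatch.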
The substance of the refactoring is making the libcudf-based API a wrapper around the header-only API. This mostly involves replacing business logic implementation in the type-dispatched functor with a call to the header-only API. We also need to convert disjoint latitude and longitude inputs into vec_2d<T> structs. This is easily done using the cuspatial::make_vec_2d_iterator utility provided in type_utils.hpp.
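To give a feel for what such a conversion involves, here is a minimal host-only sketch of an iterator that reads from two separate coordinate arrays and yields vec_2d values on dereference. The interleaved_iterator class and the make_vec_2d_iterator_sketch helper are hypothetical stand-ins written for this example; they do not reproduce the signature or implementation of the cuSpatial utility.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical host-only stand-in for cuspatial::vec_2d<T>.
template <typename T>
struct vec_2d {
  using value_type = T;
  T x;
  T y;
};

// Minimal input iterator that pairs up elements of two underlying iterators.
// No combined array is ever materialized; pairs are built on dereference.
template <class XIt, class YIt>
class interleaved_iterator {
 public:
  using T                 = typename std::iterator_traits<XIt>::value_type;
  using iterator_category = std::input_iterator_tag;
  using value_type        = vec_2d<T>;
  using difference_type   = std::ptrdiff_t;
  using pointer           = value_type const*;
  using reference         = value_type;

  interleaved_iterator(XIt x, YIt y) : x_(x), y_(y) {}

  value_type operator*() const { return value_type{*x_, *y_}; }
  interleaved_iterator& operator++()
  {
    ++x_;
    ++y_;
    return *this;
  }
  interleaved_iterator operator++(int)
  {
    auto old = *this;
    ++*this;
    return old;
  }
  friend bool operator==(interleaved_iterator const& a, interleaved_iterator const& b)
  {
    return a.x_ == b.x_;
  }
  friend bool operator!=(interleaved_iterator const& a, interleaved_iterator const& b)
  {
    return !(a == b);
  }

 private:
  XIt x_;
  YIt y_;
};

// Assumed helper name, chosen to echo the utility mentioned in the text.
template <class XIt, class YIt>
interleaved_iterator<XIt, YIt> make_vec_2d_iterator_sketch(XIt x, YIt y)
{
  return {x, y};
}

// Usage: view separate longitude and latitude arrays as a range of points.
inline std::vector<vec_2d<double>> zip_points(std::vector<double> const& lon,
                                              std::vector<double> const& lat)
{
  assert(lon.size() == lat.size());
  auto first = make_vec_2d_iterator_sketch(lon.begin(), lat.begin());
  auto last  = make_vec_2d_iterator_sketch(lon.end(), lat.end());
  return std::vector<vec_2d<double>>(first, last);
}
```

The usage function copies the points into a vector only so the result is easy to inspect. In a wrapper, an iterator of this kind would be passed straight to the iterator-based algorithm, so the separate coordinate arrays are never copied.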
So, to refactor the libcudf-based API, we remove the following code.
And replace it with the following code.
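The general shape of a type-dispatched wrapper around a statically typed, iterator-based API might be sketched as below. Everything here is an assumption for illustration: column_view_sketch loosely imitates a type-erased view, column_sketch uses std::variant as an owning result, and squared_norm is a deliberately trivial stand-in algorithm rather than any cuSpatial or libcudf function. The sketch requires C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <variant>
#include <vector>

enum class type_id { FLOAT32, FLOAT64 };

// Hypothetical type-erased, non-owning view: the element type is only known
// at run time through the type tag.
struct column_view_sketch {
  type_id type;
  void const* data;
  std::size_t size;
};

// Hypothetical owning result column.
using column_sketch = std::variant<std::vector<float>, std::vector<double>>;

// Statically typed, iterator-based algorithm (stand-in for a header-only API).
template <class XIt, class YIt, class OutputIt>
OutputIt squared_norm(XIt x_first, XIt x_last, YIt y_first, OutputIt out_first)
{
  return std::transform(
    x_first, x_last, y_first, out_first, [](auto x, auto y) { return x * x + y * y; });
}

// Typed step of the wrapper: recover typed pointers, allocate the output,
// and forward to the iterator-based API. No business logic lives here.
template <typename T>
column_sketch typed_squared_norm(column_view_sketch const& x, column_view_sketch const& y)
{
  auto const* x_first = static_cast<T const*>(x.data);
  auto const* y_first = static_cast<T const*>(y.data);
  std::vector<T> out(x.size);
  squared_norm(x_first, x_first + x.size, y_first, out.begin());
  return column_sketch{std::move(out)};
}

// Column-based overload: validates inputs and dispatches on the run-time type.
// It shares a name with the template above without ambiguity, because the
// parameter lists are very different.
inline column_sketch squared_norm(column_view_sketch const& x, column_view_sketch const& y)
{
  if (x.type != y.type || x.size != y.size) {
    throw std::invalid_argument("x and y must have the same type and size");
  }
  switch (x.type) {
    case type_id::FLOAT32: return typed_squared_norm<float>(x, y);
    case type_id::FLOAT64: return typed_squared_norm<double>(x, y);
  }
  throw std::invalid_argument("unsupported column type");
}

// Usage: call the column-based overload with float data.
inline std::vector<float> example_wrapper_call()
{
  std::vector<float> x{3.0f, 1.0f};
  std::vector<float> y{4.0f, 2.0f};
  column_view_sketch xv{type_id::FLOAT32, x.data(), x.size()};
  column_view_sketch yv{type_id::FLOAT32, y.data(), y.size()};
  return std::get<std::vector<float>>(squared_norm(xv, yv));
}
```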
Existing libcudf-based API tests can mostly be left alone. New tests should be added to exercise the header-only API separately in case the libcudf-based API is removed.
Note that tests, like the header-only API, should not depend on libcudf or libcudf_test. The cuDF-based API made the mistake of depending on libcudf_test, which sometimes breaks cuSpatial when libcudf_test changes.
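One way to keep such tests free of libcudf_test is to compare plain ranges with a small local helper. The sketch below uses only the standard library; the all_near helper and the degrees_to_radians function under test are hypothetical, and a real test would more likely use a test framework and device memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical comparison helper: true if both ranges have the same size and
// every pair of elements differs by at most `tolerance`.
template <typename T>
bool all_near(std::vector<T> const& expected, std::vector<T> const& actual, T tolerance)
{
  if (expected.size() != actual.size()) { return false; }
  for (std::size_t i = 0; i < expected.size(); ++i) {
    if (!(std::abs(expected[i] - actual[i]) <= tolerance)) { return false; }
  }
  return true;
}

// Hypothetical iterator-based function under test.
template <class InputIt, class OutputIt>
OutputIt degrees_to_radians(InputIt first, InputIt last, OutputIt out_first)
{
  return std::transform(first, last, out_first, [](double deg) {
    return deg * (3.14159265358979323846 / 180.0);
  });
}

// A test case that depends on nothing beyond the standard library.
inline bool test_degrees_to_radians()
{
  std::vector<double> const input{0.0, 180.0};
  std::vector<double> const expected{0.0, 3.14159265358979323846};
  std::vector<double> actual(input.size());
  auto last = degrees_to_radians(input.begin(), input.end(), actual.begin());
  assert(last == actual.end());
  return all_near(expected, actual, 1e-12);
}
```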