Dynamic Batching#

Dynamic Batching allows grouping small search requests into batches to increase the device occupancy and throughput while keeping the latency within limits.

#include <cuvs/neighbors/dynamic_batching.hpp>

namespace cuvs::neighbors::dynamic_batching

Index build parameters#

struct index_params : public cuvs::neighbors::index_params#
#include <dynamic_batching.hpp>

Public Members

int64_t k#

The number of neighbors to search is fixed at construction time.

int64_t max_batch_size = 100#

Maximum size of the batch to submit to the upstream index.

size_t n_queues = 3#

The number of independent request queues.

Each queue is associated with a unique CUDA stream and its own IO device buffers. When the number of concurrent requests is high, multiple queues allow one batch to be filled and prepared while another queue is busy. Moreover, the queues are submitted concurrently; this utilizes the GPU better by hiding kernel launch latencies, which improves throughput.

bool conservative_dispatch = false#

By default (conservative_dispatch = false) the first CPU thread to commit a query to a batch dispatches the upstream search function as soon as possible (before the batch is full). In that case, it does not know the final batch size at the time of calling the upstream search and thus runs the upstream search with the maximum batch size every time, even if only one valid query is present in the batch. This reduces the latency at the cost of wasted GPU resources.

The alternative behavior (conservative_dispatch = true) is more conservative: the dispatcher thread starts the kernel that gathers input queries, but waits until the batch is full or the waiting time is exceeded. Only then does it acquire the actual batch size and launch the upstream search. As a result, fewer GPU resources are wasted at the cost of exposing the upstream search latency.

Rule of thumb: for a large max_batch_size, set conservative_dispatch = true; otherwise keep it disabled.
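
For instance, a throughput-oriented configuration following this rule of thumb could look like the sketch below; the specific values are illustrative assumptions, not recommendations from the library.

cuvs::neighbors::dynamic_batching::index_params params;
params.k = 10;                        // number of neighbors, fixed at construction time
params.max_batch_size = 256;          // large batches to maximize device occupancy
params.n_queues = 4;                  // extra queues to overlap filling and dispatching
params.conservative_dispatch = true;  // per the rule of thumb for large batch sizes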

Index search parameters#

struct search_params : public cuvs::neighbors::search_params#
#include <dynamic_batching.hpp>

Public Members

double dispatch_timeout_ms = 1.0#

How long a request can stay in the queue (milliseconds). Note that this only affects the dispatch time and does not reflect the full request latency; the latter depends on the upstream search parameters and the batch size.
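
For example, a latency-sensitive caller might shorten the queue wait; the values below are illustrative assumptions.

cuvs::neighbors::dynamic_batching::search_params search_params;
// Dispatch after at most 0.5 ms in the queue, even if the batch is not full.
search_params.dispatch_timeout_ms = 0.5;
// A throughput-oriented caller could instead allow a longer wait (e.g. 5.0 ms)
// to give batches more time to fill up. The parameters are then passed to
// dynamic_batching::search exactly as in the usage example in the Index section below.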

Index#

template<typename T, typename IdxT>
struct index : public cuvs::neighbors::index#
#include <dynamic_batching.hpp>

Lightweight dynamic batching index wrapper.

One lightweight dynamic batching index manages a single index and a single search parameter set. This structure should be shared among multiple users via copy semantics: access to the underlying implementation is managed via a shared pointer, and concurrent search among the participants is thread-safe.

Usage example

using namespace cuvs::neighbors;
// When creating a dynamic batching index, the k parameter has to be passed explicitly.
// The first empty braces default-initialize the parent `neighbors::index_params` (unused).
dynamic_batching::index_params dynb_index_params{{}, k};
// Construct the index by wrapping the upstream index and search parameters.
dynamic_batching::index<float, uint32_t> index{
    res, dynb_index_params, upstream_index, upstream_search_params
};
// Use default search parameters
dynamic_batching::search_params search_params;
// Search K nearest neighbours
auto neighbors = raft::make_device_matrix<uint32_t>(res, n_queries, k);
auto distances = raft::make_device_matrix<float>(res, n_queries, k);
dynamic_batching::search(
    res, search_params, index, queries, neighbors.view(), distances.view()
);
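
Because the wrapper is shared by copy and concurrent searches are thread-safe, several host threads can submit small requests through their own copies of the same index and let the batcher group them into larger upstream searches. The sketch below relies on a few assumptions not shown above: thread_queries is a hypothetical per-thread array of query views prepared elsewhere, and each thread constructs its own raft::device_resources.

#include <thread>
#include <vector>

std::vector<std::thread> workers;
for (int t = 0; t < 4; t++) {
  // Copying the index is cheap: all copies share the same underlying batcher.
  workers.emplace_back([index, search_params, &thread_queries, t, k] {
    raft::device_resources thread_res;   // per-thread resources (illustrative choice)
    auto queries   = thread_queries[t];  // this thread's slice of the query set (assumed)
    auto neighbors = raft::make_device_matrix<uint32_t>(thread_res, queries.extent(0), k);
    auto distances = raft::make_device_matrix<float>(thread_res, queries.extent(0), k);
    // Requests from all threads are grouped into batches internally.
    dynamic_batching::search(
        thread_res, search_params, index, queries, neighbors.view(), distances.view()
    );
  });
}
for (auto &w : workers) { w.join(); }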

Priority queues

The dynamic batching index has limited support for prioritizing individual requests. There’s only one pool of queues in the batcher and no functionality to prioritize one batch over another. The search_params::dispatch_timeout_ms parameters passed with each request are aggregated internally, and the batch is dispatched before any of these timeouts is exceeded. Under this logic, a high-priority request can never be processed earlier than a lower-priority request that was submitted before it.

However, dynamic batching indexes are lightweight and do not contain any global or static state. This means it’s easy to combine multiple batchers. As an example, you can construct one batching index per priority class:

using namespace cuvs::neighbors;
// Large batch size (128), a couple of queues (2),
//   conservative dispatch enabled - all for better throughput
dynamic_batching::index_params low_priority_params{{}, k, 128, 2, true};
// Small batch size (16), more queues (4),
//   conservative dispatch disabled - to minimize latency at reasonable throughput
dynamic_batching::index_params high_priority_params{{}, k, 16, 4, false};
// Construct the indexes by wrapping the upstream index and search parameters.
dynamic_batching::index<float, uint32_t> low_priority_index{
    res, low_priority_params, upstream_index, upstream_search_params
};
dynamic_batching::index<float, uint32_t> high_priority_index{
    res, high_priority_params, upstream_index, upstream_search_params
};
// Define a combined search function with priority selection
double high_priority_threshold_ms = 0.1;
auto search_function =
   [low_priority_index, high_priority_index, high_priority_threshold_ms](
     raft::resources const &res,
     dynamic_batching::search_params search_params,
     raft::device_matrix_view<const float, int64_t> queries,
     raft::device_matrix_view<uint32_t, int64_t> neighbors,
     raft::device_matrix_view<float, int64_t> distances) {
   dynamic_batching::search(
       res,
       search_params,
       search_params.dispatch_timeout_ms < high_priority_threshold_ms
         ? high_priority_index : low_priority_index,
       queries,
       neighbors,
       distances
   );
};

Template Parameters:
  • T – data type

  • IdxT – index type

Public Functions

template<typename Upstream>
index(const raft::resources &res, const cuvs::neighbors::dynamic_batching::index_params &params, const Upstream &upstream_index, const typename Upstream::search_params_type &upstream_params, const cuvs::neighbors::filtering::base_filter *sample_filter = nullptr)#

Construct a dynamic batching index by wrapping the upstream index.

Template Parameters:

Upstream – the upstream index type

Parameters:
  • res[in] raft resources

  • params[in] dynamic batching parameters

  • upstream_index[in] the original index that performs the search (the referenced index must stay alive for the lifetime of the dynamic batching index)

  • upstream_params[in] the original index search parameters for all queries in a batch (the parameters are captured by value for the lifetime of the dynamic batching index)

  • sample_filter[in] filtering function, if any; it must be the same for all requests in a batch (the pointed-to filter must stay alive for the lifetime of the dynamic batching index)
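
For illustration, a hedged sketch of wrapping a CAGRA upstream index follows. It assumes the cuvs::neighbors::cagra API with a pre-built dataset view (dataset) and raft resources (res) available in scope; adapt it to whichever upstream index you actually use.

#include <cuvs/neighbors/cagra.hpp>
#include <cuvs/neighbors/dynamic_batching.hpp>

using namespace cuvs::neighbors;

// Build (or load) the upstream index first; it must outlive the wrapper.
auto upstream_index = cagra::build(res, cagra::index_params{}, dataset);
// Upstream search parameters are captured by value at construction time.
cagra::search_params upstream_search_params;
// Wrap it: k = 10 neighbors per query, defaults for the remaining members.
dynamic_batching::index_params dynb_params{{}, 10};
dynamic_batching::index<float, uint32_t> batched_index{
    res, dynb_params, upstream_index, upstream_search_params
};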