kvikio::detail::StreamCachePerThreadAndContext Class Reference

Singleton cache that provides one CUDA stream per (context, thread) pair.

#include <stream.hpp>

Public Member Functions

 StreamCachePerThreadAndContext (StreamCachePerThreadAndContext const &)=delete
 
StreamCachePerThreadAndContext & operator= (StreamCachePerThreadAndContext const &)=delete
 
 StreamCachePerThreadAndContext (StreamCachePerThreadAndContext &&o)=delete
 
StreamCachePerThreadAndContext & operator= (StreamCachePerThreadAndContext &&o)=delete
 

Static Public Member Functions

static KVIKIO_EXPORT CUstream get ()
 Get or create a CUDA stream for the current context and thread.
 

Detailed Description

Singleton cache that provides one CUDA stream per (context, thread) pair.

This class manages CUDA streams used for host-device memory transfers. Each unique combination of CUDA context and calling thread is assigned a dedicated stream, which is created lazily on first access and reused for subsequent calls.

The cache is thread-safe: multiple threads may call get() concurrently.
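
The lazy, per-(context, thread) behavior described above can be modeled without CUDA. The following is a minimal sketch, not KvikIO's implementation: `StreamCacheSketch`, `FakeContext`, and `FakeStream` are hypothetical stand-ins, the context is passed explicitly instead of being queried from the driver, and a single mutex is assumed as the locking strategy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-ins so the sketch builds without CUDA headers.
using FakeContext = std::uintptr_t;  // plays the role of CUcontext; 0 means "no current context"
using FakeStream  = std::uintptr_t;  // plays the role of CUstream; 0 plays the role of nullptr

class StreamCacheSketch {
 public:
  // Return the stream cached for (ctx, calling thread), creating it on first use.
  FakeStream get(FakeContext ctx)
  {
    if (ctx == 0) { return 0; }  // no current context: nothing to look up or create

    auto const key = std::make_pair(ctx, std::this_thread::get_id());
    std::lock_guard<std::mutex> const lock(_mutex);  // serialize concurrent callers

    auto it = _streams.find(key);
    if (it == _streams.end()) {
      // Lazy creation: a real cache would create a CUDA stream at this point.
      it = _streams.emplace(key, _next_id++).first;
    }
    return it->second;
  }

  // Number of (context, thread) pairs that currently have a stream.
  std::size_t size()
  {
    std::lock_guard<std::mutex> const lock(_mutex);
    return _streams.size();
  }

 private:
  std::mutex _mutex;
  std::map<std::pair<FakeContext, std::thread::id>, FakeStream> _streams;
  FakeStream _next_id{1};
};
```

In this model, repeated calls with the same context from the same thread return the same handle, while a different context or a different thread yields a new one.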

Note
CUDA streams are intentionally leaked on program termination rather than destroyed in the destructor. This avoids undefined behavior that can occur when destroying CUDA resources during static destruction, and prevents crashes (segmentation faults) if clients call cuDevicePrimaryCtxReset() or cudaDeviceReset() before program termination. See: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#initialization
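
The leak-on-exit policy in the note can be illustrated with a small model. This is a sketch under stated assumptions, not the library's code: `FakeDriver` and `LeakyHandleCache` are hypothetical names, integer handles stand in for CUDA streams, and locking is omitted to keep the focus on the destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fake "driver" that hands out integer handles and counts calls.
struct FakeDriver {
  static int& created()
  {
    static int n = 0;
    return n;
  }
  static int& destroyed()
  {
    static int n = 0;
    return n;
  }
  static int create() { return ++created(); }
  static void destroy(int /*handle*/) { ++destroyed(); }
};

class LeakyHandleCache {
 public:
  // Function-local static singleton; its destructor runs during static destruction.
  static LeakyHandleCache& instance()
  {
    static LeakyHandleCache self;
    return self;
  }

  // Create a handle and remember it (not thread-safe in this sketch).
  int acquire()
  {
    _handles.push_back(FakeDriver::create());
    return _handles.back();
  }

  std::size_t size() const { return _handles.size(); }

  // Intentionally does NOT call FakeDriver::destroy() on the cached handles:
  // by the time static destruction runs, the driver may already be torn down.
  ~LeakyHandleCache() = default;

  LeakyHandleCache(LeakyHandleCache const&)            = delete;
  LeakyHandleCache& operator=(LeakyHandleCache const&) = delete;

 private:
  LeakyHandleCache() = default;
  std::vector<int> _handles;
};
```

The operating system reclaims the process's resources at exit, so skipping the destroy calls trades a tidy shutdown for the guarantee that no driver call is made after the driver or context has gone away.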

Definition at line 29 of file detail/stream.hpp.

Member Function Documentation

◆ get()

static KVIKIO_EXPORT CUstream kvikio::detail::StreamCachePerThreadAndContext::get ( )
static

Get or create a CUDA stream for the current context and thread.

If a stream already exists for the current (context, thread) pair, it is returned. Otherwise, a new stream is created, cached, and returned.

Returns
The CUDA stream associated with the current (context, thread) pair, or nullptr if no CUDA context is current.
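
Because a null CUstream also denotes CUDA's default stream, a caller that forwards a nullptr result unchecked would silently enqueue work on the default stream. One possible caller-side guard is sketched below; `require_stream` is a hypothetical helper, not part of KvikIO, and `FakeStream` stands in for CUstream so the sketch builds without CUDA headers.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for CUstream (which is itself an opaque pointer type).
using FakeStream = void*;

// Hypothetical helper: turn the "no current context" null result into an exception.
inline FakeStream require_stream(FakeStream stream)
{
  if (stream == nullptr) {
    throw std::runtime_error("no CUDA context is current on this thread");
  }
  return stream;
}
```

With the real types, the same check would wrap the result of kvikio::detail::StreamCachePerThreadAndContext::get() before the stream is passed to an asynchronous copy.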

The documentation for this class was generated from the following file: