Io Datasinks#
- group io_datasinks
-
class data_sink#
- #include <data_sink.hpp>
Interface class for storing the output data from the writers.
Public Functions
-
inline virtual ~data_sink()#
Base class destructor.
-
virtual void host_write(void const *data, size_t size) = 0#
Append the buffer content to the sink.
- Parameters:
data – [in] Pointer to the buffer to be written into the sink object
size – [in] Number of bytes to write
-
inline virtual bool supports_device_write() const#
Whether or not this sink supports writing from GPU memory addresses.
Internally, some of the file format writers contain code along the lines of
tmp_buffer = alloc_temp_buffer();
cudaMemcpy(tmp_buffer, device_buffer, size, cudaMemcpyDeviceToHost);
sink->host_write(tmp_buffer, size);
When the sink is itself a memory-buffered write, this amounts to a second memcpy. A useful optimization for a “smart” custom data_sink is therefore to manage the movement of data between CPU and GPU internally, which reduces the writer code to
sink->device_write(device_buffer, size, stream)
If this function returns true, the data_sink will receive calls to device_write() instead of host_write() when possible. However, it may still receive host_write() calls as well.
- Returns:
Whether this sink supports device_write() calls
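The staging-versus-direct choice described above can be sketched without CUDA. In this minimal sketch, `sink_interface`, `host_only_sink`, `smart_sink`, and `write_from_device` are hypothetical names, "device" memory is ordinary host memory, `std::memcpy` stands in for `cudaMemcpy`, and the stream argument is omitted. It is an illustration of the idea, not the code used by the cuDF writers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical, CUDA-free stand-in for part of the data_sink interface.
// "Device" memory is ordinary host memory here and the stream argument is omitted.
struct sink_interface {
  virtual ~sink_interface() = default;
  virtual void host_write(void const* data, std::size_t size) = 0;
  virtual bool supports_device_write() const { return false; }
  virtual void device_write(void const* /*gpu_data*/, std::size_t /*size*/)
  {
    throw std::logic_error("this sink does not support device writes");
  }
};

// Sink without a device path: only host_write() is overridden.
struct host_only_sink : sink_interface {
  int host_calls = 0;
  void host_write(void const*, std::size_t) override { ++host_calls; }
};

// "Smart" sink: advertises device_write() and handles the transfer itself.
struct smart_sink : sink_interface {
  int host_calls   = 0;
  int device_calls = 0;
  void host_write(void const*, std::size_t) override { ++host_calls; }
  bool supports_device_write() const override { return true; }
  void device_write(void const*, std::size_t) override { ++device_calls; }
};

// Writer-side dispatch: take the direct path when the sink offers it,
// otherwise stage the data in a temporary host buffer first.
inline void write_from_device(sink_interface& sink, void const* device_buffer, std::size_t size)
{
  assert(device_buffer != nullptr);
  if (sink.supports_device_write()) {
    sink.device_write(device_buffer, size);
    return;
  }
  std::vector<char> tmp_buffer(size);
  if (size > 0) { std::memcpy(tmp_buffer.data(), device_buffer, size); }  // stands in for cudaMemcpy
  sink.host_write(tmp_buffer.data(), size);
}
```

With `host_only_sink` the writer pays for the staging copy before calling `host_write()`; with `smart_sink` the extra copy disappears because the sink receives the original pointer.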
-
inline virtual bool is_device_write_preferred(size_t size) const#
Estimates whether a direct device write would perform better than a host write for the given size.
- Parameters:
size – Number of bytes to write
- Returns:
Whether the device write is expected to be faster for the given size
-
inline virtual void device_write(void const *gpu_data, size_t size, rmm::cuda_stream_view stream)#
Append the buffer content to the sink from a gpu address.
For optimal performance, should only be called when is_device_write_preferred() returns true. Data sink implementations that don’t support direct device writes don’t need to override this function.
- Throws:
cudf::logic_error – the object does not support direct device writes, i.e. supports_device_write() returns false.
- Parameters:
gpu_data – Pointer to the buffer to be written into the sink object
size – Number of bytes to write
stream – CUDA stream to use
-
inline virtual std::future<void> device_write_async(void const *gpu_data, size_t size, rmm::cuda_stream_view stream)#
Asynchronously append the buffer content to the sink from a gpu address.
For optimal performance, should only be called when is_device_write_preferred() returns true. Data sink implementations that don’t support direct device writes don’t need to override this function.
gpu_data must not be freed until this call is synchronized.
auto result = device_write_async(gpu_data, size, stream);
result.wait(); // OR result.get()
- Throws:
cudf::logic_error – the object does not support direct device writes, i.e. supports_device_write() returns false.
- Parameters:
gpu_data – Pointer to the buffer to be written into the sink object
size – Number of bytes to write
stream – CUDA stream to use
- Returns:
a future that can be used to synchronize the call
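The lifetime rule for `gpu_data` can be illustrated with a CUDA-free sketch. Here `async_sink` and `write_and_wait` are hypothetical names, the source buffer is plain host memory, the stream parameter of the real signature is omitted, and `std::async` is only one possible way to produce the returned future.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical CUDA-free sink: "device" memory is plain host memory and the
// stream parameter of the real signature is omitted.
class async_sink {
 public:
  // Starts the copy on another thread and returns a future to synchronize on.
  std::future<void> device_write_async(void const* gpu_data, std::size_t size)
  {
    return std::async(std::launch::async, [this, gpu_data, size] {
      auto const* bytes = static_cast<char const*>(gpu_data);
      buffer_.insert(buffer_.end(), bytes, bytes + size);
    });
  }

  std::size_t bytes_written() const { return buffer_.size(); }

 private:
  std::vector<char> buffer_;
};

// The caller keeps `source` alive until the future has been waited on.
inline std::size_t write_and_wait(async_sink& sink, std::vector<char> const& source)
{
  auto result = sink.device_write_async(source.data(), source.size());
  assert(result.valid());
  result.wait();  // or result.get(); only after this may `source` be released
  return sink.bytes_written();
}
```

Releasing or overwriting the source buffer before `wait()` returns would let the background copy read invalid memory, which is why the buffer outlives the call in `write_and_wait`.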
-
virtual void flush() = 0#
Flush the data written into the sink.
-
virtual size_t bytes_written() = 0#
Returns the total number of bytes written into this sink.
- Returns:
Total number of bytes written into this sink
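As a minimal sketch of what a custom sink has to provide, the example below implements the three pure virtual members (`host_write`, `flush`, `bytes_written`) against a hypothetical stand-in interface named `sink_interface`, so that it compiles without cuDF or CUDA. A real implementation would derive from `cudf::io::data_sink` instead; `vector_sink` and `contents()` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the three pure virtual members of the data_sink
// interface, declared here so the sketch compiles without cuDF or CUDA.
struct sink_interface {
  virtual ~sink_interface() = default;
  virtual void host_write(void const* data, std::size_t size) = 0;
  virtual void flush() = 0;
  virtual std::size_t bytes_written() = 0;
};

// A custom sink that appends everything it receives to an in-memory byte vector.
class vector_sink : public sink_interface {
 public:
  void host_write(void const* data, std::size_t size) override
  {
    assert(data != nullptr || size == 0);
    auto const* bytes = static_cast<char const*>(data);
    buffer_.insert(buffer_.end(), bytes, bytes + size);
  }

  void flush() override {}  // nothing is buffered outside buffer_

  std::size_t bytes_written() override { return buffer_.size(); }

  std::vector<char> const& contents() const { return buffer_; }

 private:
  std::vector<char> buffer_;
};
```

Because `host_write()` appends, successive calls accumulate in order and `bytes_written()` reports the running total.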
Public Static Functions
-
static std::unique_ptr<data_sink> create(std::string const &filepath)#
Create a sink from a file path.
- Parameters:
filepath – [in] Path to the file to use
- Returns:
Constructed data_sink object
-
static std::unique_ptr<data_sink> create(std::vector<char> *buffer)#
Create a sink from a std::vector.
- Parameters:
buffer – [inout] Pointer to the output vector
- Returns:
Constructed data_sink object
-
static std::unique_ptr<data_sink> create()#
Create a void sink (one that does no actual I/O).
A useful code path for benchmarking, to eliminate physical hardware randomness from profiling.
- Returns:
Constructed data_sink object
-
static std::unique_ptr<data_sink> create(cudf::io::data_sink *const user_sink)#
Create a wrapped custom user data sink.
The data sink returned here is not the one passed by the user. It is an internal class that wraps the user pointer. The purpose is to allow the user to declare a custom sink instance and reuse it across multiple write() calls.
- Parameters:
user_sink – [in] User-provided data sink (typically custom class)
- Returns:
Constructed data_sink object
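The wrapping idea can be sketched as a non-owning forwarder. Everything in this example is hypothetical and CUDA-free: `sink_interface` stands in for the real interface, `user_sink_wrapper` and `wrap_user_sink` are illustrative names, and the actual internal wrapper class may differ. The sketch only shows why the user's object survives, and keeps its state, after each wrapper is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for the data_sink interface (no cuDF or CUDA required).
struct sink_interface {
  virtual ~sink_interface() = default;
  virtual void host_write(void const* data, std::size_t size) = 0;
  virtual void flush() = 0;
  virtual std::size_t bytes_written() = 0;
};

// User-defined sink; for brevity it only counts the bytes it receives.
class counting_sink : public sink_interface {
 public:
  void host_write(void const*, std::size_t size) override { total_ += size; }
  void flush() override {}
  std::size_t bytes_written() override { return total_; }

 private:
  std::size_t total_ = 0;
};

// Non-owning wrapper: forwards every call and never deletes the user's object.
class user_sink_wrapper : public sink_interface {
 public:
  explicit user_sink_wrapper(sink_interface* user_sink) : user_sink_(user_sink)
  {
    assert(user_sink_ != nullptr);
  }
  void host_write(void const* data, std::size_t size) override
  {
    user_sink_->host_write(data, size);
  }
  void flush() override { user_sink_->flush(); }
  std::size_t bytes_written() override { return user_sink_->bytes_written(); }

 private:
  sink_interface* user_sink_;  // borrowed, not owned
};

// Hypothetical counterpart of create(user_sink): returns an owning pointer to the wrapper.
inline std::unique_ptr<sink_interface> wrap_user_sink(sink_interface* user_sink)
{
  return std::make_unique<user_sink_wrapper>(user_sink);
}
```

Each call to `wrap_user_sink` yields a fresh wrapper whose destruction leaves the user's sink untouched, so one sink instance can collect the output of several consecutive writes.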
-