IO Datasinks#

group io_datasinks
class data_sink#
#include <data_sink.hpp>

Interface class for storing the output data from the writers.

Public Functions

inline virtual ~data_sink()#

Base class destructor.

virtual void host_write(void const *data, size_t size) = 0#

Append the buffer content to the sink.

Parameters:
  • data[in] Pointer to the buffer to be written into the sink object

  • size[in] Number of bytes to write

inline virtual bool supports_device_write() const#

Whether or not this sink supports writing from GPU memory addresses.

Internal to some of the file format writers, we have code that does things like

tmp_buffer = alloc_temp_buffer();
cudaMemcpy(tmp_buffer, device_buffer, size);
sink->host_write(tmp_buffer, size);

In the case where the sink type is itself a memory-buffered write, this ends up being effectively a second memcpy. So a useful optimization for a “smart” custom data_sink is to do its own internal management of the movement of data between CPU and GPU, turning the internals of the writer into simply

sink->device_write(device_buffer, size)

If this function returns true, the data_sink will receive calls to device_write() instead of host_write() when possible. However, it is still possible to receive host_write() calls as well.

Returns:

If this writer supports device_write() calls
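
As an illustration only, a sketch of such a “smart” sink might look like the following; vector_sink and its _buffer member are made-up names, not part of cudf, and error checking on the CUDA calls is omitted.

#include <cudf/io/data_sink.hpp>
#include <rmm/cuda_stream_view.hpp>

#include <cuda_runtime_api.h>

#include <cstddef>
#include <vector>

// Hypothetical sink that owns its destination buffer and copies device data
// directly into it, avoiding the writer-side temporary host buffer.
class vector_sink : public cudf::io::data_sink {
 public:
  bool supports_device_write() const override { return true; }

  void host_write(void const* data, size_t size) override
  {
    auto const* bytes = static_cast<char const*>(data);
    _buffer.insert(_buffer.end(), bytes, bytes + size);
  }

  void device_write(void const* gpu_data, size_t size, rmm::cuda_stream_view stream) override
  {
    auto const offset = _buffer.size();
    _buffer.resize(offset + size);
    // Copy from device memory straight into the final destination.
    cudaMemcpyAsync(_buffer.data() + offset, gpu_data, size, cudaMemcpyDeviceToHost, stream.value());
    stream.synchronize();
  }

  void flush() override {}
  size_t bytes_written() override { return _buffer.size(); }

 private:
  std::vector<char> _buffer;
};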

inline virtual bool is_device_write_preferred(size_t size) const#

Estimates whether a direct device write would be more performant for the given size.

Parameters:

size – Number of bytes to write

Returns:

Whether the device write is expected to be more performant for the given size
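
For illustration, the writer-side decision this hint enables might look roughly like the sketch below; write_block is a made-up helper, not a cudf function, and CUDA error checking is omitted.

#include <cudf/io/data_sink.hpp>
#include <rmm/cuda_stream_view.hpp>

#include <cuda_runtime_api.h>

#include <cstddef>
#include <vector>

// Hypothetical writer-side dispatch: let the sink pull from device memory when
// it says that is the faster path, otherwise stage through host memory.
void write_block(cudf::io::data_sink& sink,
                 void const* device_buffer,
                 size_t size,
                 rmm::cuda_stream_view stream)
{
  if (sink.supports_device_write() && sink.is_device_write_preferred(size)) {
    sink.device_write(device_buffer, size, stream);
  } else {
    std::vector<char> staging(size);
    cudaMemcpyAsync(staging.data(), device_buffer, size, cudaMemcpyDeviceToHost, stream.value());
    stream.synchronize();
    sink.host_write(staging.data(), size);
  }
}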

inline virtual void device_write(void const *gpu_data, size_t size, rmm::cuda_stream_view stream)#

Append the buffer content to the sink from a GPU address.

For optimal performance, this function should only be called when is_device_write_preferred() returns true. Data sink implementations that don’t support direct device writes don’t need to override this function.

Throws:

cudf::logic_error – the object does not support direct device writes, i.e. supports_device_write returns false.

Parameters:
  • gpu_data – Pointer to the buffer to be written into the sink object

  • size – Number of bytes to write

  • stream – CUDA stream to use

inline virtual std::future<void> device_write_async(void const *gpu_data, size_t size, rmm::cuda_stream_view stream)#

Asynchronously append the buffer content to the sink from a GPU address.

For optimal performance, this function should only be called when is_device_write_preferred() returns true. Data sink implementations that don’t support direct device writes don’t need to override this function.

gpu_data must not be freed until this call is synchronized.

auto result = device_write_async(gpu_data, size, stream);
result.wait(); // OR result.get()

Throws:

cudf::logic_error – the object does not support direct device writes, i.e. supports_device_write returns false.

Parameters:
  • gpu_data – Pointer to the buffer to be written into the sink object

  • size – Number of bytes to write

  • stream – CUDA stream to use

Returns:

A future that can be used to synchronize the call
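
As a usage sketch (write_two_chunks, chunk_a, and chunk_b are placeholder names), two writes can be queued before either is waited on:

#include <cudf/io/data_sink.hpp>
#include <rmm/cuda_stream_view.hpp>

#include <cstddef>

// Hypothetical caller: queue two asynchronous device writes, then block on both.
void write_two_chunks(cudf::io::data_sink& sink,
                      void const* chunk_a, size_t size_a,
                      void const* chunk_b, size_t size_b,
                      rmm::cuda_stream_view stream)
{
  auto first  = sink.device_write_async(chunk_a, size_a, stream);
  auto second = sink.device_write_async(chunk_b, size_b, stream);

  // Neither chunk may be freed until its future has been synchronized.
  first.wait();
  second.wait();
}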

virtual void flush() = 0#

Flush the data written into the sink.

virtual size_t bytes_written() = 0#

Returns the total number of bytes written into this sink.

Returns:

Total number of bytes written into this sink
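
Only host_write(), flush(), and bytes_written() are pure virtual, so a custom sink needs nothing more than those three overrides. A minimal sketch, assuming a file-backed destination (file_data_sink and its std::ofstream member are illustrative names, not part of cudf):

#include <cudf/io/data_sink.hpp>

#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical minimal sink that appends everything to a local file.
class file_data_sink : public cudf::io::data_sink {
 public:
  explicit file_data_sink(std::string const& path)
    : _out{path, std::ios::out | std::ios::binary}
  {
  }

  // Required: append host-memory bytes to the sink.
  void host_write(void const* data, size_t size) override
  {
    _out.write(static_cast<char const*>(data), static_cast<std::streamsize>(size));
    _bytes += size;
  }

  // Required: ensure everything written so far reaches the file.
  void flush() override { _out.flush(); }

  // Required: report the total number of bytes written.
  size_t bytes_written() override { return _bytes; }

 private:
  std::ofstream _out;
  size_t _bytes{0};
};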

Public Static Functions

static std::unique_ptr<data_sink> create(std::string const &filepath)#

Create a sink from a file path.

Parameters:

filepath[in] Path to the file to use

Returns:

Constructed data_sink object
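
A minimal usage sketch (the file name output.bin is a placeholder):

#include <cudf/io/data_sink.hpp>

#include <string>

int main()
{
  // Stream some host bytes into a file-backed sink.
  auto sink = cudf::io::data_sink::create(std::string{"output.bin"});

  std::string const payload = "hello, sink";
  sink->host_write(payload.data(), payload.size());
  sink->flush();

  return sink->bytes_written() == payload.size() ? 0 : 1;
}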

static std::unique_ptr<data_sink> create(std::vector<char> *buffer)#

Create a sink from a std::vector.

Parameters:

buffer[inout] Pointer to the output vector

Returns:

Constructed data_sink object
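
A usage sketch; the vector must remain valid while the sink is in use, since the sink writes through the pointer:

#include <cudf/io/data_sink.hpp>

#include <string>
#include <vector>

int main()
{
  std::vector<char> buffer;  // must remain valid while the sink is in use
  auto sink = cudf::io::data_sink::create(&buffer);

  std::string const payload = "in-memory output";
  sink->host_write(payload.data(), payload.size());
  sink->flush();

  // buffer now holds the bytes that were written.
  return buffer.size() == payload.size() ? 0 : 1;
}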

static std::unique_ptr<data_sink> create()#

Create a void sink (one that does no actual I/O).

A useful code path for benchmarking, as it eliminates physical hardware variability from profiling results.

Returns:

Constructed data_sink object
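
A benchmarking-oriented usage sketch (the payload here is arbitrary filler):

#include <cudf/io/data_sink.hpp>

#include <vector>

int main()
{
  // Writes go through the normal sink interface but nothing is persisted,
  // so timings reflect the writer rather than the storage device.
  auto sink = cudf::io::data_sink::create();

  std::vector<char> const payload(1 << 20, 'x');
  sink->host_write(payload.data(), payload.size());
  sink->flush();

  return 0;
}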

static std::unique_ptr<data_sink> create(cudf::io::data_sink *const user_sink)#

Create a wrapped custom user data sink.

The data sink returned here is not the one passed by the user. It is an internal class that wraps the user pointer. The intent is to allow the user to declare a custom sink instance and reuse it across multiple writes.

Parameters:

user_sink[in] User-provided data sink (typically custom class)

Returns:

Constructed data_sink object
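
For illustration (counting_sink is a made-up user class), the wrapper is typically used as below; the caller keeps ownership of the original object and must keep it alive while the wrapper is in use:

#include <cudf/io/data_sink.hpp>

#include <cstddef>

// Hypothetical user-defined sink that only counts the bytes it receives.
class counting_sink : public cudf::io::data_sink {
 public:
  void host_write(void const*, size_t size) override { _bytes += size; }
  void flush() override {}
  size_t bytes_written() override { return _bytes; }

 private:
  size_t _bytes{0};
};

int main()
{
  counting_sink user_sink;  // ownership stays with the caller
  auto wrapped = cudf::io::data_sink::create(&user_sink);

  char const data[] = "counted but not stored";
  wrapped->host_write(data, sizeof(data));

  return user_sink.bytes_written() == sizeof(data) ? 0 : 1;
}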

template<typename T>
static inline std::vector<std::unique_ptr<data_sink>> create(std::vector<T> const &args)#

Creates a vector of data sinks, one per element in the input vector.

Parameters:

args[in] Vector of parameters; one sink is created per element

Returns:

Constructed vector of data sinks
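
For example (the file names are placeholders), a vector of file paths yields one file sink per path; each element is presumably forwarded to the matching single-argument create() overload:

#include <cudf/io/data_sink.hpp>

#include <string>
#include <vector>

int main()
{
  std::vector<std::string> const paths{"part-0.bin", "part-1.bin", "part-2.bin"};

  // One data_sink is constructed per element of the input vector.
  auto sinks = cudf::io::data_sink::create(paths);

  for (auto& sink : sinks) {
    char const header[] = "chunk";
    sink->host_write(header, sizeof(header));
    sink->flush();
  }
  return 0;
}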