rapidsmpf::streaming::TableChunk Class Reference

A unit of table data in a streaming pipeline.

#include <table_chunk.hpp>

Public Types

enum class  ExclusiveView : bool { NO , YES }
 Indicates whether the TableChunk holds an exclusive or shared view of the underlying table data.
 

Public Member Functions

 TableChunk (std::unique_ptr< cudf::table > table, rmm::cuda_stream_view stream)
 Construct a TableChunk from a device table.
 
 TableChunk (cudf::table_view table_view, std::size_t device_alloc_size, rmm::cuda_stream_view stream, OwningWrapper &&owner, ExclusiveView exclusive_view)
 Construct a TableChunk from a device table view.
 
 TableChunk (std::unique_ptr< cudf::packed_columns > packed_columns, rmm::cuda_stream_view stream)
 Construct a TableChunk from packed columns.
 
 TableChunk (std::unique_ptr< PackedData > packed_data)
 Construct a TableChunk from a packed data blob.
 
 TableChunk (TableChunk &&)=default
 TableChunk is moveable.
 
TableChunk & operator= (TableChunk &&)=default
 Move assignment.
 
 TableChunk (TableChunk const &)=delete
 
TableChunk & operator= (TableChunk const &)=delete
 
rmm::cuda_stream_view stream () const noexcept
 Returns the CUDA stream on which this table chunk was created.
 
std::size_t data_alloc_size (MemoryType mem_type) const
 Number of bytes allocated for the data in the specified memory type.
 
bool is_available () const noexcept
 Indicates whether the underlying cudf table data is fully available in device memory.
 
std::size_t make_available_cost () const noexcept
 Returns the estimated cost (in bytes) of making the table available.
 
TableChunk make_available (MemoryReservation &reservation)
 Moves this table chunk into a new one with its cudf table made available.
 
cudf::table_view table_view () const
 Returns a view of the underlying table.
 
bool is_spillable () const
 Indicates whether this table chunk can be spilled out of device memory.
 
TableChunk copy (MemoryReservation &reservation) const
 Create a deep copy of the table chunk.
 

Detailed Description

A unit of table data in a streaming pipeline.

Represents either an unpacked cudf::table, a cudf::packed_columns, or a PackedData.

TableChunks may be initially unavailable (e.g., if the data is packed or spilled), and can be made available (i.e., materialized to device memory) on demand.

Definition at line 36 of file table_chunk.hpp.

Member Enumeration Documentation

◆ ExclusiveView

Indicates whether the TableChunk holds an exclusive or shared view of the underlying table data.

This boolean enum is used to explicitly express ownership semantics when constructing a TableChunk from a cudf::table_view.

  • ExclusiveView::YES: The TableChunk has exclusive ownership of the table's device memory and is considered spillable.
  • ExclusiveView::NO: The TableChunk is a non-owning view of data managed elsewhere. The memory may be shared or externally owned, and the chunk is therefore not spillable.

Definition at line 52 of file table_chunk.hpp.
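
As a small illustration of why a boolean enum is used here instead of a plain bool, the sketch below re-declares the enum exactly as listed above and adds a hypothetical helper, implies_spillable, which is not part of rapidsmpf. It only shows the mapping described in the two bullets and the fact that a scoped enum must be named explicitly at the call site.

```cpp
#include <cassert>

// Mirrors the declaration shown in this reference.
enum class ExclusiveView : bool { NO, YES };

// Hypothetical helper (an assumption for illustration, not a rapidsmpf
// function): encodes the rule that only an exclusive view is spillable.
constexpr bool implies_spillable(ExclusiveView v) noexcept {
    return v == ExclusiveView::YES;
}

// A scoped enum never converts implicitly to or from bool, so callers must
// write ExclusiveView::YES or ExclusiveView::NO rather than true or false.
static_assert(implies_spillable(ExclusiveView::YES), "YES means spillable");
static_assert(!implies_spillable(ExclusiveView::NO), "NO means not spillable");
static_assert(static_cast<bool>(ExclusiveView::YES), "explicit cast still works");
```

Compared with a bare bool argument, the named enumerators make a five-argument constructor call easier to audit.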

Constructor & Destructor Documentation

◆ TableChunk() [1/4]

rapidsmpf::streaming::TableChunk::TableChunk ( std::unique_ptr< cudf::table >  table,
rmm::cuda_stream_view  stream 
)

Construct a TableChunk from a device table.

Parameters
table  Device-resident table.
stream  The CUDA stream on which the table was created.

◆ TableChunk() [2/4]

rapidsmpf::streaming::TableChunk::TableChunk ( cudf::table_view  table_view,
std::size_t  device_alloc_size,
rmm::cuda_stream_view  stream,
OwningWrapper &&  owner,
ExclusiveView  exclusive_view 
)

Construct a TableChunk from a device table view.

The TableChunk does not take ownership of the underlying data; instead, the provided owner object is kept alive for the lifetime of the TableChunk. The caller is responsible for ensuring that the underlying device memory referenced by table_view remains valid during this period.

This constructor is typically used when creating a TableChunk from Python, where owner is used to keep the corresponding Python object alive until the TableChunk is destroyed.

Parameters
table_view  Device-resident table view.
device_alloc_size  Number of bytes allocated in device memory.
stream  CUDA stream on which the table was created.
owner  Object owning the memory backing table_view. This object will be destroyed last when the TableChunk is destroyed or spilled.
exclusive_view  Specifies whether this TableChunk has exclusive ownership semantics over the underlying table data:
  • When ExclusiveView::YES, the following guarantees must hold:
    • The table_view is the sole representation of the table.
    • The owner exclusively owns the table memory.
    These guarantees allow the TableChunk to be spillable and ensure that destroying owner will correctly free the associated device memory.
  • When ExclusiveView::NO, the chunk is considered a non-owning view and is therefore not spillable.
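
The owner-keeps-the-view-valid idea can be sketched without any GPU code. The toy below is an assumption-laden model, not the rapidsmpf implementation: ToyOwner, ToyBuffer and ToyViewChunk are hypothetical names, and a type-erased std::unique_ptr stands in for OwningWrapper, whose real interface is not shown in this reference.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for an owning handle: a type-erased pointer whose
// deleter runs when the holder is destroyed. Not rapidsmpf's OwningWrapper.
using ToyOwner = std::unique_ptr<void, void (*)(void*)>;

// Toy resource that records when it has been released.
struct ToyBuffer {
    explicit ToyBuffer(bool* flag) : freed(flag) {}
    ~ToyBuffer() { *freed = true; }
    bool* freed;
};

// Toy chunk: stores a non-owning pointer (the "view") next to the owner that
// keeps the pointed-to memory valid.
class ToyViewChunk {
  public:
    ToyViewChunk(ToyBuffer const* v, ToyOwner&& owner)
        : owner_(std::move(owner)), view_(v) {}
    ToyBuffer const* view() const noexcept { return view_; }

  private:
    ToyOwner owner_;         // declared first, so it is destroyed last
    ToyBuffer const* view_;  // valid for as long as owner_ is alive
};

// Returns true if the buffer stayed alive while the chunk existed and was
// released once the chunk went out of scope.
inline bool owner_outlives_view() {
    bool freed = false;
    ToyBuffer* buf = new ToyBuffer(&freed);
    ToyOwner owner(buf, [](void* p) { delete static_cast<ToyBuffer*>(p); });
    bool alive_inside = false;
    {
        ToyViewChunk chunk(buf, std::move(owner));
        alive_inside = !freed && chunk.view() == buf;
    }  // chunk destroyed here; the owner's deleter releases the buffer
    return alive_inside && freed;
}
```

Declaring the owner as the first member mirrors the "destroyed last" wording above, since C++ destroys members in reverse declaration order.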

◆ TableChunk() [3/4]

rapidsmpf::streaming::TableChunk::TableChunk ( std::unique_ptr< cudf::packed_columns >  packed_columns,
rmm::cuda_stream_view  stream 
)

Construct a TableChunk from packed columns.

Parameters
packed_columns  Serialized device table.
stream  The CUDA stream on which the packed_columns was created.

◆ TableChunk() [4/4]

rapidsmpf::streaming::TableChunk::TableChunk ( std::unique_ptr< PackedData >  packed_data )

Construct a TableChunk from a packed data blob.

The packed data's CUDA stream will be associated with the new table chunk.

Parameters
packed_data  Serialized host/device data with metadata.

Member Function Documentation

◆ copy()

TableChunk rapidsmpf::streaming::TableChunk::copy ( MemoryReservation &  reservation ) const

Create a deep copy of the table chunk.

Allocates new memory for all buffers in the table using the specified reservation, which determines the target memory type (e.g., host or device). As a consequence, the is_available() status may differ in the new copy. For example, copying an available table chunk from device to host memory will result in an unavailable copy.

Parameters
reservation  Memory reservation used to track and limit allocations.
Returns
A new TableChunk instance containing copies of all buffers and metadata.
Exceptions
std::overflow_error  If the total allocation size exceeds the available reservation.
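
The interaction between a deep copy and a byte budget can be sketched as follows. This is a minimal model under stated assumptions: ToyReservation and toy_copy are hypothetical names, a std::vector stands in for the table buffers, and the real MemoryReservation interface is not shown in this reference. The only behavior taken from the text above is that exceeding the reservation raises std::overflow_error.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical byte budget (an assumption for illustration; this is not
// rapidsmpf's MemoryReservation).
class ToyReservation {
  public:
    explicit ToyReservation(std::size_t bytes) : remaining_(bytes) {}

    // Charges `bytes` against the budget, or throws if the budget is too small.
    void consume(std::size_t bytes) {
        if (bytes > remaining_) {
            throw std::overflow_error("allocation exceeds reservation");
        }
        remaining_ -= bytes;
    }

    std::size_t remaining() const noexcept { return remaining_; }

  private:
    std::size_t remaining_;
};

// Deep-copies `src`, charging the copy against the reservation first so that
// nothing is allocated when the budget is insufficient.
inline std::vector<char> toy_copy(std::vector<char> const& src, ToyReservation& res) {
    res.consume(src.size());
    return std::vector<char>(src);
}

// Returns true if copying 64 bytes against a 32-byte budget throws.
inline bool copy_throws_when_over_budget() {
    std::vector<char> data(64, 'x');
    ToyReservation small(32);
    try {
        toy_copy(data, small);
    } catch (std::overflow_error const&) {
        return true;
    }
    return false;
}
```

A caller that cannot tolerate the exception would size the reservation from data_alloc_size() before copying.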

◆ data_alloc_size()

std::size_t rapidsmpf::streaming::TableChunk::data_alloc_size ( MemoryType  mem_type) const

Number of bytes allocated for the data in the specified memory type.

Parameters
mem_type  The memory type to query.
Returns
Number of bytes allocated.

◆ is_available()

bool rapidsmpf::streaming::TableChunk::is_available ( ) const
noexcept

Indicates whether the underlying cudf table data is fully available in device memory.

Returns
true if the table is already available; otherwise, false.

◆ is_spillable()

bool rapidsmpf::streaming::TableChunk::is_spillable ( ) const

Indicates whether this table chunk can be spilled out of device memory.

A table chunk is considered spillable if it owns its underlying memory. This is true when it was created from one of the following:

  • A device-owning source such as a cudf::table, cudf::packed_columns, or PackedData.
  • A cudf::table_view constructed with exclusive_view == ExclusiveView::YES, indicating that the view is the sole representation of the underlying data and that its owner exclusively manages the table's memory.

In contrast, chunks constructed from non-exclusive cudf::table_view instances are non-owning views of externally managed memory and therefore not spillable.

To spill a table chunk from device to host memory, first call copy() to create a host-side copy, then delete or overwrite the original device chunk. If is_spillable() == true, destroying the original device chunk will release the associated device memory.

Returns
true if the table chunk owns its memory and can be spilled; otherwise false.
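
The two-step spill described above (copy to host, then drop the device original) can be modeled with a toy type. Everything here is hypothetical and exists only for illustration: ToyMem, ToyChunk and toy_spill are not rapidsmpf names, the real copy() takes a MemoryReservation rather than a memory tag, and a byte vector stands in for the table data.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical memory tags for this toy model only.
enum class ToyMem { Device, Host };

// Toy stand-in for a move-only chunk; `owns` models is_spillable().
class ToyChunk {
  public:
    ToyChunk(std::vector<char> bytes, ToyMem where, bool owns)
        : bytes_(std::move(bytes)), where_(where), owns_(owns) {}
    ToyChunk(ToyChunk&&) = default;
    ToyChunk& operator=(ToyChunk&&) = default;
    ToyChunk(ToyChunk const&) = delete;
    ToyChunk& operator=(ToyChunk const&) = delete;

    bool is_spillable() const noexcept { return owns_; }
    ToyMem where() const noexcept { return where_; }
    std::size_t size() const noexcept { return bytes_.size(); }

    // Deep copy into the target memory; the copy always owns its bytes.
    ToyChunk copy(ToyMem target) const { return ToyChunk(bytes_, target, true); }

  private:
    std::vector<char> bytes_;
    ToyMem where_;
    bool owns_;
};

// Spills in two steps: make a host copy, then overwrite the original so the
// bytes it held in "device" memory are released. Returns false, leaving the
// chunk untouched, when the chunk is not spillable or is not on the device.
inline bool toy_spill(ToyChunk& chunk) {
    if (!chunk.is_spillable() || chunk.where() != ToyMem::Device) {
        return false;
    }
    chunk = chunk.copy(ToyMem::Host);
    return true;
}
```

For a non-owning chunk the function refuses to spill, which matches the rule that destroying such a chunk would not free the underlying memory anyway.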

◆ make_available()

TableChunk rapidsmpf::streaming::TableChunk::make_available ( MemoryReservation &  reservation )

Moves this table chunk into a new one with its cudf table made available.

As part of the move, a copy or unpack may be performed; any such operation uses the associated CUDA stream.

Parameters
reservation  Memory reservation for allocations if needed.
Returns
A new TableChunk with data available on device.
Note
After this call, the current object is in a moved-from state; the only valid operations on it are reassignment, moving, and destruction.
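
Because make_available() consumes the object it is called on, a common way to respect the moved-from rule is to reassign the result straight back to the same variable. The sketch below models that pattern with a hypothetical ToyPacked type and ensure_available helper; neither is part of rapidsmpf, the toy takes no reservation argument, and the zero cost reported for an already-available chunk is an assumption of this model.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy move-only chunk that starts out "packed" (unavailable).
class ToyPacked {
  public:
    explicit ToyPacked(std::vector<int> packed)
        : data_(std::move(packed)), available_(false) {}
    ToyPacked(ToyPacked&&) = default;
    ToyPacked& operator=(ToyPacked&&) = default;
    ToyPacked(ToyPacked const&) = delete;
    ToyPacked& operator=(ToyPacked const&) = delete;

    bool is_available() const noexcept { return available_; }
    std::size_t size() const noexcept { return data_.size(); }

    // Toy cost model (assumption): bytes needed if unavailable, else zero.
    std::size_t make_available_cost() const noexcept {
        return available_ ? 0 : data_.size() * sizeof(int);
    }

    // Moves the contents into a new, available chunk. Afterwards *this is
    // moved-from and should only be reassigned, moved, or destroyed.
    ToyPacked make_available() {
        ToyPacked out(std::move(data_));
        out.available_ = true;
        return out;
    }

  private:
    std::vector<int> data_;
    bool available_;
};

// Takes the chunk by value and reassigns it in place, so no moved-from
// object is ever read.
inline ToyPacked ensure_available(ToyPacked chunk) {
    if (!chunk.is_available()) {
        chunk = chunk.make_available();
    }
    return chunk;
}
```

Checking make_available_cost() before the call is where a real caller would size its reservation.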

◆ make_available_cost()

std::size_t rapidsmpf::streaming::TableChunk::make_available_cost ( ) const
noexcept

Returns the estimated cost (in bytes) of making the table available.

Currently, only device memory cost is tracked.

Returns
The cost in bytes.

◆ operator=()

TableChunk& rapidsmpf::streaming::TableChunk::operator= ( TableChunk &&  )
default

Move assignment.

Returns
Reference to this object after the move.

◆ stream()

rmm::cuda_stream_view rapidsmpf::streaming::TableChunk::stream ( ) const
noexcept

Returns the CUDA stream on which this table chunk was created.

Returns
The CUDA stream view.

◆ table_view()

cudf::table_view rapidsmpf::streaming::TableChunk::table_view ( ) const

Returns a view of the underlying table.

The table must be available in device memory.

Returns
cudf::table_view representing the table.
Exceptions
std::invalid_argument  If is_available() == false.

The documentation for this class was generated from the following file: