cudf::io::chunked_parquet_writer Class Reference

chunked parquet writer class to handle options and write tables in chunks.

#include <parquet.hpp>

Public Member Functions

 chunked_parquet_writer ()
 Default constructor. It should never be used; it exists only to satisfy Cython and to avoid leaking the detail API.
 
 chunked_parquet_writer (chunked_parquet_writer_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream())
 Constructor with chunked writer options.
 
 ~chunked_parquet_writer ()
 Default destructor. It is declared so that the detail API is not leaked.
 
chunked_parquet_writer & write (table_view const &table, std::vector< partition_info > const &partitions={})
 Writes table to output.
 
std::unique_ptr< std::vector< uint8_t > > close (std::vector< std::string > const &column_chunks_file_path={})
 Finishes the chunked/streamed write process.
 

Public Attributes

std::unique_ptr< parquet::detail::writer > writer
 Unique pointer to impl writer class.
 

Detailed Description

chunked parquet writer class to handle options and write tables in chunks.

The intent of the chunked_parquet_writer is to allow writing an arbitrarily large number of rows to a parquet file in multiple passes.

The following code snippet demonstrates how to write a single parquet file containing one logical table by writing a series of individual cudf::table objects.

auto destination = cudf::io::sink_info("dataset.parquet");
auto options = cudf::io::chunked_parquet_writer_options::builder(destination).build();
cudf::io::chunked_parquet_writer writer(options);
writer.write(table0);
writer.write(table1);
writer.close();

Definition at line 1868 of file parquet.hpp.

Constructor & Destructor Documentation

◆ chunked_parquet_writer()

cudf::io::chunked_parquet_writer::chunked_parquet_writer ( chunked_parquet_writer_options const &  options,
rmm::cuda_stream_view  stream = cudf::get_default_stream() 
)

Constructor with chunked writer options.

Parameters
[in] options  Options used to write the table
[in] stream  CUDA stream used for device memory operations and kernel launches

Member Function Documentation

◆ close()

std::unique_ptr<std::vector<uint8_t> > cudf::io::chunked_parquet_writer::close ( std::vector< std::string > const &  column_chunks_file_path = {})

Finishes the chunked/streamed write process.

Parameters
[in] column_chunks_file_path  Column chunks file path to be set in the raw output metadata
Returns
A parquet-compatible blob that contains the file header and footer metadata. If column_chunks_file_path is non-empty, the output metadata blob will also have row group file paths set.

◆ write()

chunked_parquet_writer& cudf::io::chunked_parquet_writer::write ( table_view const &  table,
std::vector< partition_info > const &  partitions = {} 
)

Writes table to output.

Note
If an exception is thrown during encoding or compression, the data from the failing call is not written to the sink. Data from previous successful calls is unaffected.
Parameters
[in] table  Table that needs to be written
[in] partitions  Optional partitions to divide the table into. If specified, there must be exactly one partition per sink.
Exceptions
cudf::logic_error  If the number of partitions is not the same as the number of sinks
rmm::bad_alloc  If there is insufficient space for temporary buffers
Returns
A reference to this writer object
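
The requirement that the number of partitions match the number of sinks can be met by computing the row ranges before calling write(). The following is a minimal stand-alone sketch of that arithmetic, not part of libcudf: row_partition and make_partitions are hypothetical names, and row_partition only mirrors the start row and row count that cudf::io::partition_info is assumed to carry.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for cudf::io::partition_info (assumption: a partition
// is described by its first row and its number of rows).
struct row_partition {
  int start_row;
  int num_rows;
};

// Splits `total_rows` into exactly `num_sinks` contiguous partitions.
// The first `total_rows % num_sinks` partitions receive one extra row,
// so every row is covered exactly once.
std::vector<row_partition> make_partitions(int total_rows, int num_sinks)
{
  if (total_rows < 0 || num_sinks <= 0) {
    throw std::invalid_argument("total_rows must be >= 0 and num_sinks must be > 0");
  }

  std::vector<row_partition> parts;
  parts.reserve(static_cast<std::size_t>(num_sinks));

  int const base  = total_rows / num_sinks;  // rows every partition gets
  int const extra = total_rows % num_sinks;  // leftover rows to distribute

  int start = 0;
  for (int i = 0; i < num_sinks; ++i) {
    int const count = base + (i < extra ? 1 : 0);
    parts.push_back({start, count});
    start += count;
  }
  return parts;
}
```

Each resulting entry could then be converted into a cudf::io::partition_info and the full vector passed as the partitions argument, assuming the writer was constructed with the same number of sinks.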

The documentation for this class was generated from the following file: