Chunked Parquet writer class that handles options and writes tables in chunks.
#include <parquet.hpp>
Public Member Functions
parquet_chunked_writer ()
    Default constructor. This should never be used; it exists only to satisfy Cython and to avoid leaking the detail API.
parquet_chunked_writer (chunked_parquet_writer_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream())
    Constructor with chunked writer options.
~parquet_chunked_writer ()
    Default destructor, declared explicitly so that the detail API is not leaked.
parquet_chunked_writer & write (table_view const &table, std::vector< partition_info > const &partitions={})
    Writes a table to the output.
std::unique_ptr< std::vector< uint8_t > > close (std::vector< std::string > const &column_chunks_file_paths={})
    Finishes the chunked/streamed write process.
Public Attributes
std::unique_ptr< parquet::detail::writer > writer
    Unique pointer to the implementation (detail) writer class.
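The explicitly declared destructor and the `writer` member described above look like the pointer-to-implementation (pimpl) idiom. The sketch below illustrates that idiom in plain standard C++, assuming this is the intent; `chunked_writer` and `detail::writer` are hypothetical stand-ins, not cudf code. It shows why the destructor is declared in the header but defined only where the implementation type is complete: `std::unique_ptr` needs the full type at the point where it deletes the object.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// --- What a public header would contain (hypothetical sketch) ---
namespace detail {
class writer;  // forward declaration only: implementation stays hidden
}

class chunked_writer {
 public:
  chunked_writer();  // leaves `writer` null; not meant to be used
  explicit chunked_writer(std::string sink_name);
  // Declared here, defined below once detail::writer is complete.
  // An implicit inline destructor would need the full detail::writer
  // type in every translation unit that destroys a chunked_writer.
  ~chunked_writer();

  chunked_writer& write(std::size_t num_rows);
  std::size_t rows_written() const;

  std::unique_ptr<detail::writer> writer;  // pointer to the hidden impl
};

// --- What the implementation file would contain (hypothetical sketch) ---
namespace detail {
class writer {
 public:
  explicit writer(std::string sink) : sink_(std::move(sink)) {}
  void write(std::size_t n) { rows_ += n; }
  std::size_t rows() const { return rows_; }

 private:
  std::string sink_;
  std::size_t rows_ = 0;
};
}  // namespace detail

chunked_writer::chunked_writer() = default;
chunked_writer::chunked_writer(std::string sink_name)
  : writer(std::make_unique<detail::writer>(std::move(sink_name))) {}
chunked_writer::~chunked_writer() = default;  // detail::writer is complete here

chunked_writer& chunked_writer::write(std::size_t num_rows) {
  assert(writer && "a default-constructed writer must not be used");
  writer->write(num_rows);
  return *this;  // allows chaining, like the documented write()
}

std::size_t chunked_writer::rows_written() const {
  return writer ? writer->rows() : 0;
}
```

Only the forward declaration of `detail::writer` is visible to users of the header, so the implementation can change without touching the public interface.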
The parquet_chunked_writer is intended for writing an arbitrarily large number of rows to a Parquet file in multiple passes.
For example, it can be used to write a single Parquet file containing one logical table by writing a series of individual cudf::tables, one write() call per table, and then calling close().
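The multi-pass pattern described above can be sketched with a toy in-memory writer. This is not cudf: `toy_table` and `toy_chunked_writer` are hypothetical, each `write()` only records one row group, and nothing is encoded, copied to the GPU, or written to disk. The sketch mirrors the documented shape of the API: `write()` returns the writer by reference, and `close()` returns a blob only when column chunk file paths are passed, otherwise null.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a cudf::table: a single column of int32 values.
using toy_table = std::vector<std::int32_t>;

class toy_chunked_writer {
 public:
  // Each write() appends the table as one more row group of the same
  // logical file; nothing is finalized until close().
  toy_chunked_writer& write(toy_table const& table) {
    if (closed_) throw std::logic_error("write() after close()");  // toy rule
    row_group_sizes_.push_back(table.size());
    rows_.insert(rows_.end(), table.begin(), table.end());
    return *this;  // returning a reference allows w.write(a).write(b)
  }

  // close() finalizes the file. As documented for the real writer, a
  // metadata blob is returned only when column chunk paths are provided.
  std::unique_ptr<std::vector<std::uint8_t>> close(
    std::vector<std::string> const& column_chunks_file_paths = {}) {
    closed_ = true;
    if (column_chunks_file_paths.empty()) return nullptr;
    // Toy metadata encoding (an assumption): a readable summary string.
    std::string meta = "row_groups=" + std::to_string(row_group_sizes_.size()) +
                       ";rows=" + std::to_string(rows_.size());
    return std::make_unique<std::vector<std::uint8_t>>(meta.begin(), meta.end());
  }

  std::size_t num_row_groups() const { return row_group_sizes_.size(); }
  std::size_t total_rows() const { return rows_.size(); }

 private:
  std::vector<std::size_t> row_group_sizes_;
  toy_table rows_;
  bool closed_ = false;
};
```

Keeping state open across calls is what lets one logical table be assembled from many smaller pieces, with the footer written exactly once at the end.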
Definition at line 1417 of file parquet.hpp.
cudf::io::parquet_chunked_writer::parquet_chunked_writer(chunked_parquet_writer_options const& options, rmm::cuda_stream_view stream = cudf::get_default_stream())
Constructor with chunked writer options.
Parameters
[in] options: Options used to write the table
[in] stream: CUDA stream used for device memory operations and kernel launches
std::unique_ptr<std::vector<uint8_t>> cudf::io::parquet_chunked_writer::close(std::vector<std::string> const& column_chunks_file_paths = {})
Finishes the chunked/streamed write process.
Parameters
[in] column_chunks_file_paths: Column chunk file paths to be set in the raw output metadata
Returns
A Parquet-compatible blob containing the raw output metadata for all row groups, if column_chunks_file_paths is provided; otherwise null.

parquet_chunked_writer& cudf::io::parquet_chunked_writer::write(table_view const& table, std::vector<partition_info> const& partitions = {})
Writes a table to the output.
Parameters
[in] table: Table to be written
[in] partitions: Optional partitions to divide the table into. If specified, the number of partitions must equal the number of sinks.
Exceptions
cudf::logic_error: If the number of partitions is not the same as the number of sinks
rmm::bad_alloc: If there is insufficient space for temporary buffers
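The partition rule for write() can be sketched in plain C++, assuming each partition is a contiguous row range given by a start row and a row count. `toy_partition_info`, `even_partitions`, and `check_partitions` are hypothetical helpers, and `std::logic_error` stands in for `cudf::logic_error`. The sketch builds one near-equal range per sink and rejects a non-empty partition list whose size differs from the number of sinks.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical partition descriptor: a contiguous range of table rows.
struct toy_partition_info {
  std::size_t start_row;
  std::size_t num_rows;
};

// Splits total_rows into num_sinks contiguous ranges of near-equal size;
// the first (total_rows % num_sinks) ranges get one extra row.
std::vector<toy_partition_info> even_partitions(std::size_t total_rows,
                                                std::size_t num_sinks) {
  if (num_sinks == 0) throw std::invalid_argument("need at least one sink");
  std::vector<toy_partition_info> parts;
  std::size_t const base  = total_rows / num_sinks;
  std::size_t const extra = total_rows % num_sinks;
  std::size_t start       = 0;
  for (std::size_t i = 0; i < num_sinks; ++i) {
    std::size_t const n = base + (i < extra ? 1 : 0);
    parts.push_back({start, n});
    start += n;
  }
  return parts;
}

// Mirrors the documented precondition: partitions are optional, but if
// any are given there must be exactly one per sink.
void check_partitions(std::vector<toy_partition_info> const& partitions,
                      std::size_t num_sinks) {
  if (!partitions.empty() && partitions.size() != num_sinks) {
    throw std::logic_error("number of partitions must equal number of sinks");
  }
}
```

For example, `even_partitions(10, 3)` yields the ranges {0, 4}, {4, 3}, and {7, 3}. How the real writer handles an empty partition list with several sinks is not described on this page, so the sketch simply accepts it.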