IO interfaces. More...

Namespaces
	parquet
	Parquet I/O interfaces.

Classes
class	avro_reader_options
	Settings to use for `read_avro()`. More...

class	avro_reader_options_builder
	Builder to build options for `read_avro()`. More...

class	csv_reader_options
	Settings to use for `read_csv()`. More...

class	csv_reader_options_builder
	Builder to build options for `read_csv()`. More...

class	csv_writer_options
	Settings to use for `write_csv()`. More...

class	csv_writer_options_builder
	Builder to build options for `write_csv()`. More...

class	data_sink
	Interface class for storing the output data from the writers. More...

class	datasource
	Interface class for providing input data to the readers. More...

struct	schema_element
	Allows specifying the target types for nested JSON data via json_reader_options' `set_dtypes` method. More...

class	json_reader_options
	Input arguments to the `read_json` interface. More...

class	json_reader_options_builder
	Builds settings to use for `read_json()`. More...

class	json_writer_options
	Settings to use for `write_json()`. More...

class	json_writer_options_builder
	Builder to build options for `write_json()`. More...

class	orc_reader_options
	Settings to use for `read_orc()`. More...

class	orc_reader_options_builder
	Builds settings to use for `read_orc()`. More...

class	chunked_orc_reader
	Chunked ORC reader class that reads an ORC file iteratively into a series of tables, chunk by chunk. More...

class	orc_writer_options
	Settings to use for `write_orc()`. More...

class	orc_writer_options_builder
	Builds settings to use for `write_orc()`. More...

class	chunked_orc_writer_options
	Settings to use for `write_orc_chunked()`. More...

class	chunked_orc_writer_options_builder
	Builds settings to use for `write_orc_chunked()`. More...

class	orc_chunked_writer
	Chunked ORC writer class that writes an ORC file in chunked (streaming) form. More...

struct	raw_orc_statistics
	Holds column names and buffers containing raw file-level and stripe-level statistics. More...

struct	minmax_statistics
	Base class for column statistics that include optional minimum and maximum. More...

struct	sum_statistics
	Base class for column statistics that include an optional sum. More...

struct	integer_statistics
	Statistics for integral columns. More...

struct	double_statistics
	Statistics for floating point columns. More...

struct	string_statistics
	Statistics for string columns. More...

struct	bucket_statistics
	Statistics for boolean columns. More...

struct	decimal_statistics
	Statistics for decimal columns. More...

struct	timestamp_statistics
	Statistics for timestamp columns. More...

struct	column_statistics
	Contains per-column ORC statistics. More...

struct	parsed_orc_statistics
	Holds column names and parsed file-level and stripe-level statistics. More...

struct	orc_column_schema
	Schema of an ORC column, including the nested columns. More...

struct	orc_schema
	Schema of an ORC file. More...

class	orc_metadata
	Information about the content of an ORC file. More...

class	parquet_reader_options
	Settings for `read_parquet()`. More...

class	parquet_reader_options_builder
	Builds parquet_reader_options to use for `read_parquet()`. More...

class	chunked_parquet_reader
	Chunked Parquet reader class that reads a Parquet file iteratively into a series of tables, chunk by chunk. More...

struct	sorting_column
	Struct used to describe column sorting metadata. More...

class	parquet_writer_options_base
	Base settings for `write_parquet()` and `parquet_chunked_writer`. More...

class	parquet_writer_options_builder_base
	Base class for Parquet options builders. More...

class	parquet_writer_options
	Settings for `write_parquet()`. More...

class	parquet_writer_options_builder
	Class to build `parquet_writer_options`. More...

class	chunked_parquet_writer_options
	Settings for `parquet_chunked_writer`. More...

class	chunked_parquet_writer_options_builder
	Class to build `chunked_parquet_writer_options`. More...

class	parquet_chunked_writer
	Chunked Parquet writer class that handles options and writes tables in chunks. More...

struct	parquet_column_schema
	Schema of a parquet column, including the nested columns. More...

struct	parquet_schema
	Schema of a parquet file. More...

class	parquet_metadata
	Information about the content of a Parquet file. More...

class	writer_compression_statistics
	Statistics about compression performed by a writer. More...

struct	column_name_info
	Detailed name (and optionally nullability) information for output columns. More...

struct	table_metadata
	Table metadata returned by IO readers. More...

struct	table_with_metadata
	Table with table metadata, used by IO readers to return the metadata by value. More...

struct	host_buffer
	Non-owning view of a host memory buffer. More...

struct	source_info
	Source information for read interfaces. More...

struct	sink_info
	Destination information for write interfaces. More...

class	column_in_metadata
	Metadata for a column. More...

class	table_input_metadata
	Metadata for a table. More...

struct	partition_info
	Information used while writing partitioned datasets. More...

class	reader_column_schema
	Schema element for reader. More...
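
Most settings classes in the list above are paired with a `*_builder` class. The following is a minimal sketch of that options/builder relationship using only the standard library. Every name in it (`demo_reader_options`, `demo_reader_options_builder`, `skip_rows`, `has_header`) is hypothetical and chosen for illustration; it is not the library's API, and the real classes may expose different setters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a reader settings class (not a library type).
class demo_reader_options {
 public:
  std::string const& source() const { return source_; }
  std::size_t skip_rows() const { return skip_rows_; }
  bool has_header() const { return has_header_; }

 private:
  // Only the builder may construct and modify the settings.
  friend class demo_reader_options_builder;
  explicit demo_reader_options(std::string source) : source_(std::move(source)) {}

  std::string source_;
  std::size_t skip_rows_ = 0;
  bool has_header_       = true;
};

// Hypothetical builder: each setter returns *this so calls can be chained.
class demo_reader_options_builder {
 public:
  explicit demo_reader_options_builder(std::string source) : options_(std::move(source)) {}

  demo_reader_options_builder& skip_rows(std::size_t n)
  {
    options_.skip_rows_ = n;
    return *this;
  }

  demo_reader_options_builder& has_header(bool value)
  {
    options_.has_header_ = value;
    return *this;
  }

  // Returns a copy of the accumulated settings.
  demo_reader_options build() const { return options_; }

 private:
  demo_reader_options options_;
};
```

A caller would chain the setters and finish with `build()`, for example `demo_reader_options_builder("data.csv").skip_rows(2).has_header(false).build()`, then pass the resulting object to the read function.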

Typedefs
using	no_statistics = std::monostate
	Monostate type alias for the statistics variant.

using	date_statistics = minmax_statistics< int32_t >
	Statistics for date(time) columns.

using	binary_statistics = sum_statistics< int64_t >
	Statistics for binary columns. More...

using	statistics_type = std::variant< no_statistics, integer_statistics, double_statistics, string_statistics, bucket_statistics, decimal_statistics, date_statistics, binary_statistics, timestamp_statistics >
	Variant type for ORC type-specific column statistics. More...
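
Because `statistics_type` is a `std::variant` whose first alternative is `std::monostate`, callers typically dispatch on the alternative that is actually held. The sketch below shows that dispatch with `std::visit` on a reduced variant. The `demo_*` structs are stand-ins that only borrow names from the listing; their members are assumptions and may not match the real structs.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <type_traits>
#include <variant>

// Stand-in types; the member layout here is assumed for illustration.
struct demo_integer_statistics {
  std::optional<std::int64_t> minimum, maximum, sum;
};
struct demo_string_statistics {
  std::optional<std::string> minimum, maximum;
};

// Reduced variant: std::monostate plays the role of "no statistics".
using demo_statistics_type =
  std::variant<std::monostate, demo_integer_statistics, demo_string_statistics>;

// Produces a short description based on the alternative currently held.
inline std::string describe(demo_statistics_type const& stats)
{
  return std::visit(
    [](auto const& s) -> std::string {
      using T = std::decay_t<decltype(s)>;
      if constexpr (std::is_same_v<T, std::monostate>) {
        return std::string("no statistics");
      } else if constexpr (std::is_same_v<T, demo_integer_statistics>) {
        return s.minimum ? "integer, min=" + std::to_string(*s.minimum)
                         : std::string("integer, min unknown");
      } else {
        return s.minimum ? "string, min=" + *s.minimum : std::string("string, min unknown");
      }
    },
    stats);
}
```

The `if constexpr` chain is resolved at compile time for each alternative, so adding a new alternative to the variant without handling it falls into the final branch; a stricter version could end with a `static_assert` instead.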

Enumerations
enum class	json_recovery_mode_t { FAIL , RECOVER_WITH_NULL }
	Control the error recovery behavior of the json parser. More...

enum class	compression_type : int32_t { NONE , AUTO , SNAPPY , GZIP , BZIP2 , BROTLI , ZIP , XZ , ZLIB , LZ4 , LZO , ZSTD }
	Compression algorithms. More...

enum class	io_type : int32_t { FILEPATH , HOST_BUFFER , DEVICE_BUFFER , VOID , USER_IMPLEMENTED }
	Data source or destination types. More...

enum class	quote_style : int32_t { MINIMAL , ALL , NONNUMERIC , NONE }
	Behavior when handling quotations in field data. More...

enum	statistics_freq : int32_t { STATISTICS_NONE = 0 , STATISTICS_ROWGROUP = 1 , STATISTICS_PAGE = 2 , STATISTICS_COLUMN = 3 }
	Column statistics granularity type for parquet/orc writers. More...

enum class	column_encoding : int32_t { USE_DEFAULT = -1 , DICTIONARY , PLAIN , DELTA_BINARY_PACKED , DELTA_LENGTH_BYTE_ARRAY , DELTA_BYTE_ARRAY , BYTE_STREAM_SPLIT , DIRECT , DIRECT_V2 , DICTIONARY_V2 }
	Valid encodings for use with `column_in_metadata::set_encoding()`. More...

enum	dictionary_policy : int32_t { NEVER = 0 , ADAPTIVE = 1 , ALWAYS = 2 }
	Control use of dictionary encoding for parquet writer. More...

Functions
table_with_metadata	read_avro (avro_reader_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
	Reads an Avro dataset into a set of columns. More...

table_with_metadata	read_csv (csv_reader_options options, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
	Reads a CSV dataset into a set of columns. More...

void	write_csv (csv_writer_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream())
	Writes a set of columns to CSV format. More...

table_with_metadata	read_json (json_reader_options options, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
	Reads a JSON dataset into a set of columns. More...

void	write_json (json_writer_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream())
	Writes a set of columns to JSON format. More...

table_with_metadata	read_orc (orc_reader_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
	Reads an ORC dataset into a set of columns. More...

void	write_orc (orc_writer_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream())
	Writes a set of columns to ORC format. More...

raw_orc_statistics	read_raw_orc_statistics (source_info const &src_info, rmm::cuda_stream_view stream=cudf::get_default_stream())
	Reads raw (unparsed) file-level and stripe-level statistics of an ORC dataset. More...

parsed_orc_statistics	read_parsed_orc_statistics (source_info const &src_info, rmm::cuda_stream_view stream=cudf::get_default_stream())
	Reads and parses file-level and stripe-level statistics of an ORC dataset. More...

orc_metadata	read_orc_metadata (source_info const &src_info, rmm::cuda_stream_view stream=cudf::get_default_stream())
	Reads metadata of an ORC dataset. More...

table_with_metadata	read_parquet (parquet_reader_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
	Reads a Parquet dataset into a set of columns. More...

std::unique_ptr< std::vector< uint8_t > >	write_parquet (parquet_writer_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream())
	Writes a set of columns to Parquet format. More...

std::unique_ptr< std::vector< uint8_t > >	merge_row_group_metadata (std::vector< std::unique_ptr< std::vector< uint8_t >>> const &metadata_list)
	Merges multiple raw metadata blobs that were previously created by write_parquet into a single metadata blob. More...

parquet_metadata	read_parquet_metadata (source_info const &src_info)
	Reads metadata of a Parquet dataset. More...

template<typename T >
constexpr auto	is_byte_like_type ()
	Returns `true` if the type is byte-like, meaning it is reasonable to pass as a pointer to bytes. More...
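
To make the idea of a "byte-like" compile-time check concrete, here is a hedged sketch of how such a trait could be written. `demo_is_byte_like_type` is a hypothetical re-creation; the exact set of types the real `is_byte_like_type()` accepts is an assumption here and should be confirmed against the detailed documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical re-creation: treats the character types and std::byte as byte-like,
// ignoring const/volatile qualifiers. The accepted set is an assumption.
template <typename T>
constexpr bool demo_is_byte_like_type()
{
  using U = std::remove_cv_t<T>;
  return std::is_same_v<U, char> || std::is_same_v<U, signed char> ||
         std::is_same_v<U, unsigned char> || std::is_same_v<U, std::byte>;
}

// Compile-time checks of the sketch.
static_assert(demo_is_byte_like_type<char>());
static_assert(demo_is_byte_like_type<unsigned char const>());
static_assert(!demo_is_byte_like_type<int>());
```

A trait like this is typically used to constrain templates that accept a pointer or span of raw bytes, so that passing, say, a `double*` is rejected at compile time.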

Variables
constexpr size_t	default_stripe_size_bytes = 64 * 1024 * 1024
	Default ORC stripe size: 64 MB

constexpr size_type	default_stripe_size_rows = 1000000
	Default ORC stripe size in rows: 1M rows

constexpr size_type	default_row_index_stride = 10000
	Default ORC row index stride: 10K rows

constexpr size_t	default_row_group_size_bytes
	Infinite bytes per row group. More...

constexpr size_type	default_row_group_size_rows = 1'000'000
	1 million rows per row group

constexpr size_t	default_max_page_size_bytes = 512 * 1024
	512KB per page

constexpr size_type	default_max_page_size_rows = 20000
	20k rows per page

constexpr int32_t	default_column_index_truncate_length = 64
	truncate to 64 bytes

constexpr size_t	default_max_dictionary_size = 1024 * 1024
	1MB dictionary size

constexpr size_type	default_max_page_fragment_size = 5000
	5000 rows per page fragment
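
The defaults above relate to one another by simple arithmetic. The sketch below copies the listed values into local `demo_*` constants (the names are hypothetical, and `std::int32_t` stands in for `size_type` as an assumption) and derives a few ratios. The derived counts assume that only the row limits bind, not the byte limits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local copies of values from the listing above (names are illustrative only).
constexpr std::size_t demo_stripe_size_bytes       = 64 * 1024 * 1024;
constexpr std::int32_t demo_stripe_size_rows       = 1000000;
constexpr std::int32_t demo_row_index_stride       = 10000;
constexpr std::int32_t demo_row_group_size_rows    = 1'000'000;
constexpr std::size_t demo_max_page_size_bytes     = 512 * 1024;
constexpr std::int32_t demo_max_page_size_rows     = 20000;
constexpr std::int32_t demo_max_page_fragment_size = 5000;

// Integer division rounded up.
constexpr std::int32_t ceil_div(std::int32_t a, std::int32_t b) { return (a + b - 1) / b; }

// Byte sizes written out.
static_assert(demo_stripe_size_bytes == 67'108'864);
static_assert(demo_max_page_size_bytes == 524'288);

// ORC: row index entries covering one full-size stripe.
constexpr auto demo_index_entries_per_full_stripe =
  ceil_div(demo_stripe_size_rows, demo_row_index_stride);

// Parquet: minimum pages per column in a full-size row group (row limit only).
constexpr auto demo_min_pages_per_full_row_group =
  ceil_div(demo_row_group_size_rows, demo_max_page_size_rows);

// Parquet: page fragments that fit into one maximum-size page (row limit only).
constexpr auto demo_fragments_per_full_page =
  ceil_div(demo_max_page_size_rows, demo_max_page_fragment_size);
```

With these values, a full stripe spans 100 row index entries, a full row group holds at least 50 pages per column, and a maximum-size page corresponds to 4 page fragments.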

Detailed Description

IO interfaces.
