cudf::io Namespace Reference

IO interfaces. More...

Classes

class  arrow_io_source
 Implementation class for reading from an Apache Arrow file. The file could be a memory-mapped file or other implementation supported by Arrow. More...
 
class  avro_reader_options
 Settings to use for read_avro(). More...
 
class  avro_reader_options_builder
 Builder to build options for read_avro(). More...
 
struct  bucket_statistics
 Statistics for boolean columns. More...
 
class  chunked_orc_writer_options
 Settings to use for write_orc_chunked(). More...
 
class  chunked_orc_writer_options_builder
 Builds settings to use for write_orc_chunked(). More...
 
class  chunked_parquet_reader
 The chunked Parquet reader class, which reads a Parquet file iteratively into a series of tables, chunk by chunk. More...
 
class  chunked_parquet_writer_options
 Settings for write_parquet_chunked(). More...
 
class  chunked_parquet_writer_options_builder
 Builds options for chunked_parquet_writer_options. More...
 
class  column_in_metadata
 Metadata for a column. More...
 
struct  column_name_info
 Detailed name information for output columns. More...
 
struct  column_statistics
 Contains per-column ORC statistics. More...
 
class  csv_reader_options
 Settings to use for read_csv(). More...
 
class  csv_reader_options_builder
 Builder to build options for read_csv(). More...
 
class  csv_writer_options
 Settings to use for write_csv(). More...
 
class  csv_writer_options_builder
 Builder to build options for write_csv(). More...
 
class  data_sink
 Interface class for storing the output data from the writers. More...
 
class  datasource
 Interface class for providing input data to the readers. More...
 
struct  decimal_statistics
 Statistics for decimal columns. More...
 
struct  double_statistics
 Statistics for floating point columns. More...
 
struct  host_buffer
 Non-owning view of a host memory buffer. More...
 
struct  integer_statistics
 Statistics for integral columns. More...
 
class  json_reader_options
 Input arguments to the read_json interface. More...
 
class  json_reader_options_builder
 Builds settings to use for read_json(). More...
 
struct  minmax_statistics
 Base class for column statistics that include optional minimum and maximum. More...
 
class  orc_chunked_writer
 Chunked ORC writer class that writes an ORC file in chunked (streaming) form. More...
 
struct  orc_column_schema
 Schema of an ORC column, including the nested columns. More...
 
class  orc_metadata
 Information about content of an ORC file. More...
 
class  orc_reader_options
 Settings to use for read_orc(). More...
 
class  orc_reader_options_builder
 Builds settings to use for read_orc(). More...
 
struct  orc_schema
 Schema of an ORC file. More...
 
class  orc_writer_options
 Settings to use for write_orc(). More...
 
class  orc_writer_options_builder
 Builds settings to use for write_orc(). More...
 
class  parquet_chunked_writer
 Chunked Parquet writer class to handle options and write tables in chunks. More...
 
class  parquet_reader_options
 Settings for read_parquet(). More...
 
class  parquet_reader_options_builder
 Builds parquet_reader_options to use for read_parquet(). More...
 
class  parquet_writer_options
 Settings for write_parquet(). More...
 
class  parquet_writer_options_builder
 Class to build parquet_writer_options. More...
 
struct  parsed_orc_statistics
 Holds column names and parsed file-level and stripe-level statistics. More...
 
struct  partition_info
 Information used while writing partitioned datasets. More...
 
struct  raw_orc_statistics
 Holds column names and buffers containing raw file-level and stripe-level statistics. More...
 
class  reader_column_schema
 Schema element for the reader. More...
 
struct  schema_element
 Allows specifying the target types for nested JSON data via json_reader_options' set_dtypes method. More...
 
struct  sink_info
 Destination information for write interfaces. More...
 
struct  source_info
 Source information for read interfaces. More...
 
struct  string_statistics
 Statistics for string columns. More...
 
struct  sum_statistics
 Base class for column statistics that include an optional sum. More...
 
class  table_input_metadata
 Metadata for a table. More...
 
struct  table_metadata
 Table metadata for IO readers/writers (primarily column names). More...
 
struct  table_with_metadata
 Table with table metadata used by io readers to return the metadata by value. More...
 
struct  timestamp_statistics
 Statistics for timestamp columns. More...
 

Typedefs

using no_statistics = std::monostate
 Monostate type alias for the statistics variant.
 
using date_statistics = minmax_statistics< int32_t >
 Statistics for date(time) columns.
 
using binary_statistics = sum_statistics< int64_t >
 Statistics for binary columns. More...
 

Enumerations

enum  compression_type {
  compression_type::NONE, compression_type::AUTO, compression_type::SNAPPY, compression_type::GZIP,
  compression_type::BZIP2, compression_type::BROTLI, compression_type::ZIP, compression_type::XZ,
  compression_type::ZLIB, compression_type::LZ4, compression_type::LZO, compression_type::ZSTD
}
 Compression algorithms. More...
 
enum  io_type { io_type::FILEPATH, io_type::HOST_BUFFER, io_type::VOID, io_type::USER_IMPLEMENTED }
 Data source or destination types. More...
 
enum  quote_style { quote_style::MINIMAL, quote_style::ALL, quote_style::NONNUMERIC, quote_style::NONE }
 Behavior when handling quotations in field data. More...
 
enum  statistics_freq { STATISTICS_NONE = 0, STATISTICS_ROWGROUP = 1, STATISTICS_PAGE = 2, STATISTICS_COLUMN = 3 }
 Column statistics granularity type for parquet/orc writers. More...
 

Functions

table_with_metadata read_avro (avro_reader_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Reads an Avro dataset into a set of columns. More...
 
table_with_metadata read_csv (csv_reader_options options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Reads a CSV dataset into a set of columns. More...
 
void write_csv (csv_writer_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Writes a set of columns to CSV format. More...
 
table_with_metadata read_json (json_reader_options options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Reads a JSON dataset into a set of columns. More...
 
table_with_metadata read_orc (orc_reader_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Reads an ORC dataset into a set of columns. More...
 
void write_orc (orc_writer_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Writes a set of columns to ORC format. More...
 
raw_orc_statistics read_raw_orc_statistics (source_info const &src_info)
 Reads raw file-level and stripe-level statistics of an ORC dataset. More...
 
parsed_orc_statistics read_parsed_orc_statistics (source_info const &src_info)
 Reads and parses file-level and stripe-level statistics of an ORC dataset. More...
 
orc_metadata read_orc_metadata (source_info const &src_info)
 Reads metadata of an ORC dataset. More...
 
table_with_metadata read_parquet (parquet_reader_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Reads a Parquet dataset into a set of columns. More...
 
std::unique_ptr< std::vector< uint8_t > > write_parquet (parquet_writer_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Writes a set of columns to Parquet format. More...
 
std::unique_ptr< std::vector< uint8_t > > merge_row_group_metadata (const std::vector< std::unique_ptr< std::vector< uint8_t >>> &metadata_list)
 Merges multiple raw metadata blobs that were previously created by write_parquet into a single metadata blob. More...
 

Variables

constexpr size_t default_stripe_size_bytes = 64 * 1024 * 1024
 Default ORC stripe size: 64 MiB
 
constexpr size_type default_stripe_size_rows = 1000000
 Default ORC stripe size in rows: 1M rows
 
constexpr size_type default_row_index_stride = 10000
 Default ORC row index stride: 10K rows
 
constexpr size_t default_row_group_size_bytes = 128 * 1024 * 1024
 128MB per row group
 
constexpr size_type default_row_group_size_rows = 1000000
 1 million rows per row group
 
constexpr size_t default_max_page_size_bytes = 512 * 1024
 512KB per page
 
constexpr size_type default_max_page_size_rows = 20000
 20k rows per page
 
constexpr size_type default_column_index_truncate_length = 64
 truncate to 64 bytes
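
The defaults above are related by simple arithmetic. The following sketch restates them as local constants and derives the byte limits in full plus a few ratios between them. The `sketch` namespace, the ratio names, and the choice of a 32-bit signed `size_type` are assumptions made for illustration; they are not the cudf declarations.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

// Assumption: row counts use a 32-bit signed integer type.
using size_type = std::int32_t;

// Local stand-ins that mirror the values listed above.
constexpr std::size_t default_stripe_size_bytes    = 64 * 1024 * 1024;
constexpr size_type default_stripe_size_rows       = 1000000;
constexpr size_type default_row_index_stride       = 10000;
constexpr std::size_t default_row_group_size_bytes = 128 * 1024 * 1024;
constexpr size_type default_row_group_size_rows    = 1000000;
constexpr std::size_t default_max_page_size_bytes  = 512 * 1024;
constexpr size_type default_max_page_size_rows     = 20000;

// The byte limits written out in full.
static_assert(default_stripe_size_bytes == 67108864u, "64 * 1024 * 1024");
static_assert(default_row_group_size_bytes == 134217728u, "128 * 1024 * 1024");
static_assert(default_max_page_size_bytes == 524288u, "512 * 1024");

// Ratios between the limits (illustrative names).
constexpr std::size_t row_group_to_page_bytes_ratio =
  default_row_group_size_bytes / default_max_page_size_bytes;
constexpr size_type row_group_to_page_rows_ratio =
  default_row_group_size_rows / default_max_page_size_rows;
constexpr size_type stripe_rows_to_index_stride_ratio =
  default_stripe_size_rows / default_row_index_stride;

}  // namespace sketch
```

With these values the byte ratio is 256, the row ratio is 50, and a stripe at the default row limit spans 100 row index strides.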
 

Detailed Description

IO interfaces.
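
Most readers and writers in this namespace pair a settings class with a builder class (for example csv_reader_options and csv_reader_options_builder). The following is a minimal sketch of that pairing using hypothetical stand-in types. The names demo_reader_options and demo_reader_options_builder, and all of their members, are illustrative only and are not part of cudf.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical settings type; the fields are illustrative, not cudf members.
class demo_reader_options {
 public:
  std::string const& path() const { return path_; }
  int skip_rows() const { return skip_rows_; }
  bool use_header() const { return use_header_; }

 private:
  friend class demo_reader_options_builder;
  std::string path_;
  int skip_rows_   = 0;     // default: skip nothing
  bool use_header_ = true;  // default: first row is a header
};

// Hypothetical builder: each setter returns *this so that calls can be chained.
class demo_reader_options_builder {
 public:
  explicit demo_reader_options_builder(std::string path) { options_.path_ = std::move(path); }

  demo_reader_options_builder& skip_rows(int n)
  {
    options_.skip_rows_ = n;
    return *this;
  }

  demo_reader_options_builder& use_header(bool v)
  {
    options_.use_header_ = v;
    return *this;
  }

  // Returns a copy of the accumulated settings.
  demo_reader_options build() const { return options_; }

 private:
  demo_reader_options options_;
};

// Usage: chain the setters, then build the settings object.
inline void builder_example()
{
  auto const opts = demo_reader_options_builder("in.csv").skip_rows(2).build();
  assert(opts.path() == "in.csv");
  assert(opts.skip_rows() == 2);
  assert(opts.use_header());  // untouched settings keep their defaults
}
```

The design keeps the settings object simple to pass to a read or write function, while the builder collects optional settings in one expression.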

Typedef Documentation

◆ binary_statistics

using cudf::io::binary_statistics = sum_statistics<int64_t>

Statistics for binary columns.

The sum is the total number of bytes across all elements.

Definition at line 139 of file orc_metadata.hpp.
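
To illustrate what the sum means for a binary column, the sketch below computes it for a small set of elements. The types sum_statistics_sketch and binary_statistics_sketch and the function summarize_binary are hypothetical stand-ins; the real cudf::io::sum_statistics may differ in layout and naming, and the only property taken from this page is that the sum is optional.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a statistics type that carries an optional sum.
template <typename T>
struct sum_statistics_sketch {
  std::optional<T> sum;  // empty when no sum was recorded
};

using binary_statistics_sketch = sum_statistics_sketch<std::int64_t>;

// Computes the sum for a binary column: the total number of bytes across all elements.
inline binary_statistics_sketch summarize_binary(std::vector<std::string> const& elements)
{
  std::int64_t total = 0;
  for (auto const& e : elements) { total += static_cast<std::int64_t>(e.size()); }
  return binary_statistics_sketch{total};
}

// Usage: elements of 2, 0, and 3 bytes give a sum of 5 bytes.
inline void binary_statistics_example()
{
  auto const stats = summarize_binary({"ab", "", "cde"});
  assert(stats.sum.has_value() && *stats.sum == 5);
}
```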

Enumeration Type Documentation

◆ compression_type

Compression algorithms.

Enumerator
NONE 

No compression.

AUTO 

Automatically detect or select compression format.

SNAPPY 

Snappy format, using byte-oriented LZ77.

GZIP 

GZIP format, using DEFLATE algorithm.

BZIP2 

BZIP2 format, using Burrows-Wheeler transform.

BROTLI 

BROTLI format, using LZ77 + Huffman + 2nd order context modeling.

ZIP 

ZIP format, using DEFLATE algorithm.

XZ 

XZ format, using LZMA(2) algorithm.

ZLIB 

ZLIB format, using DEFLATE algorithm.

LZ4 

LZ4 format, using LZ77.

LZO 

Lempel–Ziv–Oberhumer format.

ZSTD 

Zstandard format.

Definition at line 57 of file io/types.hpp.
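
AUTO asks for the format to be detected or selected automatically. One way such a choice could be made is from the file name suffix, as sketched below. The enum is a local mirror of the enumerators above, and resolve_compression, ends_with, and the suffix table are hypothetical; this page does not state how cudf resolves AUTO.

```cpp
#include <cassert>
#include <string>

// Local mirror of the enumerators listed above; not the cudf declaration.
enum class compression_type {
  NONE, AUTO, SNAPPY, GZIP, BZIP2, BROTLI, ZIP, XZ, ZLIB, LZ4, LZO, ZSTD
};

inline bool ends_with(std::string const& s, std::string const& suffix)
{
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical resolution of AUTO from a file name. The suffix table is an
// assumption for illustration; an explicit request is returned unchanged.
inline compression_type resolve_compression(compression_type requested, std::string const& path)
{
  if (requested != compression_type::AUTO) { return requested; }
  if (ends_with(path, ".gz")) { return compression_type::GZIP; }
  if (ends_with(path, ".bz2")) { return compression_type::BZIP2; }
  if (ends_with(path, ".zip")) { return compression_type::ZIP; }
  if (ends_with(path, ".xz")) { return compression_type::XZ; }
  if (ends_with(path, ".zst")) { return compression_type::ZSTD; }
  return compression_type::NONE;  // no recognized suffix: treat as uncompressed
}

// Usage: AUTO is resolved from the suffix, explicit values pass through.
inline void compression_example()
{
  assert(resolve_compression(compression_type::AUTO, "data.csv.gz") == compression_type::GZIP);
  assert(resolve_compression(compression_type::SNAPPY, "data.csv.gz") == compression_type::SNAPPY);
}
```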

◆ io_type

enum class cudf::io::io_type

Data source or destination types.

Enumerator
FILEPATH 

Input/output is a file path.

HOST_BUFFER 

Input/output is a buffer in host memory.

VOID 

Input/output is nothing. No work is done. Useful for benchmarking.

USER_IMPLEMENTED 

Input/output is handled by a custom user class.

Definition at line 75 of file io/types.hpp.

◆ quote_style

enum class cudf::io::quote_style

Behavior when handling quotations in field data.

Enumerator
MINIMAL 

Quote only fields which contain special characters.

ALL 

Quote all fields.

NONNUMERIC 

Quote all non-numeric fields.

NONE 

Never quote fields; disable quotation parsing.

Definition at line 85 of file io/types.hpp.
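
The four styles can be illustrated on the writing side with a small field formatter. This is a sketch under stated assumptions, not the cudf implementation: the enum is a local mirror, format_field, has_special, and is_numeric are hypothetical helpers, "special characters" is taken to mean the delimiter, the quote character, or a line break, and embedded quotes are escaped by doubling them.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Local mirror of the enumerators listed above; not the cudf declaration.
enum class quote_style { MINIMAL, ALL, NONNUMERIC, NONE };

// Assumption: special characters are the delimiter, the quote character, or a line break.
inline bool has_special(std::string const& field, char delimiter, char quote)
{
  return field.find(delimiter) != std::string::npos ||
         field.find(quote) != std::string::npos ||
         field.find_first_of("\r\n") != std::string::npos;
}

// Assumption: a field is numeric when strtod consumes all of it.
inline bool is_numeric(std::string const& field)
{
  if (field.empty()) { return false; }
  char* end = nullptr;
  std::strtod(field.c_str(), &end);
  return end == field.c_str() + field.size();
}

// Formats one field for output according to the requested style.
inline std::string format_field(std::string const& field,
                                quote_style style,
                                char delimiter = ',',
                                char quote     = '"')
{
  bool wrap = false;
  switch (style) {
    case quote_style::MINIMAL: wrap = has_special(field, delimiter, quote); break;
    case quote_style::ALL: wrap = true; break;
    case quote_style::NONNUMERIC: wrap = !is_numeric(field); break;
    case quote_style::NONE: wrap = false; break;
  }
  if (!wrap) { return field; }

  std::string out(1, quote);
  for (char c : field) {
    if (c == quote) { out += quote; }  // escape an embedded quote by doubling it
    out += c;
  }
  out += quote;
  return out;
}

// Usage: the same field under two styles.
inline void quote_style_example()
{
  assert(format_field("a,b", quote_style::MINIMAL) == "\"a,b\"");
  assert(format_field("a,b", quote_style::NONE) == "a,b");
}
```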

◆ statistics_freq

Column statistics granularity type for parquet/orc writers.

Enumerator
STATISTICS_NONE 

No column statistics.

STATISTICS_ROWGROUP 

Per-row-group column statistics.

STATISTICS_PAGE 

Per-page column statistics.

STATISTICS_COLUMN 

Full column and offset indices. Implies STATISTICS_ROWGROUP.

Definition at line 95 of file io/types.hpp.