cudf::io Namespace Reference

IO interfaces. More...

Classes

class  arrow_io_source
 Implementation class for reading from an Apache Arrow file. The file could be a memory-mapped file or other implementation supported by Arrow. More...
 
class  avro_reader_options
 Settings to use for read_avro(). More...
 
class  avro_reader_options_builder
 
struct  binary_statistics
 Statistics for binary columns. More...
 
struct  bucket_statistics
 Statistics for boolean columns. More...
 
class  chunked_orc_writer_options
 Settings to use for write_orc_chunked(). More...
 
class  chunked_orc_writer_options_builder
 
class  chunked_parquet_writer_options
 Settings for write_parquet_chunked(). More...
 
class  chunked_parquet_writer_options_builder
 
class  column_in_metadata
 
struct  column_name_info
 Detailed name information for output columns. More...
 
class  column_statistics
 Contains per-column ORC statistics. More...
 
class  csv_reader_options
 Settings to use for read_csv(). More...
 
class  csv_reader_options_builder
 
class  csv_writer_options
 Settings to use for write_csv(). More...
 
class  csv_writer_options_builder
 
class  data_sink
 Interface class for storing the output data from the writers. More...
 
class  datasource
 Interface class for providing input data to the readers. More...
 
struct  date_statistics
 Statistics for date(time) columns. More...
 
struct  decimal_statistics
 Statistics for decimal columns. More...
 
struct  double_statistics
 Statistics for floating point columns. More...
 
struct  host_buffer
 Non-owning view of a host memory buffer. More...
 
struct  integer_statistics
 Statistics for integral columns. More...
 
class  json_reader_options
 Input arguments to the read_json interface. More...
 
class  json_reader_options_builder
 
struct  minmax_statistics
 Base class for column statistics that include optional minimum and maximum. More...
 
class  orc_chunked_writer
 Chunked ORC writer class that writes an ORC file in a chunked/streamed form. More...
 
class  orc_reader_options
 Settings to use for read_orc(). More...
 
class  orc_reader_options_builder
 
class  orc_writer_options
 Settings to use for write_orc(). More...
 
class  orc_writer_options_builder
 
class  parquet_chunked_writer
 Chunked Parquet writer class that handles options and writes tables in chunks. More...
 
class  parquet_reader_options
 Settings for read_parquet(). More...
 
class  parquet_reader_options_builder
 
class  parquet_writer_options
 Settings for write_parquet(). More...
 
class  parquet_writer_options_builder
 
struct  parsed_orc_statistics
 Holds column names and parsed file-level and stripe-level statistics. More...
 
struct  raw_orc_statistics
 Holds column names and buffers containing raw file-level and stripe-level statistics. More...
 
struct  sink_info
 Destination information for write interfaces. More...
 
struct  source_info
 Source information for read interfaces. More...
 
struct  string_statistics
 Statistics for string columns. More...
 
struct  sum_statistics
 Base class for column statistics that include an optional sum. More...
 
class  table_input_metadata
 
struct  table_metadata
 Table metadata for IO readers/writers (primarily column names). For nested types (structs, maps, unions), the ordering of names in the column_names vector corresponds to a pre-order traversal of the column tree. For example, with 2 top-level columns (struct column "col1" and string column "col2"), column_names = {"col1", "s3", "f5", "f6", "f4", "col2"}. More...
 
struct  table_metadata_with_nullability
 Derived class of table_metadata which includes flattened nullability information of input. More...
 
struct  table_with_metadata
 Table with table metadata used by io readers to return the metadata by value. More...
 
struct  timestamp_statistics
 Statistics for timestamp columns. More...
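
The pre-order naming rule described for table_metadata can be sketched in a few lines. This is a minimal illustration, not libcudf code: column_node, collect_names, flatten_names and example_columns are hypothetical names, and the tree shape (s3 holding f5 and f6, with f4 as a sibling of s3) is an assumption, since the summary gives only the flattened result.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a nested column; this is not a cudf type.
struct column_node {
  std::string name;
  std::vector<column_node> children;
};

// Pre-order: record the node itself, then each child subtree from left to right.
void collect_names(column_node const& node, std::vector<std::string>& out)
{
  out.push_back(node.name);
  for (auto const& child : node.children) { collect_names(child, out); }
}

// Flattens all top-level columns, in order, into a single list of names.
std::vector<std::string> flatten_names(std::vector<column_node> const& top_level)
{
  std::vector<std::string> out;
  for (auto const& col : top_level) { collect_names(col, out); }
  return out;
}

// Assumed tree shape: col1 = struct{ s3 = struct{ f5, f6 }, f4 }, col2 = string.
std::vector<column_node> example_columns()
{
  column_node s3{"s3", {column_node{"f5", {}}, column_node{"f6", {}}}};
  column_node col1{"col1", {s3, column_node{"f4", {}}}};
  column_node col2{"col2", {}};
  return {col1, col2};
}
```

Under the assumed shape, flatten_names(example_columns()) yields the six names in the order quoted in the table_metadata description.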
 

Enumerations

enum  statistics_type {
  NONE, INT, DOUBLE, STRING,
  BUCKET, DECIMAL, DATE, BINARY,
  TIMESTAMP
}
 Enumerator for types of column statistics that can be included in column_statistics. More...
 
enum  compression_type {
  compression_type::NONE, compression_type::AUTO, compression_type::SNAPPY, compression_type::GZIP,
  compression_type::BZIP2, compression_type::BROTLI, compression_type::ZIP, compression_type::XZ
}
 Compression algorithms. More...
 
enum  io_type { io_type::FILEPATH, io_type::HOST_BUFFER, io_type::VOID, io_type::USER_IMPLEMENTED }
 Data source or destination types. More...
 
enum  quote_style { quote_style::MINIMAL, quote_style::ALL, quote_style::NONNUMERIC, quote_style::NONE }
 Behavior when handling quotations in field data. More...
 
enum  statistics_freq { STATISTICS_NONE = 0, STATISTICS_ROWGROUP = 1, STATISTICS_PAGE = 2 }
 Column statistics granularity type for Parquet/ORC writers. More...
 

Functions

table_with_metadata read_avro (avro_reader_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Reads an Avro dataset into a set of columns. More...
 
table_with_metadata read_csv (csv_reader_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Reads a CSV dataset into a set of columns. More...
 
void write_csv (csv_writer_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Writes a set of columns to CSV format. More...
 
table_with_metadata read_json (json_reader_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Reads a JSON dataset into a set of columns. More...
 
table_with_metadata read_orc (orc_reader_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Reads an ORC dataset into a set of columns. More...
 
void write_orc (orc_writer_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Writes a set of columns to ORC format. More...
 
raw_orc_statistics read_raw_orc_statistics (source_info const &src_info)
 Reads raw (unparsed) file-level and stripe-level statistics of an ORC dataset. More...
 
parsed_orc_statistics read_parsed_orc_statistics (source_info const &src_info)
 Reads and parses file-level and stripe-level statistics of an ORC dataset. More...
 
table_with_metadata read_parquet (parquet_reader_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Reads a Parquet dataset into a set of columns. More...
 
std::unique_ptr< std::vector< uint8_t > > write_parquet (parquet_writer_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Writes a set of columns to Parquet format. More...
 
std::unique_ptr< std::vector< uint8_t > > merge_rowgroup_metadata (const std::vector< std::unique_ptr< std::vector< uint8_t >>> &metadata_list)
 Merges multiple raw metadata blobs that were previously created by write_parquet into a single metadata blob. More...
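
Each read and write function above takes an options object by const reference, and most options classes are listed next to a matching *_builder class. The general shape of that pairing can be sketched as follows. Everything here is hypothetical (reader_options, reader_options_builder, the delimiter and header settings, describe); it shows the pattern only and does not reproduce the members of any libcudf class.

```cpp
#include <cassert>
#include <string>
#include <utility>

class reader_options_builder;

// Hypothetical options class: holds settings, exposes read-only accessors.
class reader_options {
 public:
  static reader_options_builder builder(std::string path);
  std::string const& path() const { return path_; }
  char delimiter() const { return delimiter_; }
  bool has_header() const { return has_header_; }

 private:
  friend class reader_options_builder;
  std::string path_;
  char delimiter_  = ',';
  bool has_header_ = true;
};

// Hypothetical builder: each setter returns *this so calls can be chained.
class reader_options_builder {
 public:
  explicit reader_options_builder(std::string path) { options_.path_ = std::move(path); }
  reader_options_builder& delimiter(char d) { options_.delimiter_ = d; return *this; }
  reader_options_builder& header(bool h) { options_.has_header_ = h; return *this; }
  reader_options build() const { return options_; }

 private:
  reader_options options_;
};

inline reader_options_builder reader_options::builder(std::string path)
{
  return reader_options_builder{std::move(path)};
}

// Stand-in for a reader entry point that receives the finished options.
std::string describe(reader_options const& options)
{
  return options.path() + " delimiter=" + options.delimiter() +
         (options.has_header() ? " header" : " no-header");
}
```

A call site would then read as a single chained expression, for example describe(reader_options::builder("data.csv").delimiter(';').build()).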
 

Detailed Description

IO interfaces.

Enumeration Type Documentation

◆ compression_type

Compression algorithms.

Enumerator
NONE 

No compression.

AUTO 

Automatically detect or select compression format.

SNAPPY 

Snappy format, using byte-oriented LZ77.

GZIP 

GZIP format, using DEFLATE algorithm.

BZIP2 

BZIP2 format, using Burrows-Wheeler transform.

BROTLI 

BROTLI format, using LZ77 + Huffman + 2nd order context modeling.

ZIP 

ZIP format, using DEFLATE algorithm.

XZ 

XZ format, using LZMA(2) algorithm.

Definition at line 53 of file io/types.hpp.
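
One plausible reading of AUTO is that a concrete format is chosen from the file name. The sketch below illustrates that idea only: resolve_compression and its extension table are assumptions made for this example, and the library's actual detection logic may differ.

```cpp
#include <cassert>
#include <string>

// Local copy of the enumerators listed above, so the sketch compiles on its own.
enum class compression_type { NONE, AUTO, SNAPPY, GZIP, BZIP2, BROTLI, ZIP, XZ };

// Hypothetical helper: an explicit request is returned unchanged, while AUTO is
// resolved from the file extension. Unknown extensions fall back to NONE.
compression_type resolve_compression(compression_type requested, std::string const& path)
{
  if (requested != compression_type::AUTO) { return requested; }

  auto const dot = path.find_last_of('.');
  if (dot == std::string::npos) { return compression_type::NONE; }

  std::string const ext = path.substr(dot + 1);
  if (ext == "gz") { return compression_type::GZIP; }
  if (ext == "bz2") { return compression_type::BZIP2; }
  if (ext == "zip") { return compression_type::ZIP; }
  if (ext == "xz") { return compression_type::XZ; }
  return compression_type::NONE;
}
```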

◆ io_type

enum cudf::io::io_type
strong

Data source or destination types.

Enumerator
FILEPATH 

Input/output is a file path.

HOST_BUFFER 

Input/output is a buffer in host memory.

VOID 

Input/output is nothing. No work is done. Useful for benchmarking.

USER_IMPLEMENTED 

Input/output is handled by a custom user class.

Definition at line 67 of file io/types.hpp.
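
The four enumerators suggest a dispatch on the kind of destination. A minimal sketch of such a dispatch, for the output side only, is shown here. The types sink_descriptor and user_sink and the function write_bytes are hypothetical and are not the library's sink_info or data_sink.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Local copy of the enumerators listed above.
enum class io_type { FILEPATH, HOST_BUFFER, VOID, USER_IMPLEMENTED };

// Hypothetical interface standing in for a user-provided sink class.
struct user_sink {
  virtual ~user_sink() = default;
  virtual void write(char const* data, std::size_t size) = 0;
};

// Hypothetical destination descriptor; only the member matching `type` is used.
struct sink_descriptor {
  io_type type = io_type::VOID;
  std::string filepath;
  std::vector<char>* buffer = nullptr;
  user_sink* user           = nullptr;
};

// Returns the number of bytes stored or forwarded to the destination.
std::size_t write_bytes(sink_descriptor const& sink, std::string const& bytes)
{
  switch (sink.type) {
    case io_type::FILEPATH: {
      std::ofstream out(sink.filepath, std::ios::binary);
      out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
      return out ? bytes.size() : 0;
    }
    case io_type::HOST_BUFFER:
      if (sink.buffer == nullptr) { return 0; }
      sink.buffer->insert(sink.buffer->end(), bytes.begin(), bytes.end());
      return bytes.size();
    case io_type::VOID:
      return 0;  // nothing is written, which is what makes this useful for benchmarking
    case io_type::USER_IMPLEMENTED:
      if (sink.user == nullptr) { return 0; }
      sink.user->write(bytes.data(), bytes.size());
      return bytes.size();
  }
  return 0;
}
```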

◆ quote_style

enum cudf::io::quote_style
strong

Behavior when handling quotations in field data.

Enumerator
MINIMAL 

Quote only fields which contain special characters.

ALL 

Quote all fields.

NONNUMERIC 

Quote all non-numeric fields.

NONE 

Never quote fields; disable quotation parsing.

Definition at line 77 of file io/types.hpp.
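
The effect of each style on a written field can be sketched as follows. This is an illustration under stated assumptions rather than the library's writer: format_field, has_special and looks_numeric are hypothetical helpers, "special characters" is taken to mean the delimiter, the quote character and line breaks, and embedded quotes are doubled as in common CSV practice. Only the writing side is covered.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Local copy of the enumerators listed above.
enum class quote_style { MINIMAL, ALL, NONNUMERIC, NONE };

// Assumed definition of "special": delimiter, quote character, or a line break.
bool has_special(std::string const& field, char delimiter, char quote)
{
  for (char c : field) {
    if (c == delimiter || c == quote || c == '\n' || c == '\r') { return true; }
  }
  return false;
}

// Simplified numeric check: optional sign, digits, at most one decimal point.
bool looks_numeric(std::string const& field)
{
  std::size_t i = (!field.empty() && (field[0] == '-' || field[0] == '+')) ? 1 : 0;
  bool seen_digit = false;
  bool seen_dot   = false;
  for (; i < field.size(); ++i) {
    char const c = field[i];
    if (c >= '0' && c <= '9') {
      seen_digit = true;
    } else if (c == '.' && !seen_dot) {
      seen_dot = true;
    } else {
      return false;
    }
  }
  return seen_digit;
}

// Wraps the field in quotes when the chosen style asks for it.
std::string format_field(std::string const& field,
                         quote_style style,
                         char delimiter = ',',
                         char quote     = '"')
{
  bool wrap = false;
  switch (style) {
    case quote_style::MINIMAL: wrap = has_special(field, delimiter, quote); break;
    case quote_style::ALL: wrap = true; break;
    case quote_style::NONNUMERIC: wrap = !looks_numeric(field); break;
    case quote_style::NONE: wrap = false; break;
  }
  if (!wrap) { return field; }

  std::string out(1, quote);
  for (char c : field) {
    if (c == quote) { out.push_back(quote); }  // double any embedded quote
    out.push_back(c);
  }
  out.push_back(quote);
  return out;
}
```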

◆ statistics_freq

Column statistics granularity type for Parquet/ORC writers.

Enumerator
STATISTICS_NONE 

No column statistics.

STATISTICS_ROWGROUP 

Per-rowgroup column statistics.

STATISTICS_PAGE 

Per-page column statistics.

Definition at line 87 of file io/types.hpp.

◆ statistics_type

Enumerator for types of column statistics that can be included in column_statistics.

The statistics type depends on the column data type.

Definition at line 68 of file orc_metadata.hpp.
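
The dependence of the statistics type on the column data type can be made concrete with a small mapping, following the descriptions of the *_statistics structs in the class list (for example, bucket_statistics for boolean columns). The column_kind enum and statistics_for function are hypothetical names introduced for this sketch, and the namespace sketch exists only to keep the unscoped enumerators out of the global scope.

```cpp
#include <cassert>

namespace sketch {

// Local copy of the enumerators listed in the Enumerations section.
enum statistics_type { NONE, INT, DOUBLE, STRING, BUCKET, DECIMAL, DATE, BINARY, TIMESTAMP };

// Hypothetical coarse classification of a column's data type.
enum class column_kind {
  INTEGRAL, FLOATING_POINT, STRING, BOOLEAN, DECIMAL, DATE, BINARY, TIMESTAMP, OTHER
};

// Picks the statistics type that matches the column's data type.
statistics_type statistics_for(column_kind kind)
{
  switch (kind) {
    case column_kind::INTEGRAL: return INT;
    case column_kind::FLOATING_POINT: return DOUBLE;
    case column_kind::STRING: return STRING;
    case column_kind::BOOLEAN: return BUCKET;
    case column_kind::DECIMAL: return DECIMAL;
    case column_kind::DATE: return DATE;
    case column_kind::BINARY: return BINARY;
    case column_kind::TIMESTAMP: return TIMESTAMP;
    case column_kind::OTHER: return NONE;
  }
  return NONE;
}

}  // namespace sketch
```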