IO Types

Files

 
file  orc_types.hpp
 
 
file  parquet_schema.hpp
 Parquet footer schema structs.
 
file  io/types.hpp
 cuDF-IO API type definitions
 

Classes

struct  cudf::io::raw_orc_statistics
 Holds column names and buffers containing raw file-level and stripe-level statistics. More...
 
struct  cudf::io::minmax_statistics< T >
 Base class for column statistics that include optional minimum and maximum. More...
 
struct  cudf::io::sum_statistics< T >
 Base class for column statistics that include an optional sum. More...
 
struct  cudf::io::integer_statistics
 Statistics for integral columns. More...
 
struct  cudf::io::double_statistics
 Statistics for floating point columns. More...
 
struct  cudf::io::string_statistics
 Statistics for string columns. More...
 
struct  cudf::io::bucket_statistics
 Statistics for boolean columns. More...
 
struct  cudf::io::decimal_statistics
 Statistics for decimal columns. More...
 
struct  cudf::io::timestamp_statistics
 Statistics for timestamp columns. More...
 
struct  cudf::io::column_statistics
 Contains per-column ORC statistics. More...
 
struct  cudf::io::parsed_orc_statistics
 Holds column names and parsed file-level and stripe-level statistics. More...
 
struct  cudf::io::orc_column_schema
 Schema of an ORC column, including the nested columns. More...
 
struct  cudf::io::orc_schema
 Schema of an ORC file. More...
 
class  cudf::io::orc_metadata
 Information about content of an ORC file. More...
 
struct  cudf::io::parquet_column_schema
 Schema of a parquet column, including the nested columns. More...
 
struct  cudf::io::parquet_schema
 Schema of a parquet file. More...
 
class  cudf::io::parquet_metadata
 Information about content of a parquet file. More...
 
struct  cudf::io::parquet::file_header_s
 Struct that describes the Parquet file data header. More...
 
struct  cudf::io::parquet::file_ender_s
 Struct that describes the Parquet file data postscript. More...
 
struct  cudf::io::parquet::DecimalType
 Struct that describes the decimal logical type annotation. More...
 
struct  cudf::io::parquet::TimeUnit
 Time units for temporal logical types. More...
 
struct  cudf::io::parquet::TimeType
 Struct that describes the time logical type annotation. More...
 
struct  cudf::io::parquet::TimestampType
 Struct that describes the timestamp logical type annotation. More...
 
struct  cudf::io::parquet::IntType
 Struct that describes the integer logical type annotation. More...
 
struct  cudf::io::parquet::LogicalType
 Struct that describes the logical type annotation. More...
 
struct  cudf::io::parquet::ColumnOrder
 Union to specify the order used for the min_value and max_value fields for a column. More...
 
struct  cudf::io::parquet::SchemaElement
 Struct for describing an element/field in the Parquet format schema. More...
 
struct  cudf::io::parquet::Statistics
 Thrift-derived struct describing column chunk statistics. More...
 
struct  cudf::io::parquet::SizeStatistics
 Thrift-derived struct containing statistics used to estimate page and column chunk sizes. More...
 
struct  cudf::io::parquet::PageLocation
 Thrift-derived struct describing page location information stored in the offsets index. More...
 
struct  cudf::io::parquet::OffsetIndex
 Thrift-derived struct describing the offset index. More...
 
struct  cudf::io::parquet::ColumnIndex
 Thrift-derived struct describing the column index. More...
 
struct  cudf::io::parquet::PageEncodingStats
 Thrift-derived struct describing page encoding statistics. More...
 
struct  cudf::io::parquet::SortingColumn
 Thrift-derived struct describing column sort order. More...
 
struct  cudf::io::parquet::ColumnChunkMetaData
 Thrift-derived struct describing a column chunk. More...
 
struct  cudf::io::parquet::BloomFilterAlgorithm
 The algorithm used in the Bloom filter. More...
 
struct  cudf::io::parquet::BloomFilterHash
 The hash function used in the Bloom filter. More...
 
struct  cudf::io::parquet::BloomFilterCompression
 The compression used in the Bloom filter. More...
 
struct  cudf::io::parquet::BloomFilterHeader
 Bloom filter header struct. More...
 
struct  cudf::io::parquet::ColumnChunk
 Thrift-derived struct describing a chunk of data for a particular column. More...
 
struct  cudf::io::parquet::RowGroup
 Thrift-derived struct describing a group of row data. More...
 
struct  cudf::io::parquet::KeyValue
 Thrift-derived struct describing a key-value pair, for user metadata. More...
 
struct  cudf::io::parquet::FileMetaData
 Thrift-derived struct describing file-level metadata. More...
 
struct  cudf::io::parquet::DataPageHeader
 Thrift-derived struct describing the header for a data page. More...
 
struct  cudf::io::parquet::DataPageHeaderV2
 Thrift-derived struct describing the header for a V2 data page. More...
 
struct  cudf::io::parquet::DictionaryPageHeader
 Thrift-derived struct describing the header for a dictionary page. More...
 
struct  cudf::io::parquet::PageHeader
 Thrift-derived struct describing the page header. More...
 
class  cudf::io::writer_compression_statistics
 Statistics about compression performed by a writer. More...
 
struct  cudf::io::column_name_info
 Detailed name (and optionally nullability) information for output columns. More...
 
struct  cudf::io::table_metadata
 Table metadata returned by IO readers. More...
 
struct  cudf::io::table_with_metadata
 Table with table metadata, used by IO readers to return the metadata by value. More...
 
struct  cudf::io::source_info
 Source information for read interfaces. More...
 
struct  cudf::io::sink_info
 Destination information for write interfaces. More...
 
class  cudf::io::column_in_metadata
 Metadata for a column. More...
 
class  cudf::io::table_input_metadata
 Metadata for a table. More...
 
struct  cudf::io::partition_info
 Information used while writing partitioned datasets. More...
 
class  cudf::io::reader_column_schema
 Schema element for the reader. More...
 

Typedefs

using cudf::io::no_statistics = std::monostate
 Monostate type alias for the statistics variant.
 
using cudf::io::date_statistics = minmax_statistics< int32_t >
 Statistics for date(time) columns.
 
using cudf::io::binary_statistics = sum_statistics< int64_t >
 Statistics for binary columns. More...
 
using cudf::io::statistics_type = std::variant< no_statistics, integer_statistics, double_statistics, string_statistics, bucket_statistics, decimal_statistics, date_statistics, binary_statistics, timestamp_statistics >
 Variant type for ORC type-specific column statistics. More...
 

Enumerations

enum  cudf::io::orc::CompressionKind : uint8_t {
  NONE = 0 , ZLIB = 1 , SNAPPY = 2 , LZO = 3 ,
  LZ4 = 4 , ZSTD = 5
}
 Identifies a compression algorithm.
 
enum  cudf::io::orc::TypeKind : int8_t {
  INVALID_TYPE_KIND = -1 , BOOLEAN = 0 , BYTE = 1 , SHORT = 2 ,
  INT = 3 , LONG = 4 , FLOAT = 5 , DOUBLE = 6 ,
  STRING = 7 , BINARY = 8 , TIMESTAMP = 9 , LIST = 10 ,
  MAP = 11 , STRUCT = 12 , UNION = 13 , DECIMAL = 14 ,
  DATE = 15 , VARCHAR = 16 , CHAR = 17
}
 Identifies a data type in an ORC file.
 
enum  cudf::io::orc::StreamKind : int8_t {
  INVALID_STREAM_KIND = -1 , PRESENT = 0 , DATA = 1 , LENGTH = 2 ,
  DICTIONARY_DATA = 3 , DICTIONARY_COUNT = 4 , SECONDARY = 5 , ROW_INDEX = 6 ,
  BLOOM_FILTER = 7 , BLOOM_FILTER_UTF8 = 8
}
 Identifies the type of data stream.
 
enum  cudf::io::orc::ColumnEncodingKind : int8_t {
  INVALID_ENCODING_KIND = -1 , DIRECT = 0 , DICTIONARY = 1 , DIRECT_V2 = 2 ,
  DICTIONARY_V2 = 3
}
 Identifies the encoding of columns.
 
enum  cudf::io::orc::ProtofType : uint8_t {
  VARINT = 0 , FIXED64 = 1 , FIXEDLEN = 2 , START_GROUP = 3 ,
  END_GROUP = 4 , FIXED32 = 5 , INVALID_6 = 6 , INVALID_7 = 7
}
 Identifies the type of encoding in a protocol buffer.
 
enum class  cudf::io::parquet::Type : int8_t {
  UNDEFINED = -1 , BOOLEAN = 0 , INT32 = 1 , INT64 = 2 ,
  INT96 = 3 , FLOAT = 4 , DOUBLE = 5 , BYTE_ARRAY = 6 ,
  FIXED_LEN_BYTE_ARRAY = 7
}
 Basic data types in Parquet, determines how data is physically stored.
 
enum class  cudf::io::parquet::ConvertedType : int8_t {
  UNKNOWN = -1 , UTF8 = 0 , MAP = 1 , MAP_KEY_VALUE = 2 ,
  LIST = 3 , ENUM = 4 , DECIMAL = 5 , DATE = 6 ,
  TIME_MILLIS = 7 , TIME_MICROS = 8 , TIMESTAMP_MILLIS = 9 , TIMESTAMP_MICROS = 10 ,
  UINT_8 = 11 , UINT_16 = 12 , UINT_32 = 13 , UINT_64 = 14 ,
  INT_8 = 15 , INT_16 = 16 , INT_32 = 17 , INT_64 = 18 ,
  JSON = 19 , BSON = 20 , INTERVAL = 21 , NA = 25
}
 High-level data types in Parquet, determines how data is logically interpreted.
 
enum class  cudf::io::parquet::Encoding : uint8_t {
  PLAIN = 0 , GROUP_VAR_INT = 1 , PLAIN_DICTIONARY = 2 , RLE = 3 ,
  BIT_PACKED = 4 , DELTA_BINARY_PACKED = 5 , DELTA_LENGTH_BYTE_ARRAY = 6 , DELTA_BYTE_ARRAY = 7 ,
  RLE_DICTIONARY = 8 , BYTE_STREAM_SPLIT = 9 , NUM_ENCODINGS = 10
}
 Encoding types for the actual data stream.
 
enum class  cudf::io::parquet::Compression : uint8_t {
  UNCOMPRESSED = 0 , SNAPPY = 1 , GZIP = 2 , LZO = 3 ,
  BROTLI = 4 , LZ4 = 5 , ZSTD = 6 , LZ4_RAW = 7
}
 Compression codec used for compressed data pages.
 
enum class  cudf::io::parquet::FieldRepetitionType : int8_t { UNSPECIFIED = -1 , REQUIRED = 0 , OPTIONAL = 1 , REPEATED = 2 }
 Repetition type of a schema field: required, optional, or repeated.
 
enum class  cudf::io::parquet::PageType : uint8_t { DATA_PAGE = 0 , INDEX_PAGE = 1 , DICTIONARY_PAGE = 2 , DATA_PAGE_V2 = 3 }
 Types of pages.
 
enum class  cudf::io::parquet::BoundaryOrder : uint8_t { UNORDERED = 0 , ASCENDING = 1 , DESCENDING = 2 }
 Enum to annotate whether lists of min/max elements inside ColumnIndex are ordered and if so, in which direction.
 
enum class  cudf::io::parquet::FieldType : uint8_t {
  BOOLEAN_TRUE = 1 , BOOLEAN_FALSE = 2 , I8 = 3 , I16 = 4 ,
  I32 = 5 , I64 = 6 , DOUBLE = 7 , BINARY = 8 ,
  LIST = 9 , SET = 10 , MAP = 11 , STRUCT = 12 ,
  UUID = 13
}
 Thrift compact protocol struct field types.
 
enum class  cudf::io::compression_type : int32_t {
  cudf::io::NONE , cudf::io::AUTO , cudf::io::SNAPPY , cudf::io::GZIP ,
  cudf::io::BZIP2 , cudf::io::BROTLI , cudf::io::ZIP , cudf::io::XZ ,
  cudf::io::ZLIB , cudf::io::LZ4 , cudf::io::LZO , cudf::io::ZSTD
}
 Compression algorithms. More...
 
enum class  cudf::io::io_type : int32_t {
  cudf::io::FILEPATH , cudf::io::HOST_BUFFER , cudf::io::DEVICE_BUFFER , cudf::io::VOID ,
  cudf::io::USER_IMPLEMENTED
}
 Data source or destination types. More...
 
enum class  cudf::io::quote_style : int32_t { cudf::io::MINIMAL , cudf::io::ALL , cudf::io::NONNUMERIC , cudf::io::NONE }
 Behavior when handling quotations in field data. More...
 
enum  cudf::io::statistics_freq : int32_t { cudf::io::STATISTICS_NONE = 0 , cudf::io::STATISTICS_ROWGROUP = 1 , cudf::io::STATISTICS_PAGE = 2 , cudf::io::STATISTICS_COLUMN = 3 }
 Column statistics granularity type for parquet/orc writers. More...
 
enum class  cudf::io::column_encoding : int32_t {
  cudf::io::USE_DEFAULT = -1 , cudf::io::DICTIONARY , cudf::io::PLAIN , cudf::io::DELTA_BINARY_PACKED ,
  cudf::io::DELTA_LENGTH_BYTE_ARRAY , cudf::io::DELTA_BYTE_ARRAY , cudf::io::BYTE_STREAM_SPLIT , cudf::io::DIRECT ,
  cudf::io::DIRECT_V2 , cudf::io::DICTIONARY_V2
}
 Valid encodings for use with column_in_metadata::set_encoding() More...
 
enum  cudf::io::dictionary_policy : int32_t { cudf::io::NEVER = 0 , cudf::io::ADAPTIVE = 1 , cudf::io::ALWAYS = 2 }
 Control use of dictionary encoding for parquet writer. More...
 

Functions

template<typename T >
constexpr auto cudf::io::is_byte_like_type ()
 Returns true if the type is byte-like, meaning it is reasonable to pass as a pointer to bytes. More...
 

Typedef Documentation

◆ binary_statistics

using cudf::io::binary_statistics = sum_statistics<int64_t>

Statistics for binary columns.

The sum is the total number of bytes across all elements.

Definition at line 143 of file orc_metadata.hpp.
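
The meaning of the sum can be illustrated with a small stand-alone sketch. The `sum_statistics` struct below is a local stand-in rather than the cuDF definition: the `sum` member name is an assumption, and `make_binary_statistics` is a hypothetical helper that exists only for this example.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Local stand-in for cudf::io::sum_statistics<T>; the member name `sum` is an assumption.
template <typename T>
struct sum_statistics {
  std::optional<T> sum;  // empty when no sum was recorded
};

using binary_statistics = sum_statistics<std::int64_t>;

// Hypothetical helper: the sum for a binary column is the total byte count of its elements.
inline binary_statistics make_binary_statistics(std::vector<std::string> const& elements)
{
  std::int64_t total_bytes = 0;
  for (auto const& element : elements) {
    total_bytes += static_cast<std::int64_t>(element.size());
  }
  return binary_statistics{total_bytes};
}
```

Because the sum is optional, callers should check `sum.has_value()` before dereferencing it.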

◆ statistics_type

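using cudf::io::statistics_type = std::variant< no_statistics, integer_statistics, double_statistics, string_statistics, bucket_statistics, decimal_statistics, date_statistics, binary_statistics, timestamp_statistics >

Variant type for ORC type-specific column statistics.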

The variant can hold any of the supported column statistics types.

Definition at line 163 of file orc_metadata.hpp.
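
A variant like this is typically consumed with `std::visit`. The sketch below uses a reduced, local stand-in with three alternatives instead of the full cuDF variant; the member names `minimum` and `maximum` and the `describe` function are assumptions made for illustration.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <type_traits>
#include <variant>

// Reduced local stand-ins for the cuDF types; member names are assumptions.
using no_statistics = std::monostate;
struct integer_statistics {
  std::optional<std::int64_t> minimum;
  std::optional<std::int64_t> maximum;
};
struct string_statistics {
  std::optional<std::string> minimum;
  std::optional<std::string> maximum;
};
using statistics_type = std::variant<no_statistics, integer_statistics, string_statistics>;

// Hypothetical helper: describes whichever alternative is currently held.
inline std::string describe(statistics_type const& stats)
{
  return std::visit(
    [](auto const& s) -> std::string {
      using S = std::decay_t<decltype(s)>;
      if constexpr (std::is_same_v<S, no_statistics>) {
        return "no statistics";
      } else if constexpr (std::is_same_v<S, integer_statistics>) {
        return s.minimum ? "integer, min=" + std::to_string(*s.minimum)
                         : std::string("integer, min unknown");
      } else {
        return s.minimum ? "string, min=" + *s.minimum : std::string("string, min unknown");
      }
    },
    stats);
}
```

The `if constexpr` chain is resolved per alternative at compile time, so adding an alternative to the variant without handling it falls into the final branch; a stricter version could end with a `static_assert` instead.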

Enumeration Type Documentation

◆ column_encoding

enum class cudf::io::column_encoding : int32_t

Valid encodings for use with column_in_metadata::set_encoding()

Enumerator
USE_DEFAULT 

No encoding has been requested, use default encoding.

DICTIONARY 

Use dictionary encoding.

PLAIN 

Use plain encoding.

DELTA_BINARY_PACKED 

Use DELTA_BINARY_PACKED encoding (only valid for integer columns)

DELTA_LENGTH_BYTE_ARRAY 

Use DELTA_LENGTH_BYTE_ARRAY encoding (only valid for BYTE_ARRAY columns)

DELTA_BYTE_ARRAY 

Use DELTA_BYTE_ARRAY encoding (only valid for BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY columns)

BYTE_STREAM_SPLIT 

Use BYTE_STREAM_SPLIT encoding (valid for all fixed width types)

DIRECT 

Use DIRECT encoding.

DIRECT_V2 

Use DIRECT_V2 encoding.

DICTIONARY_V2 

Use DICTIONARY_V2 encoding.

Definition at line 106 of file io/types.hpp.
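
The type restrictions listed above can be summarized as a compile-time check. This is only a sketch under stated assumptions: the enums are local mirrors limited to the Parquet-related enumerators, `encoding_matches_type` is a hypothetical function, "integer columns" is taken to mean INT32 and INT64, and "fixed width types" is taken to mean INT32, INT64, FLOAT, DOUBLE and FIXED_LEN_BYTE_ARRAY. It does not describe what the cuDF writer does when an encoding and a type do not match.

```cpp
#include <cstdint>

// Local mirrors of enums listed on this page (Parquet-related enumerators only).
enum class column_encoding : std::int32_t {
  USE_DEFAULT = -1,
  DICTIONARY,
  PLAIN,
  DELTA_BINARY_PACKED,
  DELTA_LENGTH_BYTE_ARRAY,
  DELTA_BYTE_ARRAY,
  BYTE_STREAM_SPLIT
};
enum class Type : std::int8_t {
  BOOLEAN = 0, INT32 = 1, INT64 = 2, INT96 = 3,
  FLOAT = 4, DOUBLE = 5, BYTE_ARRAY = 6, FIXED_LEN_BYTE_ARRAY = 7
};

// Hypothetical check: true if the requested encoding is documented as valid for the type.
constexpr bool encoding_matches_type(column_encoding enc, Type t)
{
  switch (enc) {
    case column_encoding::DELTA_BINARY_PACKED:  // assumption: "integer" means INT32/INT64
      return t == Type::INT32 || t == Type::INT64;
    case column_encoding::DELTA_LENGTH_BYTE_ARRAY:
      return t == Type::BYTE_ARRAY;
    case column_encoding::DELTA_BYTE_ARRAY:
      return t == Type::BYTE_ARRAY || t == Type::FIXED_LEN_BYTE_ARRAY;
    case column_encoding::BYTE_STREAM_SPLIT:  // assumption: which types count as fixed width
      return t == Type::INT32 || t == Type::INT64 || t == Type::FLOAT ||
             t == Type::DOUBLE || t == Type::FIXED_LEN_BYTE_ARRAY;
    default:
      return true;  // USE_DEFAULT, DICTIONARY, PLAIN: no restriction is stated above
  }
}
```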

◆ compression_type

enum class cudf::io::compression_type : int32_t

Compression algorithms.

Enumerator
NONE 

No compression.

AUTO 

Automatically detect or select compression format.

SNAPPY 

Snappy format, using byte-oriented LZ77.

GZIP 

GZIP format, using DEFLATE algorithm.

BZIP2 

BZIP2 format, using Burrows-Wheeler transform.

BROTLI 

BROTLI format, using LZ77 + Huffman + 2nd order context modeling.

ZIP 

ZIP format, using DEFLATE algorithm.

XZ 

XZ format, using LZMA(2) algorithm.

ZLIB 

ZLIB format, using DEFLATE algorithm.

LZ4 

LZ4 format, using LZ77.

LZO 

Lempel–Ziv–Oberhumer format.

ZSTD 

Zstandard format.

Definition at line 57 of file io/types.hpp.

◆ dictionary_policy

enum cudf::io::dictionary_policy : int32_t

Control use of dictionary encoding for the Parquet writer.

Enumerator
NEVER 

Never use dictionary encoding.

ADAPTIVE 

Use dictionary when it will not impact compression.

ALWAYS 

Use dictionary regardless of impact on compression.

Definition at line 225 of file io/types.hpp.

◆ io_type

enum class cudf::io::io_type : int32_t

Data source or destination types.

Enumerator
FILEPATH 

Input/output is a file path.

HOST_BUFFER 

Input/output is a buffer in host memory.

DEVICE_BUFFER 

Input/output is a buffer in device memory.

VOID 

Input/output is nothing. No work is done. Useful for benchmarking.

USER_IMPLEMENTED 

Input/output is handled by a custom user class.

Definition at line 75 of file io/types.hpp.

◆ quote_style

enum class cudf::io::quote_style : int32_t

Behavior when handling quotations in field data.

Enumerator
MINIMAL 

Quote only fields which contain special characters.

ALL 

Quote all fields.

NONNUMERIC 

Quote all non-numeric fields.

NONE 

Never quote fields; disable quotation parsing.

Definition at line 86 of file io/types.hpp.
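
To make the four styles concrete, the following sketch applies them to a single field on the writing side. It is not cuDF's CSV implementation: the enum is a local mirror, `looks_numeric` and `apply_quote_style` are hypothetical helpers, "special characters" is assumed to mean the delimiter, the quote character and a newline, and embedded quotes are assumed to be escaped by doubling.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Local mirror of the enum documented above.
enum class quote_style : std::int32_t { MINIMAL, ALL, NONNUMERIC, NONE };

// Assumption: a field is numeric if std::strtod consumes all of it.
// (This also accepts forms such as "inf" or leading whitespace.)
inline bool looks_numeric(std::string const& field)
{
  if (field.empty()) { return false; }
  char* end = nullptr;
  std::strtod(field.c_str(), &end);
  return end == field.c_str() + field.size();
}

// Hypothetical helper: returns the field as it would be written under the given style.
inline std::string apply_quote_style(std::string const& field,
                                     quote_style style,
                                     char delimiter = ',',
                                     char quote     = '"')
{
  bool const has_special = field.find(delimiter) != std::string::npos ||
                           field.find(quote) != std::string::npos ||
                           field.find('\n') != std::string::npos;
  bool needs_quotes = false;
  switch (style) {
    case quote_style::MINIMAL: needs_quotes = has_special; break;
    case quote_style::ALL: needs_quotes = true; break;
    case quote_style::NONNUMERIC: needs_quotes = !looks_numeric(field); break;
    case quote_style::NONE: needs_quotes = false; break;
  }
  if (!needs_quotes) { return field; }

  std::string out(1, quote);
  for (char c : field) {
    if (c == quote) { out += quote; }  // escape an embedded quote by doubling it
    out += c;
  }
  out += quote;
  return out;
}
```

Note that under `NONE` a field containing the delimiter is written unchanged, so the output may not round-trip through a parser that expects quoting.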

◆ statistics_freq

enum cudf::io::statistics_freq : int32_t

Column statistics granularity type for parquet/orc writers.

Enumerator
STATISTICS_NONE 

No column statistics.

STATISTICS_ROWGROUP 

Per-rowgroup column statistics.

STATISTICS_PAGE 

Per-page column statistics.

STATISTICS_COLUMN 

Full column and offset indices. Implies STATISTICS_ROWGROUP.

Definition at line 96 of file io/types.hpp.

Function Documentation

◆ is_byte_like_type()

template<typename T >
constexpr auto cudf::io::is_byte_like_type ( )
inline constexpr

Returns true if the type is byte-like, meaning it is reasonable to pass as a pointer to bytes.

Template Parameters
T    The representation type
Returns
true if the type is considered a byte-like type

Definition at line 316 of file io/types.hpp.
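
A trait of this kind can be written as a `constexpr` function template over `std::is_same_v` checks. The version below is a sketch, not a copy of the cuDF definition: the exact set of accepted types (`char`, `unsigned char`, `std::int8_t`, `std::uint8_t`, `std::byte`) and the stripping of cv-qualifiers are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Sketch of a byte-like trait; the accepted type set is an assumption.
template <typename T>
constexpr auto is_byte_like_type()
{
  using non_cv_T = std::remove_cv_t<T>;  // treat `const char` the same as `char`
  return std::is_same_v<non_cv_T, std::int8_t> || std::is_same_v<non_cv_T, char> ||
         std::is_same_v<non_cv_T, std::uint8_t> || std::is_same_v<non_cv_T, unsigned char> ||
         std::is_same_v<non_cv_T, std::byte>;
}
```

Such a function is usually used in a `static_assert` or an `if constexpr` branch to restrict buffer-accepting templates to element types that can safely be viewed as raw bytes.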