Files | |
file | orc_metadata.hpp |
cuDF-IO freeform API | |
file | orc_types.hpp |
file | parquet_metadata.hpp |
cuDF-IO freeform API | |
file | parquet_schema.hpp |
Parquet footer schema structs. | |
file | io/types.hpp |
cuDF-IO API type definitions | |
Classes | |
struct | cudf::io::raw_orc_statistics |
Holds column names and buffers containing raw file-level and stripe-level statistics. | |
struct | cudf::io::minmax_statistics< T > |
Base class for column statistics that include optional minimum and maximum. | |
struct | cudf::io::sum_statistics< T > |
Base class for column statistics that include an optional sum. | |
struct | cudf::io::integer_statistics |
Statistics for integral columns. | |
struct | cudf::io::double_statistics |
Statistics for floating point columns. | |
struct | cudf::io::string_statistics |
Statistics for string columns. | |
struct | cudf::io::bucket_statistics |
Statistics for boolean columns. | |
struct | cudf::io::decimal_statistics |
Statistics for decimal columns. | |
struct | cudf::io::timestamp_statistics |
Statistics for timestamp columns. | |
struct | cudf::io::column_statistics |
Contains per-column ORC statistics. | |
struct | cudf::io::parsed_orc_statistics |
Holds column names and parsed file-level and stripe-level statistics. | |
struct | cudf::io::orc_column_schema |
Schema of an ORC column, including the nested columns. | |
struct | cudf::io::orc_schema |
Schema of an ORC file. | |
class | cudf::io::orc_metadata |
Information about the content of an ORC file. | |
struct | cudf::io::parquet_column_schema |
Schema of a parquet column, including the nested columns. | |
struct | cudf::io::parquet_schema |
Schema of a parquet file. | |
class | cudf::io::parquet_metadata |
Information about the content of a parquet file. | |
struct | cudf::io::parquet::file_header_s |
Struct that describes the Parquet file data header. | |
struct | cudf::io::parquet::file_ender_s |
Struct that describes the Parquet file data postscript. | |
struct | cudf::io::parquet::DecimalType |
Struct that describes the decimal logical type annotation. | |
struct | cudf::io::parquet::TimeUnit |
Time units for temporal logical types. | |
struct | cudf::io::parquet::TimeType |
Struct that describes the time logical type annotation. | |
struct | cudf::io::parquet::TimestampType |
Struct that describes the timestamp logical type annotation. | |
struct | cudf::io::parquet::IntType |
Struct that describes the integer logical type annotation. | |
struct | cudf::io::parquet::LogicalType |
Struct that describes the logical type annotation. | |
struct | cudf::io::parquet::ColumnOrder |
Union to specify the order used for the min_value and max_value fields for a column. | |
struct | cudf::io::parquet::SchemaElement |
Struct for describing an element/field in the Parquet format schema. | |
struct | cudf::io::parquet::Statistics |
Thrift-derived struct describing column chunk statistics. | |
struct | cudf::io::parquet::SizeStatistics |
Thrift-derived struct containing statistics used to estimate page and column chunk sizes. | |
struct | cudf::io::parquet::PageLocation |
Thrift-derived struct describing page location information stored in the offset index. | |
struct | cudf::io::parquet::OffsetIndex |
Thrift-derived struct describing the offset index. | |
struct | cudf::io::parquet::ColumnIndex |
Thrift-derived struct describing the column index. | |
struct | cudf::io::parquet::PageEncodingStats |
Thrift-derived struct describing page encoding statistics. | |
struct | cudf::io::parquet::SortingColumn |
Thrift-derived struct describing column sort order. | |
struct | cudf::io::parquet::ColumnChunkMetaData |
Thrift-derived struct describing a column chunk. | |
struct | cudf::io::parquet::BloomFilterAlgorithm |
The algorithm used in the Bloom filter. | |
struct | cudf::io::parquet::BloomFilterHash |
The hash function used in the Bloom filter. | |
struct | cudf::io::parquet::BloomFilterCompression |
The compression used in the Bloom filter. | |
struct | cudf::io::parquet::BloomFilterHeader |
Bloom filter header struct. | |
struct | cudf::io::parquet::ColumnChunk |
Thrift-derived struct describing a chunk of data for a particular column. | |
struct | cudf::io::parquet::RowGroup |
Thrift-derived struct describing a group of row data. | |
struct | cudf::io::parquet::KeyValue |
Thrift-derived struct describing a key-value pair, for user metadata. | |
struct | cudf::io::parquet::FileMetaData |
Thrift-derived struct describing file-level metadata. | |
struct | cudf::io::parquet::DataPageHeader |
Thrift-derived struct describing the header for a data page. | |
struct | cudf::io::parquet::DataPageHeaderV2 |
Thrift-derived struct describing the header for a V2 data page. | |
struct | cudf::io::parquet::DictionaryPageHeader |
Thrift-derived struct describing the header for a dictionary page. | |
struct | cudf::io::parquet::PageHeader |
Thrift-derived struct describing the page header. | |
class | cudf::io::writer_compression_statistics |
Statistics about compression performed by a writer. | |
struct | cudf::io::column_name_info |
Detailed name (and optionally nullability) information for output columns. | |
struct | cudf::io::table_metadata |
Table metadata returned by IO readers. | |
struct | cudf::io::table_with_metadata |
Table with table metadata used by io readers to return the metadata by value. | |
struct | cudf::io::source_info |
Source information for read interfaces. | |
struct | cudf::io::sink_info |
Destination information for write interfaces. | |
class | cudf::io::column_in_metadata |
Metadata for a column. | |
class | cudf::io::table_input_metadata |
Metadata for a table. | |
struct | cudf::io::partition_info |
Information used while writing partitioned datasets. | |
class | cudf::io::reader_column_schema |
Schema element for the reader. | |
Typedefs | |
using | cudf::io::no_statistics = std::monostate |
Monostate type alias for the statistics variant. | |
using | cudf::io::date_statistics = minmax_statistics< int32_t > |
Statistics for date(time) columns. | |
using | cudf::io::binary_statistics = sum_statistics< int64_t > |
Statistics for binary columns. | |
using | cudf::io::statistics_type = std::variant< no_statistics, integer_statistics, double_statistics, string_statistics, bucket_statistics, decimal_statistics, date_statistics, binary_statistics, timestamp_statistics > |
Variant type for ORC type-specific column statistics. | |
Enumerations | |
enum | cudf::io::orc::CompressionKind : uint8_t { NONE = 0 , ZLIB = 1 , SNAPPY = 2 , LZO = 3 , LZ4 = 4 , ZSTD = 5 } |
Identifies a compression algorithm. | |
enum | cudf::io::orc::TypeKind : int8_t { INVALID_TYPE_KIND = -1 , BOOLEAN = 0 , BYTE = 1 , SHORT = 2 , INT = 3 , LONG = 4 , FLOAT = 5 , DOUBLE = 6 , STRING = 7 , BINARY = 8 , TIMESTAMP = 9 , LIST = 10 , MAP = 11 , STRUCT = 12 , UNION = 13 , DECIMAL = 14 , DATE = 15 , VARCHAR = 16 , CHAR = 17 } |
Identifies a data type in an ORC file. | |
enum | cudf::io::orc::StreamKind : int8_t { INVALID_STREAM_KIND = -1 , PRESENT = 0 , DATA = 1 , LENGTH = 2 , DICTIONARY_DATA = 3 , DICTIONARY_COUNT = 4 , SECONDARY = 5 , ROW_INDEX = 6 , BLOOM_FILTER = 7 , BLOOM_FILTER_UTF8 = 8 } |
Identifies the type of data stream. | |
enum | cudf::io::orc::ColumnEncodingKind : int8_t { INVALID_ENCODING_KIND = -1 , DIRECT = 0 , DICTIONARY = 1 , DIRECT_V2 = 2 , DICTIONARY_V2 = 3 } |
Identifies the encoding of columns. | |
enum | cudf::io::orc::ProtofType : uint8_t { VARINT = 0 , FIXED64 = 1 , FIXEDLEN = 2 , START_GROUP = 3 , END_GROUP = 4 , FIXED32 = 5 , INVALID_6 = 6 , INVALID_7 = 7 } |
Identifies the type of encoding in a protocol buffer. | |
enum class | cudf::io::parquet::Type : int8_t { UNDEFINED = -1 , BOOLEAN = 0 , INT32 = 1 , INT64 = 2 , INT96 = 3 , FLOAT = 4 , DOUBLE = 5 , BYTE_ARRAY = 6 , FIXED_LEN_BYTE_ARRAY = 7 } |
Basic data types in Parquet, determines how data is physically stored. | |
enum class | cudf::io::parquet::ConvertedType : int8_t { UNKNOWN = -1 , UTF8 = 0 , MAP = 1 , MAP_KEY_VALUE = 2 , LIST = 3 , ENUM = 4 , DECIMAL = 5 , DATE = 6 , TIME_MILLIS = 7 , TIME_MICROS = 8 , TIMESTAMP_MILLIS = 9 , TIMESTAMP_MICROS = 10 , UINT_8 = 11 , UINT_16 = 12 , UINT_32 = 13 , UINT_64 = 14 , INT_8 = 15 , INT_16 = 16 , INT_32 = 17 , INT_64 = 18 , JSON = 19 , BSON = 20 , INTERVAL = 21 , NA = 25 } |
High-level data types in Parquet, determines how data is logically interpreted. | |
enum class | cudf::io::parquet::Encoding : uint8_t { PLAIN = 0 , GROUP_VAR_INT = 1 , PLAIN_DICTIONARY = 2 , RLE = 3 , BIT_PACKED = 4 , DELTA_BINARY_PACKED = 5 , DELTA_LENGTH_BYTE_ARRAY = 6 , DELTA_BYTE_ARRAY = 7 , RLE_DICTIONARY = 8 , BYTE_STREAM_SPLIT = 9 , NUM_ENCODINGS = 10 } |
Encoding types for the actual data stream. | |
enum class | cudf::io::parquet::Compression : uint8_t { UNCOMPRESSED = 0 , SNAPPY = 1 , GZIP = 2 , LZO = 3 , BROTLI = 4 , LZ4 = 5 , ZSTD = 6 , LZ4_RAW = 7 } |
Compression codec used for compressed data pages. | |
enum class | cudf::io::parquet::FieldRepetitionType : int8_t { UNSPECIFIED = -1 , REQUIRED = 0 , OPTIONAL = 1 , REPEATED = 2 } |
Repetition type of a field in the schema (required, optional, or repeated). | |
enum class | cudf::io::parquet::PageType : uint8_t { DATA_PAGE = 0 , INDEX_PAGE = 1 , DICTIONARY_PAGE = 2 , DATA_PAGE_V2 = 3 } |
Types of pages. | |
enum class | cudf::io::parquet::BoundaryOrder : uint8_t { UNORDERED = 0 , ASCENDING = 1 , DESCENDING = 2 } |
Enum to annotate whether lists of min/max elements inside ColumnIndex are ordered and if so, in which direction. | |
enum class | cudf::io::parquet::FieldType : uint8_t { BOOLEAN_TRUE = 1 , BOOLEAN_FALSE = 2 , I8 = 3 , I16 = 4 , I32 = 5 , I64 = 6 , DOUBLE = 7 , BINARY = 8 , LIST = 9 , SET = 10 , MAP = 11 , STRUCT = 12 , UUID = 13 } |
Thrift compact protocol struct field types. | |
enum class | cudf::io::compression_type : int32_t { cudf::io::NONE , cudf::io::AUTO , cudf::io::SNAPPY , cudf::io::GZIP , cudf::io::BZIP2 , cudf::io::BROTLI , cudf::io::ZIP , cudf::io::XZ , cudf::io::ZLIB , cudf::io::LZ4 , cudf::io::LZO , cudf::io::ZSTD } |
Compression algorithms. | |
enum class | cudf::io::io_type : int32_t { cudf::io::FILEPATH , cudf::io::HOST_BUFFER , cudf::io::DEVICE_BUFFER , cudf::io::VOID , cudf::io::USER_IMPLEMENTED } |
Data source or destination types. | |
enum class | cudf::io::quote_style : int32_t { cudf::io::MINIMAL , cudf::io::ALL , cudf::io::NONNUMERIC , cudf::io::NONE } |
Behavior when handling quotations in field data. | |
enum | cudf::io::statistics_freq : int32_t { cudf::io::STATISTICS_NONE = 0 , cudf::io::STATISTICS_ROWGROUP = 1 , cudf::io::STATISTICS_PAGE = 2 , cudf::io::STATISTICS_COLUMN = 3 } |
Column statistics granularity type for parquet/orc writers. | |
enum class | cudf::io::column_encoding : int32_t { cudf::io::USE_DEFAULT = -1 , cudf::io::DICTIONARY , cudf::io::PLAIN , cudf::io::DELTA_BINARY_PACKED , cudf::io::DELTA_LENGTH_BYTE_ARRAY , cudf::io::DELTA_BYTE_ARRAY , cudf::io::BYTE_STREAM_SPLIT , cudf::io::DIRECT , cudf::io::DIRECT_V2 , cudf::io::DICTIONARY_V2 } |
Valid encodings for use with column_in_metadata::set_encoding(). | |
enum | cudf::io::dictionary_policy : int32_t { cudf::io::NEVER = 0 , cudf::io::ADAPTIVE = 1 , cudf::io::ALWAYS = 2 } |
Control use of dictionary encoding for parquet writer. | |
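The FieldType values listed above are the 4-bit type codes of the Thrift compact protocol, in which Parquet footers and page headers are serialized. The sketch below shows how such a code typically ends up in a struct field header, assuming the standard compact-protocol layout (field-id delta in the high nibble, type in the low nibble, with a long form that writes the full id as a zigzag varint when the delta is not in 1..15). The local field_type enum and the helpers put_varint, zigzag32 and write_field_header are hypothetical illustrations, not cuDF's reader or writer code. Note that for boolean fields the compact protocol encodes the value itself in the type nibble (BOOLEAN_TRUE or BOOLEAN_FALSE).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical local copy of the compact-protocol type codes listed for
// cudf::io::parquet::FieldType (illustrative mirror, not the cuDF header).
enum class field_type : std::uint8_t {
  BOOLEAN_TRUE  = 1,
  BOOLEAN_FALSE = 2,
  I8            = 3,
  I16           = 4,
  I32           = 5,
  I64           = 6,
  DOUBLE        = 7,
  BINARY        = 8,
  LIST          = 9,
  SET           = 10,
  MAP           = 11,
  STRUCT        = 12,
  UUID          = 13
};

// Append v as an unsigned LEB128 varint (7 bits per byte, high bit = "more").
void put_varint(std::vector<std::uint8_t>& out, std::uint32_t v)
{
  while (v >= 0x80) {
    out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(v));
}

// Zigzag-map a signed value so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3.
std::uint32_t zigzag32(std::int32_t n)
{
  return (static_cast<std::uint32_t>(n) << 1) ^ static_cast<std::uint32_t>(n >> 31);
}

// Write the header for field `field_id`, given the id of the previous field
// written in the same struct (0 at the start of a struct).
void write_field_header(std::vector<std::uint8_t>& out,
                        std::int16_t prev_id,
                        std::int16_t field_id,
                        field_type type)
{
  auto const type_nibble = static_cast<std::uint8_t>(type);
  assert(type_nibble >= 1 && type_nibble <= 0x0F);
  int const delta = field_id - prev_id;
  if (delta >= 1 && delta <= 15) {
    // Short form: delta in the high nibble, type in the low nibble.
    out.push_back(static_cast<std::uint8_t>((delta << 4) | type_nibble));
  } else {
    // Long form: zero delta nibble, then the full field id as a zigzag varint.
    out.push_back(type_nibble);
    put_varint(out, zigzag32(field_id));
  }
}
```

A reader performs the inverse: it takes the low nibble of each header byte as the type, adds the high nibble to the previous field id (or reads a zigzag varint when that nibble is zero), and stops at a 0x00 byte, which marks the end of the struct.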
Functions | |
template<typename T > | |
constexpr auto | cudf::io::is_byte_like_type () |
Returns true if the type is byte-like, meaning it is reasonable to pass as a pointer to bytes. | |
using cudf::io::binary_statistics = sum_statistics<int64_t>
Statistics for binary columns.
The sum is the total number of bytes across all elements.
Definition at line 143 of file orc_metadata.hpp.
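As an illustration of what that sum holds, the sketch below computes it for a handful of binary values. It uses a simplified, hypothetical stand-in for sum_statistics<int64_t> (a struct with an optional sum member), not the cuDF definition, and it assumes that null entries contribute no bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for cudf::io::sum_statistics<T>: the sum is optional
// because a file may omit it. Not the actual cuDF definition.
template <typename T>
struct sum_statistics_sketch {
  std::optional<T> sum;
};

using binary_statistics_sketch = sum_statistics_sketch<std::int64_t>;

// Total number of bytes across all non-null binary elements
// (skipping nulls is an assumption of this sketch).
binary_statistics_sketch compute_binary_statistics(
  std::vector<std::optional<std::string>> const& values)
{
  std::int64_t total = 0;
  for (auto const& v : values) {
    if (v.has_value()) { total += static_cast<std::int64_t>(v->size()); }
  }
  return {total};
}
```

The sum is kept in a 64-bit integer because the combined byte length of a large column can exceed the range of a 32-bit counter.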
using cudf::io::statistics_type = std::variant<no_statistics, integer_statistics, double_statistics, string_statistics, bucket_statistics, decimal_statistics, date_statistics, binary_statistics, timestamp_statistics>
Variant type for ORC type-specific column statistics.
The variant can hold any of the supported column statistics types.
Definition at line 163 of file orc_metadata.hpp.
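Because statistics_type is a std::variant, consumers can dispatch on the active alternative with std::visit (or check it with std::holds_alternative). The sketch below shows the pattern with a reduced, hypothetical variant: no_statistics as std::monostate plus two stand-in structs shaped like date_statistics and binary_statistics. The struct layouts and member names are assumptions for illustration, not cuDF's definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <type_traits>
#include <variant>

// Reduced stand-ins for a few statistics alternatives. The names echo the
// cuDF aliases, but the struct layouts are assumptions for illustration.
using no_statistics = std::monostate;
struct minmax_int32 {  // shaped like date_statistics (minmax_statistics<int32_t>)
  std::optional<std::int32_t> minimum;
  std::optional<std::int32_t> maximum;
};
struct sum_int64 {  // shaped like binary_statistics (sum_statistics<int64_t>)
  std::optional<std::int64_t> sum;
};

using statistics_sketch = std::variant<no_statistics, minmax_int32, sum_int64>;

// Produce a short description of whichever alternative is active.
std::string describe(statistics_sketch const& stats)
{
  return std::visit(
    [](auto const& s) -> std::string {
      using S = std::decay_t<decltype(s)>;
      if constexpr (std::is_same_v<S, no_statistics>) {
        return "no statistics";
      } else if constexpr (std::is_same_v<S, minmax_int32>) {
        if (!s.minimum || !s.maximum) { return "min/max unavailable"; }
        return "min=" + std::to_string(*s.minimum) + " max=" + std::to_string(*s.maximum);
      } else {
        static_assert(std::is_same_v<S, sum_int64>);
        if (!s.sum) { return "sum unavailable"; }
        return "sum=" + std::to_string(*s.sum);
      }
    },
    stats);
}
```

Placing std::monostate first makes a default-constructed variant mean "no statistics", which fits columns for which a file records none.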
enum class cudf::io::column_encoding : int32_t
Valid encodings for use with column_in_metadata::set_encoding().
Definition at line 106 of file io/types.hpp.
enum class cudf::io::compression_type : int32_t
Compression algorithms.
Definition at line 57 of file io/types.hpp.
enum cudf::io::dictionary_policy : int32_t |
Control use of dictionary encoding for parquet writer.
Enumerator | |
---|---|
NEVER | Never use dictionary encoding. |
ADAPTIVE | Use dictionary when it will not impact compression. |
ALWAYS | Use dictionary regardless of impact on compression. |
Definition at line 225 of file io/types.hpp.
enum class cudf::io::io_type : int32_t
Data source or destination types.
Definition at line 75 of file io/types.hpp.
enum class cudf::io::quote_style : int32_t
Behavior when handling quotations in field data.
Enumerator | |
---|---|
MINIMAL | Quote only fields which contain special characters. |
ALL | Quote all fields. |
NONNUMERIC | Quote all non-numeric fields. |
NONE | Never quote fields; disable quotation parsing. |
Definition at line 86 of file io/types.hpp.
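To make the four policies concrete, here is a minimal sketch of how a CSV writer might decide whether to quote a single field. The local quote_style_sketch enum mirrors the enumerator names above. Several details are assumptions rather than a description of cuDF's writer: the "special characters" are taken to be the delimiter, a double quote, CR and LF; embedded quotes are doubled in the RFC 4180 style; and NONNUMERIC is decided here from the field text, whereas a real writer would more likely use the column type.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical local mirror of the quote_style enumerators.
enum class quote_style_sketch { MINIMAL, ALL, NONNUMERIC, NONE };

// Rough numeric test: optional sign, digits, at most one '.', at least one digit.
bool looks_numeric(std::string const& s)
{
  std::size_t i   = (!s.empty() && (s[0] == '-' || s[0] == '+')) ? 1 : 0;
  bool seen_digit = false;
  bool seen_dot   = false;
  for (; i < s.size(); ++i) {
    if (std::isdigit(static_cast<unsigned char>(s[i]))) {
      seen_digit = true;
    } else if (s[i] == '.' && !seen_dot) {
      seen_dot = true;
    } else {
      return false;
    }
  }
  return seen_digit;
}

// Assumed "special characters": the delimiter, a double quote, CR or LF.
bool has_special_chars(std::string const& field, char delimiter)
{
  return field.find_first_of(std::string{delimiter, '"', '\r', '\n'}) != std::string::npos;
}

// Format one field according to `style`; embedded quotes are doubled.
std::string format_field(std::string const& field, quote_style_sketch style, char delimiter = ',')
{
  assert(delimiter != '"');
  bool quote = false;
  switch (style) {
    case quote_style_sketch::MINIMAL: quote = has_special_chars(field, delimiter); break;
    case quote_style_sketch::ALL: quote = true; break;
    case quote_style_sketch::NONNUMERIC: quote = !looks_numeric(field); break;
    case quote_style_sketch::NONE: quote = false; break;
  }
  if (!quote) { return field; }
  std::string out = "\"";
  for (char c : field) {
    if (c == '"') { out += '"'; }  // double each embedded quote
    out += c;
  }
  out += '"';
  return out;
}
```

With NONE, a field that contains the delimiter is written unquoted, so the output can no longer be split back into the original fields; that policy is only safe when the data is known not to contain special characters.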
enum cudf::io::statistics_freq : int32_t |
Column statistics granularity type for parquet/orc writers.
Definition at line 96 of file io/types.hpp.
template<typename T >
inline constexpr auto cudf::io::is_byte_like_type()
Returns true if the type is byte-like, meaning it is reasonable to pass as a pointer to bytes.
Template Parameters | |
T | The representation type |
Returns true if the type is considered a byte-like type.
Definition at line 316 of file io/types.hpp.
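A trait like this is usually a compile-time predicate over the type, ignoring const/volatile qualifiers. The sketch below assumes that "byte-like" means a one-byte character type (which on mainstream platforms also covers int8_t and uint8_t) or std::byte. The name is_byte_like_type_sketch is hypothetical, and the exact set of types cuDF accepts may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical sketch of a byte-like trait: one-byte character types and
// std::byte, ignoring const/volatile. Not the cuDF implementation.
template <typename T>
constexpr bool is_byte_like_type_sketch()
{
  using U = std::remove_cv_t<T>;
  return std::is_same_v<U, char> || std::is_same_v<U, signed char> ||
         std::is_same_v<U, unsigned char> || std::is_same_v<U, std::byte>;
}

// Compile-time checks of the assumed behavior.
static_assert(is_byte_like_type_sketch<const unsigned char>());
static_assert(is_byte_like_type_sketch<std::byte>());
static_assert(!is_byte_like_type_sketch<std::int32_t>());
static_assert(!is_byte_like_type_sketch<float>());
```

Because the function is constexpr, it can gate overloads or static_assert that a buffer's element type is safe to reinterpret as raw bytes.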