Io Types#

group io_types


using no_statistics = std::monostate#

Monostate type alias for the statistics variant.

using date_statistics = minmax_statistics<int32_t>#

Statistics for date(time) columns.

using binary_statistics = sum_statistics<int64_t>#

Statistics for binary columns.

The sum is the total number of bytes across all elements.

using statistics_type = std::variant<no_statistics, integer_statistics, double_statistics, string_statistics, bucket_statistics, decimal_statistics, date_statistics, binary_statistics, timestamp_statistics>#

Variant type for ORC type-specific column statistics.

The variant can hold any of the supported column statistics types.
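
As a minimal sketch of how such a variant is typically consumed, the example below dispatches on the active alternative with `std::visit`. The three structs are simplified stand-ins declared locally so the snippet compiles without libcudf headers, and `describe` is a hypothetical helper, not part of the library.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-ins for three of the documented alternatives.
using no_statistics = std::monostate;
struct integer_statistics {
  std::optional<std::int64_t> minimum, maximum, sum;
};
struct string_statistics {
  std::optional<std::string> minimum, maximum;
  std::optional<std::int64_t> sum;
};
using statistics_type = std::variant<no_statistics, integer_statistics, string_statistics>;

// Returns a short label for whichever alternative the variant currently holds.
std::string describe(statistics_type const& stats)
{
  return std::visit(
    [](auto const& s) -> std::string {
      using T = std::decay_t<decltype(s)>;
      if constexpr (std::is_same_v<T, no_statistics>) {
        return "none";
      } else if constexpr (std::is_same_v<T, integer_statistics>) {
        // Every field is optional, so check before dereferencing.
        return s.minimum ? "integer, min=" + std::to_string(*s.minimum)
                         : std::string("integer, no min");
      } else {
        return s.minimum ? "string, min=" + *s.minimum : std::string("string, no min");
      }
    },
    stats);
}
```

A default-constructed variant holds the first alternative, which is why `no_statistics` (a `std::monostate`) is listed first.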


enum CompressionKind#

Identifies a compression algorithm.


enumerator NONE#
enumerator ZLIB#
enumerator SNAPPY#
enumerator LZO#
enumerator LZ4#
enumerator ZSTD#
enum TypeKind#

Identifies a data type in an ORC file.


enumerator BOOLEAN#
enumerator BYTE#
enumerator SHORT#
enumerator INT#
enumerator LONG#
enumerator FLOAT#
enumerator DOUBLE#
enumerator STRING#
enumerator BINARY#
enumerator TIMESTAMP#
enumerator LIST#
enumerator MAP#
enumerator STRUCT#
enumerator UNION#
enumerator DECIMAL#
enumerator DATE#
enumerator VARCHAR#
enumerator CHAR#
enum StreamKind#

Identifies the type of data stream.


enumerator PRESENT#
enumerator DATA#
enumerator LENGTH#
enumerator SECONDARY#
enumerator ROW_INDEX#
enumerator BLOOM_FILTER#
enumerator BLOOM_FILTER_UTF8#
enum ColumnEncodingKind#

Identifies the encoding of columns.


enumerator DIRECT#
enumerator DICTIONARY#
enumerator DIRECT_V2#
enumerator DICTIONARY_V2#
enum ProtofType#

Identifies the type of encoding in a protocol buffer.


enumerator VARINT#
enumerator FIXED64#
enumerator FIXEDLEN#
enumerator START_GROUP#
enumerator END_GROUP#
enumerator FIXED32#
enumerator INVALID_6#
enumerator INVALID_7#
enum class compression_type : int32_t#

Compression algorithms.


enumerator NONE#

No compression.

enumerator AUTO#

Automatically detect or select compression format.

enumerator SNAPPY#

Snappy format, using byte-oriented LZ77.

enumerator GZIP#

GZIP format, using DEFLATE algorithm.

enumerator BZIP2#

BZIP2 format, using Burrows-Wheeler transform.

enumerator BROTLI#

BROTLI format, using LZ77 + Huffman + 2nd order context modeling.

enumerator ZIP#

ZIP format, using DEFLATE algorithm.

enumerator XZ#

XZ format, using LZMA(2) algorithm.

enumerator ZLIB#

ZLIB format, using DEFLATE algorithm.

enumerator LZ4#

LZ4 format, using LZ77.

enumerator LZO#

Lempel–Ziv–Oberhumer format.

enumerator ZSTD#

Zstandard format.

enum class io_type : int32_t#

Data source or destination types.


enumerator FILEPATH#

Input/output is a file path.

enumerator HOST_BUFFER#

Input/output is a buffer in host memory.

enumerator DEVICE_BUFFER#

Input/output is a buffer in device memory.

enumerator VOID#

Input/output is nothing. No work is done. Useful for benchmarking.


Input/output is handled by a custom user class.

enum class quote_style : int32_t#

Behavior when handling quotations in field data.


enumerator MINIMAL#

Quote only fields which contain special characters.

enumerator ALL#

Quote all fields.

enumerator NONNUMERIC#

Quote all non-numeric fields.

enumerator NONE#

Never quote fields; disable quotation parsing.
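
The four styles can be illustrated from the writer's side with a small sketch. The enum below is a local stand-in, and the choice of "special characters" (delimiter, quote, CR, LF) and the `strtod`-based numeric test are assumptions made for illustration, not the library's rules.

```cpp
#include <cstdlib>
#include <string>

// Local stand-in for the documented enum.
enum class quote_style_sketch { MINIMAL, ALL, NONNUMERIC, NONE };

// True if the whole field parses as a floating-point number (strtod rules).
bool is_numeric(std::string const& field)
{
  if (field.empty()) { return false; }
  char* end = nullptr;
  std::strtod(field.c_str(), &end);
  return end == field.c_str() + field.size();
}

// Wraps a field in quotes, doubling any embedded quote characters.
std::string quoted(std::string const& field, char quote)
{
  std::string out(1, quote);
  for (char c : field) {
    if (c == quote) { out.push_back(quote); }
    out.push_back(c);
  }
  out.push_back(quote);
  return out;
}

// Formats one output field according to the requested style.
std::string format_field(std::string const& field,
                         quote_style_sketch style,
                         char delimiter = ',',
                         char quote     = '"')
{
  bool const has_special = field.find(delimiter) != std::string::npos ||
                           field.find(quote) != std::string::npos ||
                           field.find('\n') != std::string::npos ||
                           field.find('\r') != std::string::npos;
  switch (style) {
    case quote_style_sketch::ALL: return quoted(field, quote);
    case quote_style_sketch::NONNUMERIC: return is_numeric(field) ? field : quoted(field, quote);
    case quote_style_sketch::MINIMAL: return has_special ? quoted(field, quote) : field;
    case quote_style_sketch::NONE: return field;
  }
  return field;
}
```

With `NONE`, a field that contains the delimiter is written unchanged, so the output may not round-trip through a parser that expects quoting.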

enum statistics_freq#

Column statistics granularity type for parquet/orc writers.



No column statistics.


Per-Rowgroup column statistics.


Per-page column statistics.


Full column and offset indices. Implies STATISTICS_ROWGROUP.

enum class column_encoding : int32_t#

Valid encodings for use with column_in_metadata::set_encoding()


enumerator USE_DEFAULT#

No encoding has been requested, use default encoding.

enumerator DICTIONARY#

Use dictionary encoding.

enumerator PLAIN#

Use plain encoding.


Use DELTA_BINARY_PACKED encoding (only valid for integer columns)


Use DELTA_LENGTH_BYTE_ARRAY encoding (only valid for BYTE_ARRAY columns)

enumerator DELTA_BYTE_ARRAY#

Use DELTA_BYTE_ARRAY encoding (only valid for BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY columns)


Use BYTE_STREAM_SPLIT encoding (valid for all fixed width types)

enumerator DIRECT#

Use DIRECT encoding.

enumerator DIRECT_V2#

Use DIRECT_V2 encoding.

enumerator DICTIONARY_V2#

Use DICTIONARY_V2 encoding.

enum dictionary_policy#

Control use of dictionary encoding for parquet writer.


enumerator NEVER#

Never use dictionary encoding.

enumerator ADAPTIVE#

Use dictionary when it will not impact compression.

enumerator ALWAYS#

Use dictionary regardless of impact on compression.


template<typename T>
inline constexpr auto is_byte_like_type()#

Returns true if the type is byte-like, meaning it is reasonable to pass as a pointer to bytes.

Template Parameters:

T – The representation type


true if the type is considered a byte-like type
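
A trait of this kind might be written as below. This is only a sketch: the name `is_byte_like_sketch` is hypothetical, and the set of types it accepts is an assumption rather than the library's actual list.

```cpp
#include <cstddef>
#include <type_traits>

// Sketch of a byte-like trait; the accepted types are an assumption.
template <typename T>
constexpr bool is_byte_like_sketch()
{
  using U = std::remove_cv_t<T>;  // ignore const/volatile qualifiers
  return std::is_same_v<U, char> || std::is_same_v<U, signed char> ||
         std::is_same_v<U, unsigned char> || std::is_same_v<U, std::byte>;
}

// Evaluated at compile time, so it can gate template overloads.
static_assert(is_byte_like_sketch<char const>());
static_assert(!is_byte_like_sketch<int>());
```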

struct raw_orc_statistics#
#include <orc_metadata.hpp>

Holds column names and buffers containing raw file-level and stripe-level statistics.

The buffers can be parsed using a Protobuf parser. Alternatively, use parsed_orc_statistics to get the statistics parsed into a libcudf representation.

The column_names and file_stats members contain one element per column. The stripes_stats contains one element per stripe, where each element contains column statistics for each column.

Public Members

std::vector<std::string> column_names#

Column names.

std::vector<std::string> file_stats#

File-level statistics for each column.

std::vector<std::vector<std::string>> stripes_stats#

Stripe-level statistics for each column.
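
The indexing implied by the description (stripe first, then column) can be sketched as follows. The struct is a local stand-in with the same three members, and both helper functions are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in with the same three members as the documented struct.
struct raw_orc_statistics {
  std::vector<std::string> column_names;
  std::vector<std::string> file_stats;
  std::vector<std::vector<std::string>> stripes_stats;
};

// Returns the serialized statistics blob of one column in one stripe.
// vector::at throws std::out_of_range on a bad index.
std::string const& stripe_column_blob(raw_orc_statistics const& stats,
                                      std::size_t stripe,
                                      std::size_t column)
{
  return stats.stripes_stats.at(stripe).at(column);
}

// Checks the documented shape: one entry per column at file level and in every stripe.
bool has_consistent_shape(raw_orc_statistics const& stats)
{
  if (stats.file_stats.size() != stats.column_names.size()) { return false; }
  for (auto const& stripe : stats.stripes_stats) {
    if (stripe.size() != stats.column_names.size()) { return false; }
  }
  return true;
}
```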

template<typename T>
struct minmax_statistics#
#include <orc_metadata.hpp>

Base class for column statistics that include optional minimum and maximum.

Includes accessors for the minimum and maximum values.

Public Members

std::optional<T> minimum#

Minimum value.

std::optional<T> maximum#

Maximum value.

template<typename T>
struct sum_statistics#
#include <orc_metadata.hpp>

Base class for column statistics that include an optional sum.

Includes accessors for the sum value.

Public Members

std::optional<T> sum#

Sum of values in column.

struct integer_statistics : public cudf::io::minmax_statistics<int64_t>, public cudf::io::sum_statistics<int64_t>#
#include <orc_metadata.hpp>

Statistics for integral columns.

struct double_statistics : public cudf::io::minmax_statistics<double>, public cudf::io::sum_statistics<double>#
#include <orc_metadata.hpp>

Statistics for floating point columns.

struct string_statistics : public cudf::io::minmax_statistics<std::string>, public cudf::io::sum_statistics<int64_t>#
#include <orc_metadata.hpp>

Statistics for string columns.

The minimum and maximum are the first and last elements, respectively, in lexicographical order. The sum is the total length of elements in the column. Note: According to the ORC specification, the sum should be signed, but pyarrow uses an unsigned value.

struct bucket_statistics#
#include <orc_metadata.hpp>

Statistics for boolean columns.

The count array contains the count of true values.

Public Members

std::vector<uint64_t> count#

count of true values

struct decimal_statistics : public cudf::io::minmax_statistics<std::string>, public cudf::io::sum_statistics<std::string>#
#include <orc_metadata.hpp>

Statistics for decimal columns.

struct timestamp_statistics : public cudf::io::minmax_statistics<int64_t>#
#include <orc_metadata.hpp>

Statistics for timestamp columns.

The minimum and maximum are the min/max elements in the column, expressed as the number of milliseconds since the UNIX epoch. The minimum_utc and maximum_utc are the same values adjusted to UTC.

Public Members

std::optional<int64_t> minimum_utc#

minimum in milliseconds

std::optional<int64_t> maximum_utc#

maximum in milliseconds

std::optional<uint32_t> minimum_nanos#

nanoseconds part of the minimum

std::optional<uint32_t> maximum_nanos#

nanoseconds part of the maximum

struct column_statistics#
#include <orc_metadata.hpp>

Contains per-column ORC statistics.

All columns can have the number_of_values statistics. Depending on the data type, a column can have additional statistics, accessible through type_specific_stats accessor.

Public Functions

column_statistics(orc::column_statistics &&detail_statistics)#

Construct a new column statistics object.


detail_statistics – The statistics to initialize the object with

Public Members

std::optional<uint64_t> number_of_values#

number of values

std::optional<bool> has_null#

column has any nulls

statistics_type type_specific_stats#

type-specific statistics

struct parsed_orc_statistics#
#include <orc_metadata.hpp>

Holds column names and parsed file-level and stripe-level statistics.

The column_names and file_stats members contain one element per column. The stripes_stats member contains one element per stripe, where each element contains column statistics for each column.

Public Members

std::vector<std::string> column_names#

column names

std::vector<column_statistics> file_stats#

file-level statistics

std::vector<std::vector<column_statistics>> stripes_stats#

stripe-level statistics

struct orc_column_schema#
#include <orc_metadata.hpp>

Schema of an ORC column, including the nested columns.

Public Functions

inline orc_column_schema(std::string_view name, orc::TypeKind type, std::vector<orc_column_schema> children)#


  • name – column name

  • type – ORC type

  • children – child columns (empty for non-nested types)

inline auto name() const#

Returns ORC column name; can be empty.


Column name

inline auto type_kind() const#

Returns ORC type of the column.


Column ORC type

inline auto const &children() const &#

Returns schemas of all child columns.


Children schemas

inline auto children() &&#

Returns schemas of all child columns.


Children schemas. The children array is moved out of the object (rvalues only).

inline auto const &child(int idx) const &#

Returns schema of the child with the given index.


idx – child index


Child schema

inline auto child(int idx) &&#

Returns schema of the child with the given index.


idx – child index


Child schema. The child is moved out of the object (rvalues only).

inline auto num_children() const#

Returns the number of child columns.


Children count
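
Because each column schema owns its children, a file schema is a tree that can be walked recursively. The sketch below uses a simplified stand-in class (no ORC type kind) and a hypothetical helper that collects dotted paths of the leaf columns.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Minimal stand-in exposing a subset of the documented accessors.
class column_schema_sketch {
 public:
  column_schema_sketch(std::string name, std::vector<column_schema_sketch> children = {})
    : _name{std::move(name)}, _children{std::move(children)}
  {
  }
  std::string const& name() const { return _name; }
  // Lvalue overload returns a reference; rvalue overload moves the children out.
  std::vector<column_schema_sketch> const& children() const& { return _children; }
  std::vector<column_schema_sketch> children() && { return std::move(_children); }
  std::size_t num_children() const { return _children.size(); }

 private:
  std::string _name;
  std::vector<column_schema_sketch> _children;
};

// Collects dotted paths of all leaf columns in depth-first order.
void collect_leaf_paths(column_schema_sketch const& col,
                        std::string const& prefix,
                        std::vector<std::string>& out)
{
  std::string const path = prefix.empty() ? col.name() : prefix + "." + col.name();
  if (col.num_children() == 0) {
    out.push_back(path);
    return;
  }
  for (auto const& child : col.children()) {
    collect_leaf_paths(child, path, out);
  }
}
```

Starting from a root whose name is empty, the top-level fields appear without a leading dot.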

struct orc_schema#
#include <orc_metadata.hpp>

Schema of an ORC file.

Public Functions

inline orc_schema(orc_column_schema root_column_schema)#



root_column_schema – root column

inline auto const &root() const &#

Returns the schema of the struct column that contains all columns as fields.


Root column schema

inline auto root() &&#

Returns the schema of the struct column that contains all columns as fields.


Root column schema. The root column schema is moved out of the object (rvalues only).

class orc_metadata#
#include <orc_metadata.hpp>

Information about content of an ORC file.

Public Functions

inline orc_metadata(orc_schema schema, uint64_t num_rows, size_type num_stripes)#


  • schema – ORC schema

  • num_rows – number of rows

  • num_stripes – number of stripes

inline auto const &schema() const#

Returns the ORC schema.


ORC schema

inline auto num_rows() const#

Returns the number of rows of the root column.

If a file contains list columns, nested columns can have a different number of rows.


Number of rows

inline auto num_stripes() const#

Returns the number of stripes in the file.


Number of stripes

struct parquet_column_schema#
#include <parquet_metadata.hpp>

Schema of a parquet column, including the nested columns.

Public Functions

explicit parquet_column_schema() = default#

Default constructor.

This has been added since Cython requires a default constructor to create objects on stack.

inline parquet_column_schema(std::string_view name, parquet::TypeKind type, std::vector<parquet_column_schema> children)#


  • name – column name

  • type – parquet type

  • children – child columns (empty for non-nested types)

inline auto name() const#

Returns parquet column name; can be empty.


Column name

inline auto type_kind() const#

Returns parquet type of the column.


Column parquet type

inline auto const &children() const &#

Returns schemas of all child columns.


Children schemas

inline auto children() &&#

Returns schemas of all child columns.


Children schemas. The children array is moved out of the object (rvalues only).

inline auto const &child(int idx) const &#

Returns schema of the child with the given index.


idx – child index


Child schema

inline auto child(int idx) &&#

Returns schema of the child with the given index.


idx – child index


Child schema. The child is moved out of the object (rvalues only).

inline auto num_children() const#

Returns the number of child columns.


Children count

struct parquet_schema#
#include <parquet_metadata.hpp>

Schema of a parquet file.

Public Functions

explicit parquet_schema() = default#

Default constructor.

This has been added since Cython requires a default constructor to create objects on stack.

inline parquet_schema(parquet_column_schema root_column_schema)#



root_column_schema – root column

inline auto const &root() const &#

Returns the schema of the struct column that contains all columns as fields.


Root column schema

inline auto root() &&#

Returns the schema of the struct column that contains all columns as fields.


Root column schema. The root column schema is moved out of the object (rvalues only).

class parquet_metadata#
#include <parquet_metadata.hpp>

Information about content of a parquet file.

Public Types

using key_value_metadata = std::unordered_map<std::string, std::string>#

Key-value metadata in the file footer.

using row_group_metadata = std::unordered_map<std::string, int64_t>#

Row group metadata from each RowGroup element.

Public Functions

explicit parquet_metadata() = default#

Default constructor.

This has been added since Cython requires a default constructor to create objects on stack.

inline parquet_metadata(parquet_schema schema, int64_t num_rows, size_type num_rowgroups, key_value_metadata file_metadata, std::vector<row_group_metadata> rg_metadata)#


  • schema – parquet schema

  • num_rows – number of rows

  • num_rowgroups – number of row groups

  • file_metadata – key-value metadata in the file footer

  • rg_metadata – vector of maps containing metadata for each row group

inline auto const &schema() const#

Returns the parquet schema.


parquet schema

inline auto num_rows() const#

Returns the number of rows of the root column.

If a file contains list columns, nested columns can have a different number of rows.


Number of rows

inline auto num_rowgroups() const#

Returns the number of rowgroups in the file.


Number of row groups

inline auto const &metadata() const#

Returns the key-value metadata in the file footer.


Key value metadata as a map

inline auto const &rowgroup_metadata() const#

Returns the row group metadata in the file footer.


vector of row group metadata as maps

class writer_compression_statistics#
#include <types.hpp>

Statistics about compression performed by a writer.

Public Functions

writer_compression_statistics() = default#

Default constructor.

inline writer_compression_statistics(size_t num_compressed_bytes, size_t num_failed_bytes, size_t num_skipped_bytes, size_t num_compressed_output_bytes)#

Constructor with initial values.

  • num_compressed_bytes – The number of bytes that were successfully compressed

  • num_failed_bytes – The number of bytes that failed to compress

  • num_skipped_bytes – The number of bytes that were skipped during compression

  • num_compressed_output_bytes – The number of bytes in the compressed output

inline writer_compression_statistics &operator+=(writer_compression_statistics const &other) noexcept#

Adds the values from another writer_compression_statistics object.


other – The other writer_compression_statistics object


writer_compression_statistics& Reference to this object

inline auto num_compressed_bytes() const noexcept#

Returns the number of bytes in blocks that were successfully compressed.

This is the number of bytes that were actually compressed, not the size of the compressed output.


size_t The number of bytes that were successfully compressed

inline auto num_failed_bytes() const noexcept#

Returns the number of bytes in blocks that failed to compress.


size_t The number of bytes that failed to compress

inline auto num_skipped_bytes() const noexcept#

Returns the number of bytes in blocks that were skipped during compression.


size_t The number of bytes that were skipped during compression

inline auto num_total_input_bytes() const noexcept#

Returns the total size of compression inputs.


size_t The total size of compression inputs

inline auto compression_ratio() const noexcept#

Returns the compression ratio for the successfully compressed blocks.

Returns nan if there were no successfully compressed blocks.


double The ratio between the size of the compression inputs and the size of the compressed output.

struct column_name_info#
#include <types.hpp>

Detailed name (and optionally nullability) information for output columns.

The hierarchy of children matches the hierarchy of children in the output cudf columns.

Public Functions

inline column_name_info(std::string _name, std::optional<bool> _is_nullable = std::nullopt, std::optional<bool> _is_binary = std::nullopt)#

Construct a column name info with a name, optional nullability, and no children.

  • _name – Column name

  • _is_nullable – True if column is nullable

  • _is_binary – True if column is binary data

inline bool operator==(column_name_info const &rhs) const#

Compares two column name info structs for equality.


rhs – column name info struct to compare against


boolean indicating if this and rhs are equal

Public Members

std::string name#

Column name.

std::optional<bool> is_nullable#

Column nullability.

std::optional<bool> is_binary#

Column is binary (i.e. not a list)

std::optional<int32_t> type_length#

Byte width of data (for fixed length data)

std::vector<column_name_info> children#

Child column names.

struct table_metadata#
#include <types.hpp>

Table metadata returned by IO readers.

Public Members

std::vector<column_name_info> schema_info#

Detailed name information for the entire output hierarchy.

std::vector<size_t> num_rows_per_source#

Number of rows read from each data source. Currently only computed for Parquet readers when no AST filters are being used. Empty vector otherwise.

std::map<std::string, std::string> user_data#

Format-dependent metadata of the first input file as key-value pairs (deprecated)

std::vector<std::unordered_map<std::string, std::string>> per_file_user_data#

Per-file format-dependent metadata as key-value pairs.

struct table_with_metadata#
#include <types.hpp>

Table with table metadata used by io readers to return the metadata by value.

Public Members

std::unique_ptr<table> tbl#


table_metadata metadata#

Table metadata.

struct host_buffer#
#include <types.hpp>

Non-owning view of a host memory buffer.


Since 23.04

Used to describe buffer input in source_info objects.

Public Functions

inline host_buffer(char const *data, size_t size)#

Construct a new host buffer object.

  • data – Pointer to the buffer

  • size – Size of the buffer

Public Members

char const *data = nullptr#

Pointer to the buffer.

size_t size = 0#

Size of the buffer.
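
Since the view does not own its memory, the underlying storage must outlive it. A minimal sketch, using a local stand-in struct with the same two members and two hypothetical helpers:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Stand-in with the same members and constructor shape as the documented struct.
struct host_buffer_sketch {
  char const* data = nullptr;
  std::size_t size = 0;
  host_buffer_sketch() = default;
  host_buffer_sketch(char const* data, std::size_t size) : data{data}, size{size} {}
};

// Wraps a string's storage without copying.
// The result is valid only while `s` is alive and unmodified.
host_buffer_sketch view_of(std::string const& s) { return {s.data(), s.size()}; }

// Reinterprets the buffer as text, treating a null pointer as empty.
std::string_view as_string_view(host_buffer_sketch const& b)
{
  return b.data != nullptr ? std::string_view{b.data, b.size} : std::string_view{};
}
```

Passing a temporary string to `view_of` would leave the buffer dangling once the temporary is destroyed.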

struct source_info#
#include <types.hpp>

Source information for read interfaces.

Public Functions

inline explicit source_info(std::vector<std::string> const &file_paths)#

Construct a new source info object for multiple files.


file_paths – Input file paths

inline explicit source_info(std::string const &file_path)#

Construct a new source info object for a single file.


file_path – Single input file

inline explicit source_info(std::vector<host_buffer> const &host_buffers)#

Construct a new source info object for multiple buffers in host memory.


Since 23.04


host_buffers – Input buffers in host memory

inline explicit source_info(char const *host_data, size_t size)#

Construct a new source info object for a single buffer.


Since 23.04

  • host_data – Input buffer in host memory

  • size – Size of the buffer

template<typename T>
inline explicit source_info(cudf::host_span<cudf::host_span<T>> const host_buffers)#

Construct a new source info object for multiple buffers in host memory.


host_buffers – Input buffers in host memory

template<typename T>
inline explicit source_info(cudf::host_span<T> host_data)#

Construct a new source info object for a single buffer.


host_data – Input buffer in host memory

inline explicit source_info(cudf::host_span<cudf::device_span<std::byte const>> device_buffers)#

Construct a new source info object for multiple buffers in device memory.


device_buffers – Input buffers in device memory

inline explicit source_info(cudf::device_span<std::byte const> d_buffer)#

Construct a new source info object from a device buffer.


d_buffer – Input buffer in device memory

inline explicit source_info(std::vector<cudf::io::datasource*> const &sources)#

Construct a new source info object for multiple user-implemented sources.


sources – User-implemented input sources

inline explicit source_info(cudf::io::datasource *source)#

Construct a new source info object for a single user-implemented source.


source – Single user-implemented input source

inline auto type() const#

Get the type of the input.


The type of the input

inline auto const &filepaths() const#

Get the filepaths of the input.


The filepaths of the input

inline auto const &host_buffers() const#

Get the host buffers of the input.


The host buffers of the input

inline auto const &device_buffers() const#

Get the device buffers of the input.


The device buffers of the input

inline auto const &user_sources() const#

Get the user sources of the input.


The user sources of the input

struct sink_info#
#include <types.hpp>

Destination information for write interfaces.

Public Functions

inline sink_info(size_t num_sinks)#

Construct a new sink info object.


num_sinks – Number of sinks

inline explicit sink_info(std::vector<std::string> const &file_paths)#

Construct a new sink info object for multiple files.


file_paths – Output file paths

inline explicit sink_info(std::string const &file_path)#

Construct a new sink info object for a single file.


file_path – Single output file path

inline explicit sink_info(std::vector<std::vector<char>*> const &buffers)#

Construct a new sink info object for multiple host buffers.


buffers – Output host buffers

inline explicit sink_info(std::vector<char> *buffer)#

Construct a new sink info object for a single host buffer.


buffer – Single output host buffer

inline explicit sink_info(std::vector<cudf::io::data_sink*> const &user_sinks)#

Construct a new sink info object for multiple user-implemented sinks.


user_sinks – Output user-implemented sinks

inline explicit sink_info(class cudf::io::data_sink *user_sink)#

Construct a new sink info object for a single user-implemented sink.


user_sink – Single output user-implemented sink

inline auto type() const#

Get the type of the output.


The type of the output

inline auto num_sinks() const#

Get the number of sinks.


The number of sinks

inline auto const &filepaths() const#

Get the filepaths of the output.


The filepaths of the output

inline auto const &buffers() const#

Get the host buffers of the output.


The host buffers of the output

inline auto const &user_sinks() const#

Get the user sinks of the output.


The user sinks of the output

class column_in_metadata#
#include <types.hpp>

Metadata for a column.

Public Functions

inline column_in_metadata(std::string_view name)#

Construct a new column in metadata object.


name – Column name

inline column_in_metadata &add_child(column_in_metadata const &child)#

Add the metadata of a child to this column.


child – The child metadata to add to this column


this for chaining

inline column_in_metadata &set_name(std::string const &name) noexcept#

Set the name of this column.


name – Name of the column


this for chaining

inline column_in_metadata &set_nullability(bool nullable) noexcept#

Set the nullability of this column.


nullable – Whether this column is nullable


this for chaining

inline column_in_metadata &set_list_column_as_map() noexcept#

Specify that this list column should be encoded as a map in the written file.

The column must have the structure list<struct<key, value>>. This option is invalid otherwise.


this for chaining

inline column_in_metadata &set_int96_timestamps(bool req) noexcept#

Specifies whether this timestamp column should be encoded using the deprecated int96 physical type. Only valid for the following column types: timestamp_s, timestamp_ms, timestamp_us, timestamp_ns.


req – True = use int96 physical type. False = use int64 physical type


this for chaining

inline column_in_metadata &set_decimal_precision(uint8_t precision) noexcept#

Set the decimal precision of this column. Only valid if this column is a decimal (fixed-point) type.


precision – The integer precision to set for this decimal column


this for chaining

inline column_in_metadata &set_type_length(int32_t length) noexcept#

Set the data length of the column. Only valid if this column is a fixed-length byte array.


length – The data length to set for this column


this for chaining

inline column_in_metadata &set_parquet_field_id(int32_t field_id) noexcept#

Set the parquet field id of this column.


field_id – The parquet field id to set


this for chaining

inline column_in_metadata &set_output_as_binary(bool binary) noexcept#

Specifies whether this column should be written as binary or string data. Only valid for the following column types: string.


binary – True = use binary data type. False = use string data type


this for chaining

inline column_in_metadata &set_skip_compression(bool skip) noexcept#

Specifies whether this column should not be compressed regardless of the compression codec specified for the file.


skip – If true do not compress this column


this for chaining

inline column_in_metadata &set_encoding(column_encoding encoding) noexcept#

Sets the encoding to use for this column.

This is just a request, and the encoder may still choose to use a different encoding depending on resource constraints. Use the values of the column_encoding enum.


encoding – The encoding to use


this for chaining

inline column_in_metadata &child(size_type i) noexcept#

Get reference to a child of this column.


i – Index of the child to get


Reference to the child

inline column_in_metadata const &child(size_type i) const noexcept#

Get const reference to a child of this column.


i – Index of the child to get


Const reference to the child
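
The setters return a reference to the object so that calls can be chained. The sketch below reproduces that pattern with a simplified stand-in class; `column_meta_sketch` and `make_point_column` are hypothetical names, and only a few of the documented members are modeled.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in modeling name, nullability, and children only.
class column_meta_sketch {
 public:
  column_meta_sketch() = default;
  explicit column_meta_sketch(std::string name) : _name{std::move(name)} {}

  // Each setter returns *this so calls can be chained.
  column_meta_sketch& add_child(column_meta_sketch const& child)
  {
    _children.push_back(child);
    return *this;
  }
  column_meta_sketch& set_name(std::string const& name)
  {
    _name = name;
    return *this;
  }
  column_meta_sketch& set_nullability(bool nullable) noexcept
  {
    _nullable = nullable;
    return *this;
  }

  // Returns a reference to the i-th child (bounds-checked in this sketch).
  column_meta_sketch& child(std::size_t i) { return _children.at(i); }

  std::string const& get_name() const noexcept { return _name; }
  bool is_nullability_defined() const noexcept { return _nullable.has_value(); }
  bool nullable() const { return _nullable.value(); }
  std::size_t num_children() const noexcept { return _children.size(); }

 private:
  std::string _name;
  std::optional<bool> _nullable;
  std::vector<column_meta_sketch> _children;
};

// Builds metadata for a struct column with two fields, then edits one child in place.
column_meta_sketch make_point_column()
{
  column_meta_sketch col{"point"};
  col.set_nullability(false)
    .add_child(column_meta_sketch{"x"})
    .add_child(column_meta_sketch{"y"});
  col.child(1).set_name("y_coord").set_nullability(true);
  return col;
}
```

Note that `add_child` stores a copy, so later changes to the object passed in do not affect the stored child; edits go through `child(i)` instead.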

inline std::string get_name() const noexcept#

Get the name of this column.


The name of this column

inline bool is_nullability_defined() const noexcept#

Get whether nullability has been explicitly set for this column.


Boolean indicating whether nullability has been explicitly set for this column

inline bool nullable() const#

Gets the explicitly set nullability for this column.


std::bad_optional_access – If nullability is not explicitly defined for this column. Check using is_nullability_defined() first.


Boolean indicating whether this column is nullable
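
The check-before-access pattern recommended above can be sketched with `std::optional`, whose `value()` member throws `std::bad_optional_access` when empty. The class and the `nullable_or` helper are hypothetical stand-ins covering only the nullability state.

```cpp
#include <optional>

// Hypothetical stand-in: only the nullability part of the documented class.
class nullability_sketch {
 public:
  void set_nullability(bool v) noexcept { _nullable = v; }
  bool is_nullability_defined() const noexcept { return _nullable.has_value(); }
  // Throws std::bad_optional_access if nullability was never set.
  bool nullable() const { return _nullable.value(); }

 private:
  std::optional<bool> _nullable;
};

// Returns the explicit setting if present, otherwise a caller-supplied default.
bool nullable_or(nullability_sketch const& col, bool fallback)
{
  return col.is_nullability_defined() ? col.nullable() : fallback;
}
```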

inline bool is_map() const noexcept#

If this is the metadata of a list column, returns whether it is to be encoded as a map.


Boolean indicating whether this column is to be encoded as a map

inline bool is_enabled_int96_timestamps() const noexcept#

Get whether to encode this timestamp column using deprecated int96 physical type.


Boolean indicating whether to encode this timestamp column using deprecated int96 physical type

inline bool is_decimal_precision_set() const noexcept#

Get whether precision has been set for this decimal column.


Boolean indicating whether precision has been set for this decimal column

inline uint8_t get_decimal_precision() const#

Get the decimal precision that was set for this column.


std::bad_optional_access – If decimal precision was not set for this column. Check using is_decimal_precision_set() first.


The decimal precision that was set for this column

inline bool is_type_length_set() const noexcept#

Get whether type length has been set for this column.


Boolean indicating whether type length has been set for this column

inline uint8_t get_type_length() const#

Get the type length that was set for this column.


std::bad_optional_access – If type length was not set for this column. Check using is_type_length_set() first.


The type length that was set for this column

inline bool is_parquet_field_id_set() const noexcept#

Get whether parquet field id has been set for this column.


Boolean indicating whether parquet field id has been set for this column

inline int32_t get_parquet_field_id() const#

Get the parquet field id that was set for this column.


std::bad_optional_access – If parquet field id was not set for this column. Check using is_parquet_field_id_set() first.


The parquet field id that was set for this column

inline size_type num_children() const noexcept#

Get the number of children of this column.


The number of children of this column

inline bool is_enabled_output_as_binary() const noexcept#

Get whether to encode this column as binary or string data.


Boolean indicating whether to encode this column as binary data

inline bool is_enabled_skip_compression() const noexcept#

Get whether to skip compressing this column.


Boolean indicating whether to skip compression of this column

inline column_encoding get_encoding() const#

Get the encoding that was set for this column.


The encoding that was set for this column

class table_input_metadata#
#include <types.hpp>

Metadata for a table.

Public Functions

explicit table_input_metadata(table_view const &table)#

Construct a new table_input_metadata from a table_view.

The constructed table_input_metadata has the same structure as the passed table_view.


table – The table_view to construct metadata for

explicit table_input_metadata(table_metadata const &metadata)#

Construct a new table_input_metadata from a table_metadata object.

The constructed table_input_metadata has the same structure, column names and nullability as the passed table_metadata.


metadata – The table_metadata to construct table_input_metadata for

Public Members

std::vector<column_in_metadata> column_metadata#

List of column metadata.

struct partition_info#
#include <types.hpp>

Information used while writing partitioned datasets.

This information defines the slice of an input table to write to file. In partitioned dataset writing, one partition_info struct defines one partition and corresponds to one output file.

Public Functions

inline partition_info(size_type start_row, size_type num_rows)#

Construct a new partition_info.

  • start_row – The start row of the partition

  • num_rows – The number of rows in the partition

Public Members

size_type start_row#

The start row of the partition.

size_type num_rows#

The number of rows in the partition.
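
As an illustration of how a table might be sliced into partitions, the sketch below splits a row count into contiguous, nearly equal ranges. The struct is a local stand-in, `even_partitions` is a hypothetical helper, and the 32-bit signed row type is an assumption.

```cpp
#include <cstdint>
#include <vector>

using row_count_t = std::int32_t;  // assumption: rows are counted with a 32-bit signed type

// Stand-in with the same two members as the documented struct.
struct partition_info_sketch {
  row_count_t start_row;
  row_count_t num_rows;
};

// Splits `total_rows` into `num_parts` contiguous slices whose sizes differ by at most one row.
std::vector<partition_info_sketch> even_partitions(row_count_t total_rows, row_count_t num_parts)
{
  std::vector<partition_info_sketch> parts;
  if (num_parts <= 0 || total_rows < 0) { return parts; }
  row_count_t const base  = total_rows / num_parts;
  row_count_t const extra = total_rows % num_parts;  // first `extra` slices get one more row
  row_count_t start       = 0;
  for (row_count_t i = 0; i < num_parts; ++i) {
    row_count_t const n = base + (i < extra ? 1 : 0);
    parts.push_back({start, n});
    start += n;
  }
  return parts;
}
```

Each resulting entry would correspond to one output file, so the number of partitions should match the number of sinks.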

class reader_column_schema#
#include <types.hpp>

Schema element for reader.

Public Functions

inline reader_column_schema(size_type number_of_children)#

Construct a new reader column schema object.


number_of_children – number of child schema objects to default construct

inline reader_column_schema(host_span<reader_column_schema> const &child_span)#

Construct a new reader column schema object with a span defining the children.


child_span – span of child schema objects

inline reader_column_schema &add_child(reader_column_schema const &child)#

Add the metadata of a child to this column.


child – The child metadata to add to this column


this for chaining

inline reader_column_schema &child(size_type i)#

Get reference to a child of this column.


i – Index of the child to get


Reference to the child

inline reader_column_schema const &child(size_type i) const#

Get const reference to a child of this column.


i – Index of the child to get


Const reference to the child

inline reader_column_schema &set_convert_binary_to_strings(bool convert_to_string)#

Specifies whether this column should be read as binary or string data. Only valid for the following column types: string, list<int8>.


convert_to_string – True = convert binary to strings. False = return binary


this for chaining

inline reader_column_schema &set_type_length(int32_t type_length)#

Sets the length of fixed length data.


type_length – Size of the data type in bytes


this for chaining

inline bool is_enabled_convert_binary_to_strings() const#

Get whether binary data in this column is converted to strings.


Boolean indicating whether binary data in this column is converted to strings

inline int32_t get_type_length() const#

Get the length in bytes of this fixed length data.


The length in bytes of the data type

inline size_t get_num_children() const#

Get the number of child objects.


number of children