Io Types#

group io_types

Typedefs

using no_statistics = std::monostate#: Monostate type alias for the statistics variant.

using date_statistics = minmax_statistics<int32_t>#: Statistics for date(time) columns.

using binary_statistics = sum_statistics<int64_t>#

Statistics for binary columns.

The sum is the total number of bytes across all elements.

using statistics_type = std::variant<no_statistics, integer_statistics, double_statistics, string_statistics, bucket_statistics, decimal_statistics, date_statistics, binary_statistics, timestamp_statistics>#

Variant type for ORC type-specific column statistics.

The variant can hold any of the supported column statistics types.

Enums

enum CompressionKind#

Identifies a compression algorithm.

Values:

enumerator NONE#

enumerator ZLIB#

enumerator SNAPPY#

enumerator LZO#

enumerator LZ4#

enumerator ZSTD#

enum TypeKind#

Identifies a data type in an orc file.

Values:

enumerator INVALID_TYPE_KIND#

enumerator BOOLEAN#

enumerator BYTE#

enumerator SHORT#

enumerator INT#

enumerator LONG#

enumerator FLOAT#

enumerator DOUBLE#

enumerator STRING#

enumerator BINARY#

enumerator TIMESTAMP#

enumerator LIST#

enumerator MAP#

enumerator STRUCT#

enumerator UNION#

enumerator DECIMAL#

enumerator DATE#

enumerator VARCHAR#

enumerator CHAR#

enum StreamKind#

Identifies the type of data stream.

Values:

enumerator INVALID_STREAM_KIND#

enumerator PRESENT#

enumerator DATA#

enumerator LENGTH#

enumerator DICTIONARY_DATA#

enumerator DICTIONARY_COUNT#

enumerator SECONDARY#

enumerator ROW_INDEX#

enumerator BLOOM_FILTER#

enumerator BLOOM_FILTER_UTF8#

enum ColumnEncodingKind#

Identifies the encoding of columns.

Values:

enumerator INVALID_ENCODING_KIND#

enumerator DIRECT#

enumerator DICTIONARY#

enumerator DIRECT_V2#

enumerator DICTIONARY_V2#

enum ProtofType#

Identifies the type of encoding in a protocol buffer.

Values:

enumerator VARINT#

enumerator FIXED64#

enumerator FIXEDLEN#

enumerator START_GROUP#

enumerator END_GROUP#

enumerator FIXED32#

enumerator INVALID_6#

enumerator INVALID_7#

enum class compression_type#

Compression algorithms.

Values:

enumerator NONE#: No compression.

enumerator AUTO#: Automatically detect or select compression format.

enumerator SNAPPY#: Snappy format, using byte-oriented LZ77.

enumerator GZIP#: GZIP format, using DEFLATE algorithm.

enumerator BZIP2#: BZIP2 format, using Burrows-Wheeler transform.

enumerator BROTLI#: BROTLI format, using LZ77 + Huffman + 2nd order context modeling.

enumerator ZIP#: ZIP format, using DEFLATE algorithm.

enumerator XZ#: XZ format, using LZMA(2) algorithm.

enumerator ZLIB#: ZLIB format, using DEFLATE algorithm.

enumerator LZ4#: LZ4 format, using LZ77.

enumerator LZO#: Lempel–Ziv–Oberhumer format.

enumerator ZSTD#: Zstandard format.

enum class io_type#

Data source or destination types.

Values:

enumerator FILEPATH#: Input/output is a file path.

enumerator HOST_BUFFER#: Input/output is a buffer in host memory.

enumerator DEVICE_BUFFER#: Input/output is a buffer in device memory.

enumerator VOID#: Input/output is nothing. No work is done. Useful for benchmarking.

enumerator USER_IMPLEMENTED#: Input/output is handled by a custom user class.

enum class quote_style#

Behavior when handling quotations in field data.

Values:

enumerator MINIMAL#: Quote only fields which contain special characters.

enumerator ALL#: Quote all fields.

enumerator NONNUMERIC#: Quote all non-numeric fields.

enumerator NONE#: Never quote fields; disable quotation parsing.

enum statistics_freq#

Column statistics granularity type for parquet/orc writers.

Values:

enumerator STATISTICS_NONE#: No column statistics.

enumerator STATISTICS_ROWGROUP#: Per-Rowgroup column statistics.

enumerator STATISTICS_PAGE#: Per-page column statistics.

enumerator STATISTICS_COLUMN#: Full column and offset indices. Implies STATISTICS_ROWGROUP.

enum class column_encoding#

Valid encodings for use with column_in_metadata::set_encoding()

Values:

enumerator USE_DEFAULT#: No encoding has been requested, use default encoding.

enumerator DICTIONARY#: Use dictionary encoding.

enumerator PLAIN#: Use plain encoding.

enumerator DELTA_BINARY_PACKED#: Use DELTA_BINARY_PACKED encoding (only valid for integer columns)

enumerator DELTA_LENGTH_BYTE_ARRAY#: Use DELTA_LENGTH_BYTE_ARRAY encoding (only valid for BYTE_ARRAY columns)

enumerator DELTA_BYTE_ARRAY#: Use DELTA_BYTE_ARRAY encoding (only valid for BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY columns)

enumerator BYTE_STREAM_SPLIT#: Use BYTE_STREAM_SPLIT encoding (valid for all fixed width types)

enumerator DIRECT#: Use DIRECT encoding.

enumerator DIRECT_V2#: Use DIRECT_V2 encoding.

enumerator DICTIONARY_V2#: Use DICTIONARY_V2 encoding.

enum dictionary_policy#

Control use of dictionary encoding for parquet writer.

Values:

enumerator NEVER#: Never use dictionary encoding.

enumerator ADAPTIVE#: Use dictionary when it will not impact compression.

enumerator ALWAYS#: Use dictionary regardless of impact on compression.

Functions

template<typename T> inline constexpr auto is_byte_like_type()#

Returns true if the type is byte-like, meaning it is reasonable to pass as a pointer to bytes.

Template Parameters:: T – The representation type
Returns:: true if the type is considered a byte-like type

struct raw_orc_statistics#

#include <orc_metadata.hpp>

Holds column names and buffers containing raw file-level and stripe-level statistics.

The buffers can be parsed using a Protobuf parser. Alternatively, use parsed_orc_statistics to get the statistics parsed into a libcudf representation.

The column_names and file_stats members contain one element per column. The stripes_stats contains one element per stripe, where each element contains column statistics for each column.

Public Members

std::vector<std::string> column_names#: Column names.

std::vector<std::string> file_stats#: File-level statistics for each column.

std::vector<std::vector<std::string>> stripes_stats#: Stripe-level statistics for each column.

template<typename T> struct minmax_statistics#

#include <orc_metadata.hpp>

Base class for column statistics that include optional minimum and maximum.

Includes accessors for the minimum and maximum values.

Public Members

std::optional<T> minimum#: Minimum value.

std::optional<T> maximum#: Maximum value.

template<typename T> struct sum_statistics#

#include <orc_metadata.hpp>

Base class for column statistics that include an optional sum.

Includes accessors for the sum value.

Public Members

std::optional<T> sum#: Sum of values in column.

struct integer_statistics : public cudf::io::minmax_statistics<int64_t>, public cudf::io::sum_statistics<int64_t>#: #include <orc_metadata.hpp>

Statistics for integral columns.

struct double_statistics : public cudf::io::minmax_statistics<double>, public cudf::io::sum_statistics<double>#: #include <orc_metadata.hpp>

Statistics for floating point columns.

struct string_statistics : public cudf::io::minmax_statistics<std::string>, public cudf::io::sum_statistics<int64_t>#

#include <orc_metadata.hpp>

Statistics for string columns.

The minimum and maximum are the first and last elements, respectively, in lexicographical order. The sum is the total length of elements in the column. Note: According to ORC specs, the sum should be signed, but pyarrow uses unsigned value

struct bucket_statistics#

#include <orc_metadata.hpp>

Statistics for boolean columns.

The count array contains the count of true values.

Public Members

std::vector<uint64_t> count#: count of true values

struct decimal_statistics : public cudf::io::minmax_statistics<std::string>, public cudf::io::sum_statistics<std::string>#: #include <orc_metadata.hpp>

Statistics for decimal columns.

struct timestamp_statistics : public cudf::io::minmax_statistics<int64_t>#

#include <orc_metadata.hpp>

Statistics for timestamp columns.

The minimum and maximum min/max elements in the column, as the number of milliseconds since the UNIX epoch. The minimum_utc and maximum_utc are the same values adjusted to UTC.

Public Members

std::optional<int64_t> minimum_utc#: minimum in milliseconds

std::optional<int64_t> maximum_utc#: maximum in milliseconds

std::optional<uint32_t> minimum_nanos#: nanoseconds part of the minimum

std::optional<uint32_t> maximum_nanos#: nanoseconds part of the maximum

struct column_statistics#

#include <orc_metadata.hpp>

Contains per-column ORC statistics.

All columns can have the number_of_values statistics. Depending on the data type, a column can have additional statistics, accessible through type_specific_stats accessor.

Public Functions

column_statistics(orc::column_statistics &&detail_statistics)#

Construct a new column statistics object.

Parameters:: detail_statistics – The statistics to initialize the object with

Public Members

std::optional<uint64_t> number_of_values#: number of statistics

std::optional<bool> has_null#: column has any nulls

statistics_type type_specific_stats#: type-specific statistics

struct parsed_orc_statistics#

#include <orc_metadata.hpp>

Holds column names and parsed file-level and stripe-level statistics.

The column_names and file_stats members contain one element per column. The stripes_stats member contains one element per stripe, where each element contains column statistics for each column.

Public Members

std::vector<std::string> column_names#: column names

std::vector<column_statistics> file_stats#: file-level statistics

std::vector<std::vector<column_statistics>> stripes_stats#: stripe-level statistics

struct orc_column_schema#

#include <orc_metadata.hpp>

Schema of an ORC column, including the nested columns.

Public Functions

inline orc_column_schema(std::string_view name, orc::TypeKind type, std::vector<orc_column_schema> children)#

constructor

Parameters:

name – column name
type – ORC type
children – child columns (empty for non-nested types)

inline auto name() const#

Returns ORC column name; can be empty.

Returns:: Column name

inline auto type_kind() const#

Returns ORC type of the column.

Returns:: Column ORC type

inline auto const &children() const &#

Returns schemas of all child columns.

Returns:: Children schemas

inline auto children() &&#

Returns schemas of all child columns.

Returns:: Children schemas Children array is moved out of the object (rvalues only).

inline auto const &child(int idx) const &#

Returns schema of the child with the given index.

Parameters:: idx – child index
Returns:: Child schema

inline auto child(int idx) &&#

Returns schema of the child with the given index.

Parameters:: idx – child index
Returns:: Child schema Child is moved out of the object (rvalues only).

inline auto num_children() const#

Returns the number of child columns.

Returns:: Children count

struct orc_schema#

#include <orc_metadata.hpp>

Schema of an ORC file.

Public Functions

inline orc_schema(orc_column_schema root_column_schema)#

constructor

Parameters:: root_column_schema – root column

inline auto const &root() const &#

Returns the schema of the struct column that contains all columns as fields.

Returns:: Root column schema

inline auto root() &&#

Returns the schema of the struct column that contains all columns as fields.

Returns:: Root column schema Root column schema is moved out of the object (rvalues only).

class orc_metadata#

#include <orc_metadata.hpp>

Information about content of an ORC file.

Public Functions

inline orc_metadata(orc_schema schema, uint64_t num_rows, size_type num_stripes)#

constructor

Parameters:

schema – ORC schema
num_rows – number of rows
num_stripes – number of stripes

inline auto const &schema() const#

Returns the ORC schema.

Returns:: ORC schema Number of rows in the root column; can vary for nested columns

inline auto num_rows() const#

Returns the number of rows of the root column.

If a file contains list columns, nested columns can have a different number of rows.

Returns:: Number of rows

inline auto num_stripes() const#

Returns the number of stripes in the file.

Returns:: Number of stripes

struct parquet_column_schema#

#include <parquet_metadata.hpp>

Schema of a parquet column, including the nested columns.

Public Functions

explicit parquet_column_schema() = default#

Default constructor.

This has been added since Cython requires a default constructor to create objects on stack.

inline parquet_column_schema(std::string_view name, parquet::TypeKind type, std::vector<parquet_column_schema> children)#

constructor

Parameters:

name – column name
type – parquet type
children – child columns (empty for non-nested types)

inline auto name() const#

Returns parquet column name; can be empty.

Returns:: Column name

inline auto type_kind() const#

Returns parquet type of the column.

Returns:: Column parquet type

inline auto const &children() const &#

Returns schemas of all child columns.

Returns:: Children schemas

inline auto children() &&#

Returns schemas of all child columns.

Returns:: Children schemas Children array is moved out of the object (rvalues only).

inline auto const &child(int idx) const &#

Returns schema of the child with the given index.

Parameters:: idx – child index
Returns:: Child schema

inline auto child(int idx) &&#

Returns schema of the child with the given index.

Parameters:: idx – child index
Returns:: Child schema Child is moved out of the object (rvalues only).

inline auto num_children() const#

Returns the number of child columns.

Returns:: Children count

struct parquet_schema#

#include <parquet_metadata.hpp>

Schema of a parquet file.

Public Functions

explicit parquet_schema() = default#

Default constructor.

This has been added since Cython requires a default constructor to create objects on stack.

inline parquet_schema(parquet_column_schema root_column_schema)#

constructor

Parameters:: root_column_schema – root column

inline auto const &root() const &#

Returns the schema of the struct column that contains all columns as fields.

Returns:: Root column schema

inline auto root() &&#

Returns the schema of the struct column that contains all columns as fields.

Returns:: Root column schema Root column schema is moved out of the object (rvalues only).

class parquet_metadata#

#include <parquet_metadata.hpp>

Information about content of a parquet file.

Public Types

using key_value_metadata = std::unordered_map<std::string, std::string>#: Key-value metadata in the file footer.

using row_group_metadata = std::unordered_map<std::string, int64_t>#: row group metadata from each RowGroup element.

Public Functions

explicit parquet_metadata() = default#

Default constructor.

This has been added since Cython requires a default constructor to create objects on stack.

inline parquet_metadata(parquet_schema schema, int64_t num_rows, size_type num_rowgroups, key_value_metadata file_metadata, std::vector<row_group_metadata> rg_metadata)#

constructor

Parameters:

schema – parquet schema
num_rows – number of rows
num_rowgroups – number of row groups
file_metadata – key-value metadata in the file footer
rg_metadata – vector of maps containing metadata for each row group

inline auto const &schema() const#

Returns the parquet schema.

Returns:: parquet schema

inline auto num_rows() const#

Returns the number of rows of the root column.

If a file contains list columns, nested columns can have a different number of rows.

Returns:: Number of rows

inline auto num_rowgroups() const#

Returns the number of rowgroups in the file.

Returns:: Number of row groups

inline auto const &metadata() const#

Returns the Key value metadata in the file footer.

Returns:: Key value metadata as a map

inline auto const &rowgroup_metadata() const#

Returns the row group metadata in the file footer.

Returns:: vector of row group metadata as maps

class writer_compression_statistics#

#include <types.hpp>

Statistics about compression performed by a writer.

Public Functions

writer_compression_statistics() = default#: Default constructor.

inline writer_compression_statistics(size_t num_compressed_bytes, size_t num_failed_bytes, size_t num_skipped_bytes, size_t num_compressed_output_bytes)#

Constructor with initial values.

Parameters:

num_compressed_bytes – The number of bytes that were successfully compressed
num_failed_bytes – The number of bytes that failed to compress
num_skipped_bytes – The number of bytes that were skipped during compression
num_compressed_output_bytes – The number of bytes in the compressed output

inline writer_compression_statistics &operator+=(writer_compression_statistics const &other) noexcept#

Adds the values from another writer_compression_statistics object.

Parameters:: other – The other writer_compression_statistics object
Returns:: writer_compression_statistics& Reference to this object

inline auto num_compressed_bytes() const noexcept#

Returns the number of bytes in blocks that were successfully compressed.

This is the number of bytes that were actually compressed, not the size of the compressed output.

Returns:: size_t The number of bytes that were successfully compressed

inline auto num_failed_bytes() const noexcept#

Returns the number of bytes in blocks that failed to compress.

Returns:: size_t The number of bytes that failed to compress

inline auto num_skipped_bytes() const noexcept#

Returns the number of bytes in blocks that were skipped during compression.

Returns:: size_t The number of bytes that were skipped during compression

inline auto num_total_input_bytes() const noexcept#

Returns the total size of compression inputs.

Returns:: size_t The total size of compression inputs

inline auto compression_ratio() const noexcept#

Returns the compression ratio for the successfully compressed blocks.

Returns nan if there were no successfully compressed blocks.

Returns:: double The ratio between the size of the compression inputs and the size of the compressed output.

struct column_name_info#

#include <types.hpp>

Detailed name (and optionally nullability) information for output columns.

The hierarchy of children matches the hierarchy of children in the output cudf columns.

Public Functions

inline column_name_info(std::string const &_name, std::optional<bool> _is_nullable = std::nullopt, std::optional<bool> _is_binary = std::nullopt)#

Construct a column name info with a name, optional nullabilty, and no children.

Parameters:

_name – Column name
_is_nullable – True if column is nullable
_is_binary – True if column is binary data

Public Members

std::string name#: Column name.

std::optional<bool> is_nullable#: Column nullability.

std::optional<bool> is_binary#: Column is binary (i.e. not a list)

std::optional<int32_t> type_length#: Byte width of data (for fixed length data)

std::vector<column_name_info> children#: Child column names.

struct table_metadata#

#include <types.hpp>

Table metadata returned by IO readers.

Public Members

std::vector<column_name_info> schema_info#: Detailed name information for the entire output hierarchy.

std::map<std::string, std::string> user_data#: Format-dependent metadata of the first input file as key-values pairs (deprecated)

std::vector<std::unordered_map<std::string, std::string>> per_file_user_data#: Per file format-dependent metadata as key-values pairs.

struct table_with_metadata#

#include <types.hpp>

Table with table metadata used by io readers to return the metadata by value.

Public Members

std::unique_ptr<table> tbl#: Table.

table_metadata metadata#: Table metadata.

struct host_buffer#

#include <types.hpp>

Non-owning view of a host memory buffer.

Deprecated:: Since 23.04

Used to describe buffer input in source_info objects.

Public Functions

inline host_buffer(char const *data, size_t size)#

Construct a new host buffer object.

Parameters:

data – Pointer to the buffer
size – Size of the buffer

Public Members

char const *data = nullptr#: Pointer to the buffer.

size_t size = 0#: Size of the buffer.

struct source_info#

#include <types.hpp>

Source information for read interfaces.

Public Functions

inline explicit source_info(std::vector<std::string> const &file_paths)#

Construct a new source info object for multiple files.

Parameters:: file_paths – Input files paths

inline explicit source_info(std::string const &file_path)#

Construct a new source info object for a single file.

Parameters:: file_path – Single input file

inline explicit source_info(std::vector<host_buffer> const &host_buffers)#

Construct a new source info object for multiple buffers in host memory.

Deprecated:: Since 23.04

Parameters:: host_buffers – Input buffers in host memory

inline explicit source_info(char const *host_data, size_t size)#

Construct a new source info object for a single buffer.

Deprecated:: Since 23.04

Parameters:

host_data – Input buffer in host memory
size – Size of the buffer

template<typename T> inline explicit source_info(cudf::host_span<cudf::host_span<T>> const host_buffers)#

Construct a new source info object for multiple buffers in host memory.

Parameters:: host_buffers – Input buffers in host memory

template<typename T> inline explicit source_info(cudf::host_span<T> host_data)#

Construct a new source info object for a single buffer.

Parameters:: host_data – Input buffer in host memory

inline explicit source_info(cudf::host_span<cudf::device_span<std::byte const>> device_buffers)#

Construct a new source info object for multiple buffers in device memory.

Parameters:: device_buffers – Input buffers in device memory

inline explicit source_info(cudf::device_span<std::byte const> d_buffer)#

Construct a new source info object from a device buffer.

Parameters:: d_buffer – Input buffer in device memory

inline explicit source_info(std::vector<cudf::io::datasource*> const &sources)#

Construct a new source info object for multiple user-implemented sources.

Parameters:: sources – User-implemented input sources

inline explicit source_info(cudf::io::datasource *source)#

Construct a new source info object for a single user-implemented source.

Parameters:: source – Single user-implemented Input source

inline auto type() const#

Get the type of the input.

Returns:: The type of the input

inline auto const &filepaths() const#

Get the filepaths of the input.

Returns:: The filepaths of the input

inline auto const &host_buffers() const#

Get the host buffers of the input.

Returns:: The host buffers of the input

inline auto const &device_buffers() const#

Get the device buffers of the input.

Returns:: The device buffers of the input

inline auto const &user_sources() const#

Get the user sources of the input.

Returns:: The user sources of the input

struct sink_info#

#include <types.hpp>

Destination information for write interfaces.

Public Functions

inline sink_info(size_t num_sinks)#

Construct a new sink info object.

Parameters:: num_sinks – Number of sinks

inline explicit sink_info(std::vector<std::string> const &file_paths)#

Construct a new sink info object for multiple files.

Parameters:: file_paths – Output files paths

inline explicit sink_info(std::string const &file_path)#

Construct a new sink info object for a single file.

Parameters:: file_path – Single output file path

inline explicit sink_info(std::vector<std::vector<char>*> const &buffers)#

Construct a new sink info object for multiple host buffers.

Parameters:: buffers – Output host buffers

inline explicit sink_info(std::vector<char> *buffer)#

Construct a new sink info object for a single host buffer.

Parameters:: buffer – Single output host buffer

inline explicit sink_info(std::vector<cudf::io::data_sink*> const &user_sinks)#

Construct a new sink info object for multiple user-implemented sinks.

Parameters:: user_sinks – Output user-implemented sinks

inline explicit sink_info(class cudf::io::data_sink *user_sink)#

Construct a new sink info object for a single user-implemented sink.

Parameters:: user_sink – Single output user-implemented sink

inline auto type() const#

Get the type of the input.

Returns:: The type of the input

inline auto num_sinks() const#

Get the number of sinks.

Returns:: The number of sinks

inline auto const &filepaths() const#

Get the filepaths of the input.

Returns:: The filepaths of the input

inline auto const &buffers() const#

Get the host buffers of the input.

Returns:: The host buffers of the input

inline auto const &user_sinks() const#

Get the user sinks of the input.

Returns:: The user sinks of the input

class column_in_metadata#

#include <types.hpp>

Metadata for a column.

Public Functions

inline column_in_metadata(std::string_view name)#

Construct a new column in metadata object.

Parameters:: name – Column name

inline column_in_metadata &add_child(column_in_metadata const &child)#

Add the children metadata of this column.

Parameters:: child – The children metadata of this column to add
Returns:: this for chaining

inline column_in_metadata &set_name(std::string const &name) noexcept#

Set the name of this column.

Parameters:: name – Name of the column
Returns:: this for chaining

inline column_in_metadata &set_nullability(bool nullable) noexcept#

Set the nullability of this column.

Parameters:: nullable – Whether this column is nullable
Returns:: this for chaining

inline column_in_metadata &set_list_column_as_map() noexcept#

Specify that this list column should be encoded as a map in the written file.

The column must have the structure list<struct<key, value>>. This option is invalid otherwise

Returns:: this for chaining

inline column_in_metadata &set_int96_timestamps(bool req) noexcept#

Specifies whether this timestamp column should be encoded using the deprecated int96 physical type. Only valid for the following column types: timestamp_s, timestamp_ms, timestamp_us, timestamp_ns.

Parameters:: req – True = use int96 physical type. False = use int64 physical type
Returns:: this for chaining

inline column_in_metadata &set_decimal_precision(uint8_t precision) noexcept#

Set the decimal precision of this column. Only valid if this column is a decimal (fixed-point) type.

Parameters:: precision – The integer precision to set for this decimal column
Returns:: this for chaining

inline column_in_metadata &set_type_length(int32_t length) noexcept#

Set the data length of the column. Only valid if this column is a fixed-length byte array.

Parameters:: length – The data length to set for this column
Returns:: this for chaining

inline column_in_metadata &set_parquet_field_id(int32_t field_id) noexcept#

Set the parquet field id of this column.

Parameters:: field_id – The parquet field id to set
Returns:: this for chaining

inline column_in_metadata &set_output_as_binary(bool binary) noexcept#

Specifies whether this column should be written as binary or string data Only valid for the following column types: string.

Parameters:: binary – True = use binary data type. False = use string data type
Returns:: this for chaining

inline column_in_metadata &set_skip_compression(bool skip) noexcept#

Specifies whether this column should not be compressed regardless of the compression codec specified for the file.

Parameters:: skip – If true do not compress this column
Returns:: this for chaining

inline column_in_metadata &set_encoding(column_encoding encoding) noexcept#

Sets the encoding to use for this column.

This is just a request, and the encoder may still choose to use a different encoding depending on resource constraints. Use the constants defined in the parquet_encoding struct.

Parameters:: encoding – The encoding to use
Returns:: this for chaining

inline column_in_metadata &child(size_type i) noexcept#

Get reference to a child of this column.

Parameters:: i – Index of the child to get
Returns:: this for chaining

inline column_in_metadata const &child(size_type i) const noexcept#

Get const reference to a child of this column.

Parameters:: i – Index of the child to get
Returns:: this for chaining

inline std::string get_name() const noexcept#

Get the name of this column.

Returns:: The name of this column

inline bool is_nullability_defined() const noexcept#

Get whether nullability has been explicitly set for this column.

Returns:: Boolean indicating whether nullability has been explicitly set for this column

inline bool nullable() const#

Gets the explicitly set nullability for this column.

Throws:: std::bad_optional_access – If nullability is not explicitly defined for this column. Check using is_nullability_defined() first.
Returns:: Boolean indicating whether this column is nullable

inline bool is_map() const noexcept#

If this is the metadata of a list column, returns whether it is to be encoded as a map.

Returns:: Boolean indicating whether this column is to be encoded as a map

inline bool is_enabled_int96_timestamps() const noexcept#

Get whether to encode this timestamp column using deprecated int96 physical type.

Returns:: Boolean indicating whether to encode this timestamp column using deprecated int96 physical type

inline bool is_decimal_precision_set() const noexcept#

Get whether precision has been set for this decimal column.

Returns:: Boolean indicating whether precision has been set for this decimal column

inline uint8_t get_decimal_precision() const#

Get the decimal precision that was set for this column.

Throws:: std::bad_optional_access – If decimal precision was not set for this column. Check using is_decimal_precision_set() first.
Returns:: The decimal precision that was set for this column

inline bool is_type_length_set() const noexcept#

Get whether type length has been set for this column.

Returns:: Boolean indicating whether type length has been set for this column

inline uint8_t get_type_length() const#

Get the type length that was set for this column.

Throws:: std::bad_optional_access – If type length was not set for this column. Check using is_type_length_set() first.
Returns:: The decimal precision that was set for this column

inline bool is_parquet_field_id_set() const noexcept#

Get whether parquet field id has been set for this column.

Returns:: Boolean indicating whether parquet field id has been set for this column

inline int32_t get_parquet_field_id() const#

Get the parquet field id that was set for this column.

Throws:: std::bad_optional_access – If parquet field id was not set for this column. Check using is_parquet_field_id_set() first.
Returns:: The parquet field id that was set for this column

inline size_type num_children() const noexcept#

Get the number of children of this column.

Returns:: The number of children of this column

inline bool is_enabled_output_as_binary() const noexcept#

Get whether to encode this column as binary or string data.

Returns:: Boolean indicating whether to encode this column as binary data

inline bool is_enabled_skip_compression() const noexcept#

Get whether to skip compressing this column.

Returns:: Boolean indicating whether to skip compression of this column

inline column_encoding get_encoding() const#

Get the encoding that was set for this column.

Returns:: The encoding that was set for this column

class table_input_metadata#

#include <types.hpp>

Metadata for a table.

Public Functions

explicit table_input_metadata(table_view const &table)#

Construct a new table_input_metadata from a table_view.

The constructed table_input_metadata has the same structure as the passed table_view

Parameters:: table – The table_view to construct metadata for

explicit table_input_metadata(table_metadata const &metadata)#

Construct a new table_input_metadata from a table_metadata object.

The constructed table_input_metadata has the same structure, column names and nullability as the passed table_metadata.

Parameters:: metadata – The table_metadata to construct table_intput_metadata for

Public Members

std::vector<column_in_metadata> column_metadata#: List of column metadata.

struct partition_info#

#include <types.hpp>

Information used while writing partitioned datasets.

This information defines the slice of an input table to write to file. In partitioned dataset writing, one partition_info struct defines one partition and corresponds to one output file

Public Functions

inline partition_info(size_type start_row, size_type num_rows)#

Construct a new partition_info.

Parameters:

start_row – The start row of the partition
num_rows – The number of rows in the partition

Public Members

size_type start_row#: The start row of the partition.

size_type num_rows#: The number of rows in the partition.

class reader_column_schema#

#include <types.hpp>

schema element for reader

Public Functions

inline reader_column_schema(size_type number_of_children)#

Construct a new reader column schema object.

Parameters:: number_of_children – number of child schema objects to default construct

inline reader_column_schema(host_span<reader_column_schema> const &child_span)#

Construct a new reader column schema object with a span defining the children.

Parameters:: child_span – span of child schema objects

inline reader_column_schema &add_child(reader_column_schema const &child)#

Add the children metadata of this column.

Parameters:: child – The children metadata of this column to add
Returns:: this for chaining

inline reader_column_schema &child(size_type i)#

Get reference to a child of this column.

Parameters:: i – Index of the child to get
Returns:: this for chaining

inline reader_column_schema const &child(size_type i) const#

Get const reference to a child of this column.

Parameters:: i – Index of the child to get
Returns:: this for chaining

inline reader_column_schema &set_convert_binary_to_strings(bool convert_to_string)#

Specifies whether this column should be written as binary or string data Only valid for the following column types: string, list<int8>

Parameters:: convert_to_string – True = convert binary to strings False = return binary
Returns:: this for chaining

inline reader_column_schema &set_type_length(int32_t type_length)#

Sets the length of fixed length data.

Parameters:: type_length – Size of the data type in bytes
Returns:: this for chaining

inline bool is_enabled_convert_binary_to_strings() const#

Get whether to encode this column as binary or string data.

Returns:: Boolean indicating whether to encode this column as binary data

inline int32_t get_type_length() const#

Get the length in bytes of this fixed length data.

Returns:: The length in bytes of the data type

inline size_t get_num_children() const#

Get the number of child objects.

Returns:: number of children