Builds parquet_reader_options to use for read_parquet().

#include <parquet.hpp>

Public Member Functions
	parquet_reader_options_builder ()=default
	Default constructor.

	parquet_reader_options_builder (source_info src)
	Constructor from source info.

parquet_reader_options_builder &	columns (std::vector< std::string > column_names)
	Sets names of the columns to be read (deprecated; use column_names instead).

parquet_reader_options_builder &	column_names (std::vector< std::string > column_names)
	Sets names of the columns to be read.

parquet_reader_options_builder &	column_indices (std::vector< cudf::size_type > col_indices)
	Sets the indices of top-level columns to be read from all input sources.

parquet_reader_options_builder &	row_groups (std::vector< std::vector< size_type >> row_groups)
	Sets the individual row groups to read.

parquet_reader_options_builder &	filter (ast::expression const &filter)
	Sets an AST-based filter for predicate pushdown.

parquet_reader_options_builder &	convert_strings_to_categories (bool val)
	Enables/disables conversion of string columns to categories.

parquet_reader_options_builder &	use_pandas_metadata (bool val)
	Enables/disables use of pandas metadata when reading.

parquet_reader_options_builder &	use_arrow_schema (bool val)
	Enables/disables use of the Arrow schema when reading.

parquet_reader_options_builder &	allow_mismatched_pq_schemas (bool val)
	Enables/disables reading of matching projected and filter columns from Parquet sources with mismatched schemas.

parquet_reader_options_builder &	ignore_missing_columns (bool val)
	Enables/disables ignoring of non-existent projected columns while reading.

parquet_reader_options_builder &	set_column_schema (std::vector< reader_column_schema > val)
	Sets reader metadata.

parquet_reader_options_builder &	skip_rows (int64_t val)
	Sets the number of rows to skip.

parquet_reader_options_builder &	num_rows (int64_t val)
	Sets the number of rows to read.

parquet_reader_options_builder &	skip_bytes (size_t val)
	Sets the number of bytes to skip before starting to read row groups.

parquet_reader_options_builder &	num_bytes (size_t val)
	Sets the number of bytes, measured from the skip_bytes offset, at which to stop reading row groups.

parquet_reader_options_builder &	timestamp_type (data_type type)
	Sets the timestamp type used to cast timestamp columns.

parquet_reader_options_builder &	decimal_width (type_id width)
	Sets the decimal width used to cast decimal columns.

parquet_reader_options_builder &	use_jit_filter (bool use_jit_filter)
	Enables/disables use of just-in-time (JIT) compilation for the filter step.

	operator parquet_reader_options && ()
	Moves the parquet_reader_options member out once it's built.

parquet_reader_options &&	build ()
	Moves the parquet_reader_options member out once it's built.

Detailed Description

Builds parquet_reader_options to use for read_parquet().

Definition at line 518 of file parquet.hpp.

Constructor & Destructor Documentation

◆ parquet_reader_options_builder() [1/2]

cudf::io::parquet_reader_options_builder::parquet_reader_options_builder ( )

default

Default constructor.

This was added because Cython requires a default constructor to create objects on the stack. hybrid_scan_reader also uses it to construct parquet_reader_options without a source.

◆ parquet_reader_options_builder() [2/2]

cudf::io::parquet_reader_options_builder::parquet_reader_options_builder ( source_info src )

inline explicit

Constructor from source info.

Parameters

src	The source information used to read the Parquet file

Definition at line 535 of file parquet.hpp.

Member Function Documentation

◆ allow_mismatched_pq_schemas()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::allow_mismatched_pq_schemas ( bool val )

inline

Enables/disables reading of matching projected and filter columns from Parquet sources with mismatched schemas.

Parameters

val	Whether to read matching projected and filter columns from Parquet sources with mismatched schemas.

Returns: this for chaining.

Definition at line 642 of file parquet.hpp.

◆ build()

parquet_reader_options&& cudf::io::parquet_reader_options_builder::build ( )

inline

Moves the parquet_reader_options member out once it's built.

This was added because Cython does not support overloading of conversion operators.

Returns: An rvalue reference to the built parquet_reader_options object

Definition at line 773 of file parquet.hpp.
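The chaining and the hand-off through build() or the conversion operator follow the common C++ builder idiom. Below is a minimal sketch of that idiom using hypothetical toy_options and toy_options_builder types (they are not cuDF types and model only two settings): each setter returns *this by reference, and both build() and the conversion operator move the finished options out of the builder.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for parquet_reader_options (not the cuDF type).
struct toy_options {
  std::vector<std::string> column_names;
  int64_t skip_rows = 0;
};

// Hypothetical stand-in for parquet_reader_options_builder.
class toy_options_builder {
  toy_options options;

 public:
  toy_options_builder() = default;

  // Each setter updates the member and returns *this so calls can be chained.
  toy_options_builder& column_names(std::vector<std::string> names)
  {
    options.column_names = std::move(names);
    return *this;
  }

  toy_options_builder& skip_rows(int64_t val)
  {
    options.skip_rows = val;
    return *this;
  }

  // Implicit hand-off: lets a builder expression initialize a toy_options.
  operator toy_options&&() { return std::move(options); }

  // Named equivalent for callers that cannot use the conversion operator.
  toy_options&& build() { return std::move(options); }
};
```

Because the setters return a reference to the builder, a temporary builder can be configured and consumed in a single expression, for example `toy_options opts = toy_options_builder{}.column_names({"A", "X"}).skip_rows(10).build();`. After the options are moved out, the builder should not be reused.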

◆ column_indices()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::column_indices ( std::vector< cudf::size_type > col_indices )

inline

Sets the indices of top-level columns to be read from all input sources.

Parameters

col_indices A vector of column indices to attempt to read from each input source.

Returns: this for chaining

Definition at line 569 of file parquet.hpp.

◆ column_names()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::column_names ( std::vector< std::string > column_names )

inline

Sets names of the columns to be read.

Parameters

column_names Vector of column names

Returns: this for chaining

Definition at line 557 of file parquet.hpp.

◆ columns()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::columns ( std::vector< std::string > column_names )

inline

Sets names of the columns to be read.

Deprecated: Deprecated in 26.04 and will be removed in 26.06+. Use column_names instead.

Parameters

column_names Vector of column names

Returns: this for chaining

Definition at line 545 of file parquet.hpp.

◆ convert_strings_to_categories()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::convert_strings_to_categories ( bool val )

inline

Enables/disables conversion of string columns to categories.

Parameters

val	Boolean value to enable/disable conversion of string columns to categories

Returns: this for chaining

Definition at line 603 of file parquet.hpp.

◆ decimal_width()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::decimal_width ( type_id width )

inline

Sets the decimal width used to cast decimal columns.

Parameters

width The decimal type_id (DECIMAL32, DECIMAL64, or DECIMAL128) to which all decimal columns need to be cast. The scale of each column is preserved from the file.

Returns: this for chaining

Definition at line 743 of file parquet.hpp.
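The statement that each column's scale is preserved can be made concrete with a small model. This sketch assumes the usual fixed-point representation, value = unscaled * 10^scale, which is the convention cuDF's fixed_point types follow; toy_decimal, widen_to_64, and fits_in_32 are hypothetical names. This page does not say how the reader handles values that do not fit a narrower width, so the sketch only reports whether they fit.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical model of a fixed-point decimal (not a cuDF type):
// represented value = unscaled * 10^scale, e.g. {-12345, -2} is -123.45.
template <typename Rep>
struct toy_decimal {
  Rep unscaled;
  int32_t scale;
};

// Widening (e.g. DECIMAL32 -> DECIMAL64): the unscaled integer is converted to
// the wider type and the scale is kept, so the represented value is unchanged.
inline toy_decimal<int64_t> widen_to_64(toy_decimal<int32_t> d)
{
  return {static_cast<int64_t>(d.unscaled), d.scale};
}

// Narrowing (e.g. DECIMAL64 -> DECIMAL32) with the same scale is lossless only
// when the unscaled integer fits in the narrower type.
inline bool fits_in_32(toy_decimal<int64_t> d)
{
  return d.unscaled >= std::numeric_limits<int32_t>::min() &&
         d.unscaled <= std::numeric_limits<int32_t>::max();
}
```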

◆ filter()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::filter ( ast::expression const & filter )

inline

Sets an AST-based filter for predicate pushdown.

The filter can use cudf::ast::column_name_reference to reference a column by name, even if that column is not among the requested projected columns. To refer to output column indices, use cudf::ast::column_reference.

For a Parquet file with columns ["A", "B", "C", ... "X", "Y", "Z"]:

Example 1: with or without column projection

    column_names({"A", "X", "Z"})
      .filter(operation(ast_operator::LESS, column_name_reference{"C"}, literal{100}));

Column "C" need not be present in the output table.

Example 2: without column projection

    filter(operation(ast_operator::LESS, column_reference{1}, literal{100}));

Here, 1 refers to column "B" because the output contains all columns in order ["A", ..., "Z"].

Example 3: with column projection

    column_names({"A", "Z", "X"})
      .filter(operation(ast_operator::LESS, column_reference{1}, literal{100}));

Here, 1 refers to column "Z" because the output contains 3 columns in order ["A", "Z", "X"].

Parameters

filter AST expression to use as filter

Returns: this for chaining

Definition at line 591 of file parquet.hpp.
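The index rule in Examples 2 and 3 can be modeled in a few lines. resolve_column_reference is a hypothetical helper, not part of cuDF; it only restates which output column a column_reference index selects, and its out-of-range exception is the sketch's own choice rather than documented reader behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not cuDF code): maps a column_reference index to the
// name of the output column it refers to.
// - No projection: the output holds every file column, in file order.
// - With projection: the output holds the projected columns, in the order given.
inline std::string resolve_column_reference(std::vector<std::string> const& file_columns,
                                            std::vector<std::string> const& projection,
                                            std::size_t index)
{
  auto const& output_columns = projection.empty() ? file_columns : projection;
  if (index >= output_columns.size()) {
    throw std::out_of_range("column_reference index past the last output column");
  }
  return output_columns[index];
}
```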

◆ ignore_missing_columns()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::ignore_missing_columns ( bool val )

inline

Enables/disables ignoring of non-existent projected columns while reading.

Parameters

val	Whether to ignore non-existent projected columns while reading.

Returns: this for chaining.

Definition at line 655 of file parquet.hpp.
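A minimal sketch of the effect of this option on a name-based projection against one source's schema. It assumes that enabling the option drops requested names the source does not contain and that disabling it turns such a name into an error; project_columns is a hypothetical helper, and cuDF's actual error type and message are not specified on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model (not cuDF code) of projecting requested column names
// against the columns present in one source.
inline std::vector<std::string> project_columns(std::vector<std::string> const& source_columns,
                                                std::vector<std::string> const& requested,
                                                bool ignore_missing)
{
  std::vector<std::string> selected;
  for (auto const& name : requested) {
    bool const present =
      std::find(source_columns.begin(), source_columns.end(), name) != source_columns.end();
    if (present) {
      selected.push_back(name);
    } else if (!ignore_missing) {
      // Assumption: with the option disabled, a missing column is an error.
      throw std::invalid_argument("projected column not found: " + name);
    }
    // Otherwise the missing column is silently skipped.
  }
  return selected;
}
```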

◆ num_bytes()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::num_bytes ( size_t val )

inline

Sets the number of bytes, measured from the skip_bytes offset, at which to stop reading row groups.

Parameters

val	Number of bytes, measured from the skip_bytes offset, at which to stop reading row groups

Returns: this for chaining

Definition at line 718 of file parquet.hpp.

◆ num_rows()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::num_rows ( int64_t val )

inline

Sets number of rows to read.

Note: Although this allows one to request more than size_type::max() rows, if any single read would produce a table larger than this row limit, an error is thrown.

Parameters

val	Number of rows to read after the skipped rows

Returns: this for chaining

Definition at line 694 of file parquet.hpp.
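A minimal model of the rows selected by skip_rows and num_rows, together with the single-read limit from the note above. select_rows and row_window are hypothetical; the sketch assumes that an unset num_rows means all rows after the skip and that a window running past the end of the data is clamped. It relies only on the fact that cudf::size_type is a 32-bit signed integer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <stdexcept>

// Hypothetical model (not cuDF code) of the row window chosen by
// skip_rows/num_rows.
struct row_window {
  int64_t first;  // index of the first row read
  int64_t count;  // number of rows read
};

inline row_window select_rows(int64_t total_rows, int64_t skip_rows, std::optional<int64_t> num_rows)
{
  int64_t const first     = std::min(skip_rows, total_rows);
  int64_t const available = total_rows - first;
  // Assumption: an unset num_rows means "read every row after the skip",
  // and a window running past the end is clamped to the available rows.
  int64_t const count = num_rows ? std::min(*num_rows, available) : available;
  // cudf::size_type is int32_t, so one output table holds at most INT32_MAX
  // rows; a single read that would produce more is rejected.
  if (count > std::numeric_limits<int32_t>::max()) {
    throw std::overflow_error("single read exceeds the size_type row limit");
  }
  return {first, count};
}
```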

◆ row_groups()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::row_groups ( std::vector< std::vector< size_type >> row_groups )

inline

Sets the individual row groups to read, as one vector of row group indices per input source.

Parameters

row_groups	Vector of row group index vectors, one per input source

Returns: this for chaining

Definition at line 581 of file parquet.hpp.
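The nested type can be read as "for each source, a list of row group indices". The sketch below assumes that layout and computes how many rows a selection covers, given each row group's row count; rows_selected is a hypothetical helper, and its strict one-list-per-source check is the sketch's own choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using size_type = int32_t;  // same width as cudf::size_type

// Hypothetical helper (not cuDF code). Outer index = source, inner values =
// row group indices within that source.
inline int64_t rows_selected(std::vector<std::vector<int64_t>> const& rows_per_group,
                             std::vector<std::vector<size_type>> const& row_groups)
{
  if (row_groups.size() != rows_per_group.size()) {
    throw std::invalid_argument("expected one row group list per source");
  }
  int64_t total = 0;
  for (std::size_t src = 0; src < row_groups.size(); ++src) {
    for (size_type rg : row_groups[src]) {
      // at() throws std::out_of_range for an invalid (or negative) index.
      total += rows_per_group[src].at(static_cast<std::size_t>(rg));
    }
  }
  return total;
}
```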

◆ set_column_schema()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::set_column_schema ( std::vector< reader_column_schema > val )

inline

Sets reader metadata.

Parameters

val	Tree of metadata information.

Returns: this for chaining

Definition at line 667 of file parquet.hpp.

◆ skip_bytes()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::skip_bytes ( size_t val )

inline

Sets the number of bytes to skip before starting to read row groups.

Parameters

val	Number of bytes to skip before starting to read row groups

Returns: this for chaining

Definition at line 706 of file parquet.hpp.
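skip_bytes and num_bytes together define a byte window [skip_bytes, skip_bytes + num_bytes) over the file. This page does not state how a row group is matched against that window (by its starting offset, its midpoint, or full containment), so the sketch below simply assumes a row group is read when its starting byte offset falls inside the window; select_row_groups_by_bytes is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical model (not cuDF code) of byte-range row group selection.
// ASSUMPTION: a row group is read when its starting byte offset lies in
// [skip_bytes, skip_bytes + num_bytes). cuDF's exact rule may differ.
inline std::vector<int> select_row_groups_by_bytes(std::vector<std::size_t> const& rg_start_offsets,
                                                   std::size_t skip_bytes,
                                                   std::size_t num_bytes)
{
  // Saturate the end of the window so a very large num_bytes cannot overflow.
  std::size_t const limit = std::numeric_limits<std::size_t>::max();
  std::size_t const end   = (num_bytes > limit - skip_bytes) ? limit : skip_bytes + num_bytes;
  std::vector<int> selected;
  for (std::size_t i = 0; i < rg_start_offsets.size(); ++i) {
    if (rg_start_offsets[i] >= skip_bytes && rg_start_offsets[i] < end) {
      selected.push_back(static_cast<int>(i));
    }
  }
  return selected;
}
```

In the usage below, the first row group starts at offset 4 because every Parquet file begins with the 4-byte PAR1 magic number; the other offsets are made up for illustration.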

◆ skip_rows()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::skip_rows ( int64_t val )

inline

Sets number of rows to skip.

Parameters

val	Number of rows to skip from start

Returns: this for chaining

Definition at line 679 of file parquet.hpp.

◆ timestamp_type()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::timestamp_type ( data_type type )

inline

Sets the timestamp type used to cast timestamp columns.

Parameters

type	The timestamp data_type to which all timestamp columns need to be cast

Returns: this for chaining

Definition at line 730 of file parquet.hpp.
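Casting to a coarser timestamp resolution can be illustrated with std::chrono alone. In cuDF the argument would be a data_type such as cudf::data_type{cudf::type_id::TIMESTAMP_MILLISECONDS}; the sketch below does not use cuDF and shows only the unit conversion. It relies on std::chrono::time_point_cast, which truncates toward zero; how cuDF rounds pre-epoch (negative) values is not stated here, so the example uses a post-epoch value.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Illustration only (std::chrono, not cuDF): a microsecond-resolution
// timestamp converted to millisecond resolution.
using us_timestamp = std::chrono::time_point<std::chrono::system_clock, std::chrono::microseconds>;
using ms_timestamp = std::chrono::time_point<std::chrono::system_clock, std::chrono::milliseconds>;

inline ms_timestamp to_milliseconds(us_timestamp t)
{
  // time_point_cast uses duration_cast, which truncates toward zero.
  return std::chrono::time_point_cast<std::chrono::milliseconds>(t);
}
```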

◆ use_arrow_schema()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::use_arrow_schema ( bool val )

inline

Enables/disables use of the Arrow schema when reading.

Parameters

val	Whether to use the Arrow schema

Returns: this for chaining

Definition at line 627 of file parquet.hpp.

◆ use_jit_filter()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::use_jit_filter ( bool use_jit_filter )

inline

Enables/disables use of just-in-time (JIT) compilation for the filter step.

Parameters

use_jit_filter	Whether to use the JIT filter

Returns: this for chaining

Definition at line 755 of file parquet.hpp.

◆ use_pandas_metadata()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::use_pandas_metadata ( bool val )

inline

Enables/disables use of pandas metadata when reading.

Parameters

val	Whether to use pandas metadata

Returns: this for chaining

Definition at line 615 of file parquet.hpp.

The documentation for this class was generated from the following file:

parquet.hpp
