Builds parquet_reader_options to use for read_parquet().
#include <parquet.hpp>
Public Member Functions | |
| parquet_reader_options_builder ()=default | |
| Default constructor. | |
| parquet_reader_options_builder (source_info src) | |
| Constructor from source info. | |
| parquet_reader_options_builder & | columns (std::vector< std::string > col_names) |
| Sets names of the columns to be read. | |
| parquet_reader_options_builder & | row_groups (std::vector< std::vector< size_type >> row_groups) |
| Sets vector of individual row groups to read. | |
| parquet_reader_options_builder & | filter (ast::expression const &filter) |
| Sets an AST-based filter for predicate pushdown. | |
| parquet_reader_options_builder & | convert_strings_to_categories (bool val) |
| Enables or disables conversion of strings to categories. | |
| parquet_reader_options_builder & | use_pandas_metadata (bool val) |
| Enables or disables use of pandas metadata when reading. | |
| parquet_reader_options_builder & | use_arrow_schema (bool val) |
| Enables or disables use of the arrow schema when reading. | |
| parquet_reader_options_builder & | allow_mismatched_pq_schemas (bool val) |
| Enables or disables reading of matching projected and filter columns from mismatched Parquet sources. | |
| parquet_reader_options_builder & | set_column_schema (std::vector< reader_column_schema > val) |
| Sets reader metadata. | |
| parquet_reader_options_builder & | skip_rows (int64_t val) |
| Sets number of rows to skip. | |
| parquet_reader_options_builder & | num_rows (int64_t val) |
| Sets number of rows to read. | |
| parquet_reader_options_builder & | skip_bytes (size_t val) |
| Sets the number of bytes to skip before starting to read row groups. | |
| parquet_reader_options_builder & | num_bytes (size_t val) |
| Sets the number of bytes, counted after the skipped bytes, at which to stop reading row groups. | |
| parquet_reader_options_builder & | timestamp_type (data_type type) |
| Sets the timestamp_type used to cast timestamp columns. | |
| parquet_reader_options_builder & | use_jit_filter (bool use_jit_filter) |
| Enables or disables use of JIT for the filter step. | |
| operator parquet_reader_options && () | |
| Moves the parquet_reader_options member once it's built. | |
| parquet_reader_options && | build () |
| Moves the parquet_reader_options member once it's built. | |
Builds parquet_reader_options to use for read_parquet().
Definition at line 411 of file parquet.hpp.
parquet_reader_options_builder () [default]
Default constructor.
Added because Cython requires a default constructor to create objects on the stack. The hybrid_scan_reader also uses it to construct parquet_reader_options without a source.
parquet_reader_options_builder (source_info src) [inline, explicit]
Constructor from source info.
| src | The source information used to read the Parquet file |
Definition at line 428 of file parquet.hpp.
parquet_reader_options_builder & allow_mismatched_pq_schemas (bool val) [inline]
Enables or disables reading of matching projected and filter columns from mismatched Parquet sources.
| val | Whether to read matching projected and filter columns from mismatched Parquet sources |
Definition at line 509 of file parquet.hpp.
parquet_reader_options && build () [inline]
Moves the parquet_reader_options member once it's built.
This has been added since Cython does not support overloading of conversion operators.
Returns: parquet_reader_options object's r-value reference
Definition at line 614 of file parquet.hpp.
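The chaining setters, build() and the conversion operator follow a common builder shape. The sketch below models that shape with stand-in types; `demo_options`, `demo_options_builder` and `make_demo_options` are hypothetical names introduced only for illustration and are not part of parquet.hpp.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for parquet_reader_options; not the cudf type.
struct demo_options {
  std::vector<std::string> columns;
  std::int64_t skip_rows   = 0;
  bool use_pandas_metadata = true;
};

// Hypothetical builder mirroring the shape of parquet_reader_options_builder.
class demo_options_builder {
  demo_options options_;

 public:
  demo_options_builder() = default;

  // Each setter stores its argument and returns *this so calls can be chained.
  demo_options_builder& columns(std::vector<std::string> col_names)
  {
    options_.columns = std::move(col_names);
    return *this;
  }

  demo_options_builder& skip_rows(std::int64_t val)
  {
    assert(val >= 0);  // this sketch rejects negative skips
    options_.skip_rows = val;
    return *this;
  }

  demo_options_builder& use_pandas_metadata(bool val)
  {
    options_.use_pandas_metadata = val;
    return *this;
  }

  // Implicit conversion that hands out the built options as an r-value.
  operator demo_options&&() { return std::move(options_); }

  // Named equivalent of the conversion operator.
  demo_options&& build() { return std::move(options_); }
};

// Builds options through a chained call that ends in build().
inline demo_options make_demo_options()
{
  return demo_options_builder{}
    .columns({"A", "X"})
    .skip_rows(10)
    .use_pandas_metadata(false)
    .build();
}
```

Because both build() and the conversion operator return an r-value reference to the stored member, the builder should be treated as spent once either has been used.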
parquet_reader_options_builder & columns (std::vector< std::string > col_names) [inline]
Sets names of the columns to be read.
| col_names | Vector of column names |
Definition at line 436 of file parquet.hpp.
parquet_reader_options_builder & convert_strings_to_categories (bool val) [inline]
Enables or disables conversion of strings to categories.
| val | Boolean value to enable/disable conversion of string columns to categories |
Definition at line 470 of file parquet.hpp.
parquet_reader_options_builder & filter (ast::expression const &filter) [inline]
Sets an AST-based filter for predicate pushdown.
The filter can use cudf::ast::column_name_reference to reference a column by name, even if that column is not among the requested projected columns. To refer to a column by its index in the output, use cudf::ast::column_reference.
For a Parquet file with columns ["A", "B", "C", ... "X", "Y", "Z"]:
Example 1 (with or without column projection): the filter references column "C" by name. Column "C" need not be present in the output table.
Example 2 (without column projection): the filter references column index 1. Here, 1 refers to column "B" because the output contains all columns in order ["A", ..., "Z"].
Example 3 (with column projection): the filter references column index 1. Here, 1 refers to column "Z" because the output contains 3 columns in order ["A", "Z", "X"].
| filter | AST expression to use as filter |
Definition at line 458 of file parquet.hpp.
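The index rule in Examples 2 and 3 can be modeled without any cudf types. The sketch below is a plain C++ model of that rule; `resolve_output_index` is a hypothetical helper invented for illustration, and it assumes an empty projection means that no columns were selected through columns().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of cudf: returns the name of the column that an
// index-based reference would resolve to.
// file_columns: column names in file order.
// projection:   names passed to columns(); empty means "no projection" (assumption).
inline std::string resolve_output_index(std::vector<std::string> const& file_columns,
                                        std::vector<std::string> const& projection,
                                        std::size_t index)
{
  // The output holds the projected columns in the requested order, or every
  // file column in file order when nothing was projected.
  std::vector<std::string> const& output = projection.empty() ? file_columns : projection;
  assert(index < output.size());
  return output[index];
}
```

With file columns ["A", "B", "C", "X", "Y", "Z"], index 1 resolves to "B" without a projection and to "Z" with the projection ["A", "Z", "X"], matching the two examples.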
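parquet_reader_options_builder & num_bytes (size_t val) [inline]
Sets the number of bytes, counted after the skipped bytes, at which to stop reading row groups.
| val | Number of bytes, counted after the skipped bytes, at which to stop reading row groups |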
Definition at line 572 of file parquet.hpp.
parquet_reader_options_builder & num_rows (int64_t val) [inline]
Sets number of rows to read.
Note: although more than size_type::max() rows can be requested, an error is thrown if any single read would produce a table larger than this row limit.
| val | Number of rows to read after skip |
Definition at line 548 of file parquet.hpp.
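The row limit on a single read can be pictured as a range check. This is a minimal sketch, assuming size_type is a 32-bit signed integer; the alias and the helper `checked_row_count` are hypothetical and do not reproduce the library's own check.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Assumption: models size_type as a 32-bit signed integer.
using size_type = std::int32_t;

// Hypothetical helper: throws if a single read would produce more rows than
// size_type can represent, otherwise returns the count as a size_type.
inline size_type checked_row_count(std::int64_t rows_in_read)
{
  if (rows_in_read < 0 || rows_in_read > std::numeric_limits<size_type>::max()) {
    throw std::overflow_error("read exceeds size_type::max() rows");
  }
  return static_cast<size_type>(rows_in_read);
}
```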
parquet_reader_options_builder & row_groups (std::vector< std::vector< size_type >> row_groups) [inline]
Sets vector of individual row groups to read.
| row_groups | Vector of row groups to read |
Definition at line 448 of file parquet.hpp.
parquet_reader_options_builder & set_column_schema (std::vector< reader_column_schema > val) [inline]
Sets reader metadata.
| val | Tree of metadata information. |
Definition at line 521 of file parquet.hpp.
parquet_reader_options_builder & skip_bytes (size_t val) [inline]
Sets the number of bytes to skip before starting to read row groups.
| val | Number of bytes to skip before starting to read row groups |
Definition at line 560 of file parquet.hpp.
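One way to picture how skip_bytes and num_bytes together bound the row groups that are read is sketched below. The selection rule, a row group counting as selected when its starting offset lies in [skip_bytes, skip_bytes + num_bytes), is an assumption of this sketch; this page does not state the exact rule the reader applies. `row_group_span` and `select_row_groups` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of a row group: where it starts in the file.
struct row_group_span {
  std::size_t start_offset;
};

// Returns the indices of row groups whose start offset lies in
// [skip_bytes, skip_bytes + num_bytes). This selection rule is an assumption.
inline std::vector<std::size_t> select_row_groups(std::vector<row_group_span> const& groups,
                                                  std::size_t skip_bytes,
                                                  std::size_t num_bytes)
{
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < groups.size(); ++i) {
    std::size_t const start = groups[i].start_offset;
    // Written as a subtraction so that skip_bytes + num_bytes cannot overflow.
    if (start >= skip_bytes && start - skip_bytes < num_bytes) { selected.push_back(i); }
  }
  return selected;
}
```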
parquet_reader_options_builder & skip_rows (int64_t val) [inline]
Sets number of rows to skip.
| val | Number of rows to skip from start |
Definition at line 533 of file parquet.hpp.
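Taken together, skip_rows and num_rows describe a window over the rows of the source. The sketch below computes that window as a half-open range; `row_window` is a hypothetical helper, and treating a negative num_rows as "no limit" is a convention of this sketch only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper, not part of cudf: computes the half-open row range
// [first, last) selected by skip_rows and num_rows out of total_rows.
// A negative num_rows means "no limit" here; that convention is an assumption.
inline std::pair<std::int64_t, std::int64_t> row_window(std::int64_t total_rows,
                                                        std::int64_t skip_rows,
                                                        std::int64_t num_rows)
{
  assert(total_rows >= 0 && skip_rows >= 0);
  std::int64_t const first     = std::min(skip_rows, total_rows);
  std::int64_t const remaining = total_rows - first;
  std::int64_t const count     = (num_rows < 0) ? remaining : std::min(num_rows, remaining);
  return {first, first + count};
}
```

For a source with 100 rows, skipping 10 and reading 20 gives the range [10, 30), while skipping 90 and asking for 20 is clamped to [90, 100).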
parquet_reader_options_builder & timestamp_type (data_type type) [inline]
Sets the timestamp_type used to cast timestamp columns.
| type | The timestamp data_type to which all timestamp columns are cast |
Definition at line 584 of file parquet.hpp.
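Casting timestamp columns to one type amounts to re-expressing each value at a single resolution. As a rough illustration with standard C++ only, the sketch below converts millisecond counts to microsecond counts using std::chrono; `cast_ms_to_us` is a hypothetical helper and says nothing about how the reader performs the cast.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of cudf: re-expresses millisecond timestamps
// (counts since the epoch) as microsecond counts.
inline std::vector<std::int64_t> cast_ms_to_us(std::vector<std::int64_t> const& ms_values)
{
  std::vector<std::int64_t> us_values;
  us_values.reserve(ms_values.size());
  for (std::int64_t const v : ms_values) {
    auto const us =
      std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::milliseconds(v));
    us_values.push_back(us.count());
  }
  return us_values;
}
```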
parquet_reader_options_builder & use_arrow_schema (bool val) [inline]
Enables or disables use of the arrow schema when reading.
| val | Whether to use the arrow schema |
Definition at line 494 of file parquet.hpp.
parquet_reader_options_builder & use_jit_filter (bool use_jit_filter) [inline]
Enable/disable use of JIT for filter step.
| use_jit_filter | Boolean value whether to use JIT filter |
Definition at line 596 of file parquet.hpp.
parquet_reader_options_builder & use_pandas_metadata (bool val) [inline]
Enables or disables use of pandas metadata when reading.
| val | Whether to use pandas metadata |
Definition at line 482 of file parquet.hpp.