Builds parquet_reader_options to use for read_parquet().
#include <parquet.hpp>
Public Member Functions
parquet_reader_options_builder () = default
    Default constructor.
parquet_reader_options_builder (source_info src)
    Constructor from source info.
parquet_reader_options_builder & columns (std::vector< std::string > col_names)
    Sets the names of the columns to be read.
parquet_reader_options_builder & row_groups (std::vector< std::vector< size_type >> row_groups)
    Sets the vector of individual row groups to read.
parquet_reader_options_builder & filter (ast::expression const &filter)
    Sets an AST-based filter for predicate pushdown.
parquet_reader_options_builder & convert_strings_to_categories (bool val)
    Enables or disables conversion of strings to categories.
parquet_reader_options_builder & use_pandas_metadata (bool val)
    Enables or disables use of pandas metadata when reading.
parquet_reader_options_builder & use_arrow_schema (bool val)
    Enables or disables use of the Arrow schema when reading.
parquet_reader_options_builder & set_column_schema (std::vector< reader_column_schema > val)
    Sets reader metadata.
parquet_reader_options_builder & skip_rows (int64_t val)
    Sets the number of rows to skip.
parquet_reader_options_builder & num_rows (size_type val)
    Sets the number of rows to read.
parquet_reader_options_builder & timestamp_type (data_type type)
    Sets the timestamp_type used to cast timestamp columns.
operator parquet_reader_options && ()
    Moves the parquet_reader_options member out once it has been built.
parquet_reader_options && build ()
    Moves the parquet_reader_options member out once it has been built.
Definition at line 297 of file parquet.hpp.
parquet_reader_options_builder () = default  [default]
Default constructor.
This has been added because Cython requires a default constructor to create objects on the stack.
parquet_reader_options_builder (source_info src)  [inline, explicit]
Constructor from source info.
Parameters:
  src: The source information used to read the parquet file
Definition at line 313 of file parquet.hpp.
parquet_reader_options && build ()  [inline]
Moves the parquet_reader_options member out once it has been built.
This has been added because Cython does not support overloading of conversion operators.
Returns: an r-value reference to the parquet_reader_options object
Definition at line 445 of file parquet.hpp.
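The move-out pattern behind build() and the conversion operator can be sketched with a simplified, self-contained builder. The types options and options_builder below are hypothetical stand-ins for parquet_reader_options and parquet_reader_options_builder, not libcudf code; they only show how chained setters return a reference to the builder and how the finished options are handed to the caller with std::move.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for parquet_reader_options; not the libcudf type.
struct options {
  std::vector<std::string> columns;
  std::int64_t skip_rows = 0;
  std::optional<std::int32_t> num_rows;  // empty: read to the end
};

// Hypothetical stand-in for parquet_reader_options_builder.
class options_builder {
 public:
  // Each setter updates the wrapped options and returns *this so calls can be chained.
  options_builder& columns(std::vector<std::string> col_names)
  {
    opts_.columns = std::move(col_names);
    return *this;
  }
  options_builder& skip_rows(std::int64_t val)
  {
    opts_.skip_rows = val;
    return *this;
  }
  options_builder& num_rows(std::int32_t val)
  {
    opts_.num_rows = val;
    return *this;
  }

  // Implicit conversion: moves the finished options out of the builder.
  operator options&&() { return std::move(opts_); }

  // Named equivalent for callers that cannot use conversion operators (e.g. Cython).
  options&& build() { return std::move(opts_); }

 private:
  options opts_;
};
```

Because build() returns an r-value reference into the builder, the result is meant to be consumed in the same expression, e.g. `options o = options_builder().skip_rows(10).build();`, after which the moved-from builder is simply discarded.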
parquet_reader_options_builder & columns (std::vector< std::string > col_names)  [inline]
Sets the names of the columns to be read.
Parameters:
  col_names: Vector of column names
Definition at line 321 of file parquet.hpp.
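Example 3 under filter() implies that a projected read returns columns in the order their names were passed to columns() (["A", "Z", "X"], not file order), while a read without columns() returns every column in file order. A minimal model of that mapping from output position to file position; projected_file_indices is a hypothetical helper, not libcudf API, and it assumes every requested name exists in the file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of column projection; not libcudf code.
// file_columns: column names in file order.
// requested: names given to columns(); std::nullopt models "columns() never called".
// Returns, for each output column, its index in the file.
std::vector<std::size_t> projected_file_indices(
  std::vector<std::string> const& file_columns,
  std::optional<std::vector<std::string>> const& requested)
{
  std::vector<std::size_t> out;
  if (!requested) {
    // No projection: every column, in file order.
    for (std::size_t i = 0; i < file_columns.size(); ++i) { out.push_back(i); }
    return out;
  }
  for (auto const& name : *requested) {
    auto const it = std::find(file_columns.begin(), file_columns.end(), name);
    // Precondition of this model only: every requested name exists in the file.
    assert(it != file_columns.end());
    // Output keeps the requested order, not the file order.
    out.push_back(static_cast<std::size_t>(it - file_columns.begin()));
  }
  return out;
}
```

For a file with columns A, B, C, X, Y, Z, requesting {"A", "Z", "X"} maps output positions 0, 1, 2 to file columns 0, 5, 3.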
parquet_reader_options_builder & convert_strings_to_categories (bool val)  [inline]
Enables or disables conversion of strings to categories.
Parameters:
  val: Boolean value to enable/disable conversion of string columns to categories
Definition at line 355 of file parquet.hpp.
parquet_reader_options_builder & filter (ast::expression const &filter)  [inline]
Sets an AST-based filter for predicate pushdown.
The filter can use cudf::ast::column_name_reference to reference a column by its name, even if that column is not among the requested (projected) columns. To refer to output column indices, use cudf::ast::column_reference.
For a parquet file with columns ["A", "B", "C", ... "X", "Y", "Z"]:
Example 1 (with or without column projection): column "C", referenced by name, need not be present in the output table.
Example 2 (without column projection): a column_reference index of 1 refers to column "B", because the output will contain all columns in the order ["A", ..., "Z"].
Example 3 (with column projection): a column_reference index of 1 refers to column "Z", because the output will contain 3 columns in the order ["A", "Z", "X"].
Parameters:
  filter: AST expression to use as the filter
Definition at line 343 of file parquet.hpp.
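The indexing rules in the filter examples can be checked with a small self-contained model. resolve_index and name_is_resolvable are hypothetical helpers, not libcudf API; they encode only what this page states (a column_reference index addresses output columns, a column_name_reference name addresses file columns) and say nothing about how libcudf evaluates the filter. The column list is shortened to A, B, C, X, Y, Z.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model; not libcudf code.
// column_reference-style lookup: `index` addresses the OUTPUT columns, which are
// the projection if one was given and every file column otherwise.
std::string resolve_index(std::vector<std::string> const& file_columns,
                          std::optional<std::vector<std::string>> const& projection,
                          std::size_t index)
{
  std::vector<std::string> const& output = projection ? *projection : file_columns;
  assert(index < output.size());
  return output[index];
}

// column_name_reference-style lookup: `name` only has to exist in the FILE;
// it does not have to be part of the projection.
bool name_is_resolvable(std::vector<std::string> const& file_columns, std::string const& name)
{
  return std::find(file_columns.begin(), file_columns.end(), name) != file_columns.end();
}
```

With the shortened list, resolve_index(file, std::nullopt, 1) yields "B" (Example 2), resolve_index with the projection {"A", "Z", "X"} and index 1 yields "Z" (Example 3), and "C" stays resolvable by name even when it is not projected (Example 1).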
parquet_reader_options_builder & num_rows (size_type val)  [inline]
Sets the number of rows to read.
Parameters:
  val: Number of rows to read after skipping (see skip_rows)
Definition at line 415 of file parquet.hpp.
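parquet_reader_options_builder & row_groups (std::vector< std::vector< size_type >> row_groups)  [inline]
Sets the vector of individual row groups to read.
Parameters:
  row_groups: Vector of row groups to read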
Definition at line 333 of file parquet.hpp.
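The nested parameter type std::vector< std::vector< size_type >> is read here as one list of row-group indices per input source; that reading is an assumption of this sketch, not something this page states. The model computes how many rows a selection would yield from hypothetical per-row-group row counts; rows_in_selection is illustrative only, and the local size_type alias is a stand-in for cudf::size_type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using size_type = std::int32_t;  // stand-in for cudf::size_type (32-bit signed)

// Hypothetical model; not libcudf code.
// row_counts[s][g]: number of rows in row group g of source s.
// selection[s]: row-group indices to read from source s (assumed: one list per source).
std::int64_t rows_in_selection(std::vector<std::vector<std::int64_t>> const& row_counts,
                               std::vector<std::vector<size_type>> const& selection)
{
  assert(selection.size() == row_counts.size());  // assumption: one list per source
  std::int64_t total = 0;
  for (std::size_t s = 0; s < selection.size(); ++s) {
    for (size_type g : selection[s]) {
      assert(g >= 0 && static_cast<std::size_t>(g) < row_counts[s].size());
      total += row_counts[s][static_cast<std::size_t>(g)];
    }
  }
  return total;
}
```

For two sources with row groups of {100, 100, 50} and {200, 10} rows, selecting {{0, 2}, {1}} yields 100 + 50 + 10 = 160 rows.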
parquet_reader_options_builder & set_column_schema (std::vector< reader_column_schema > val)  [inline]
Sets reader metadata.
Parameters:
  val: Tree of metadata information
Definition at line 391 of file parquet.hpp.
parquet_reader_options_builder & skip_rows (int64_t val)  [inline]
Sets the number of rows to skip.
Parameters:
  val: Number of rows to skip from the start
Definition at line 403 of file parquet.hpp.
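Since num_rows counts rows after the skip_rows offset, the two setters together select a contiguous window of rows. A minimal model for a single source; row_window is a hypothetical helper, not libcudf API, and clamping a window that runs past the last row is this model's own choice, because this page does not say how libcudf handles that case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical model; not libcudf code.
// Returns the half-open range [first, last) of rows a read would return.
// num_rows == std::nullopt models "num_rows() never called" (read to the end).
std::pair<std::int64_t, std::int64_t> row_window(std::int64_t total_rows,
                                                 std::int64_t skip_rows,
                                                 std::optional<std::int32_t> num_rows)
{
  assert(total_rows >= 0 && skip_rows >= 0);
  assert(!num_rows || *num_rows >= 0);
  std::int64_t const first = std::min(skip_rows, total_rows);  // skip from the start
  std::int64_t const available = total_rows - first;
  // Model choice: never read past the last row.
  std::int64_t const count = num_rows ? std::min<std::int64_t>(*num_rows, available) : available;
  return {first, first + count};
}
```

For a 100-row file, skip_rows(10) with num_rows(20) covers rows [10, 30), while skip_rows(10) alone covers [10, 100).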
parquet_reader_options_builder & timestamp_type (data_type type)  [inline]
Sets the timestamp_type used to cast timestamp columns.
Parameters:
  type: The timestamp data_type to which all timestamp columns are cast
Definition at line 427 of file parquet.hpp.
parquet_reader_options_builder & use_arrow_schema (bool val)  [inline]
Enables or disables use of the Arrow schema when reading.
Parameters:
  val: Whether to use the Arrow schema
Definition at line 379 of file parquet.hpp.
parquet_reader_options_builder & use_pandas_metadata (bool val)  [inline]
Enables or disables use of pandas metadata when reading.
Parameters:
  val: Whether to use pandas metadata
Definition at line 367 of file parquet.hpp.