Settings for read_parquet().
#include <parquet.hpp>
Public Member Functions
parquet_reader_options ()=default
Default constructor.
source_info const & | get_source () const
Returns source info.
bool | is_enabled_convert_strings_to_categories () const
Returns true/false depending on whether strings should be converted to categories.
bool | is_enabled_use_pandas_metadata () const
Returns true/false depending on whether pandas metadata is used while reading.
bool | is_enabled_use_arrow_schema () const
Returns true/false depending on whether the arrow schema is used while reading.
bool | is_enabled_allow_mismatched_pq_schemas () const
Returns true/false depending on whether to read matching projected and filter columns from mismatched Parquet sources.
std::optional< std::vector< reader_column_schema > > | get_column_schema () const
Returns optional tree of metadata.
int64_t | get_skip_rows () const
Returns number of rows to skip from the start.
std::optional< size_type > const & | get_num_rows () const
Returns number of rows to read.
auto const & | get_columns () const
Returns names of the columns to be read, if set.
auto const & | get_row_groups () const
Returns list of individual row groups to be read.
auto const & | get_filter () const
Returns AST-based filter for predicate pushdown.
data_type | get_timestamp_type () const
Returns timestamp type used to cast timestamp columns.
void | set_columns (std::vector< std::string > col_names)
Sets names of the columns to be read.
void | set_row_groups (std::vector< std::vector< size_type >> row_groups)
Sets vector of individual row groups to read.
void | set_filter (ast::expression const &filter)
Sets AST-based filter for predicate pushdown.
void | enable_convert_strings_to_categories (bool val)
Enables or disables conversion of strings to categories.
void | enable_use_pandas_metadata (bool val)
Enables or disables use of pandas metadata while reading.
void | enable_use_arrow_schema (bool val)
Enables or disables use of the arrow schema while reading.
void | enable_allow_mismatched_pq_schemas (bool val)
Enables or disables reading of matching projected and filter columns from mismatched Parquet sources.
void | set_column_schema (std::vector< reader_column_schema > val)
Sets reader column schema.
void | set_skip_rows (int64_t val)
Sets number of rows to skip.
void | set_num_rows (size_type val)
Sets number of rows to read.
void | set_timestamp_type (data_type type)
Sets timestamp_type used to cast timestamp columns.
Static Public Member Functions
static parquet_reader_options_builder | builder (source_info src)
Creates a parquet_reader_options_builder which will build parquet_reader_options.
Definition at line 56 of file parquet.hpp.
cudf::io::parquet_reader_options::parquet_reader_options() [explicit, default]
Default constructor.
This was added because Cython requires a default constructor to create objects on the stack.
parquet_reader_options_builder cudf::io::parquet_reader_options::builder(source_info src) [static]
Creates a parquet_reader_options_builder which will build parquet_reader_options.
src | Source information to read parquet file
void cudf::io::parquet_reader_options::enable_allow_mismatched_pq_schemas(bool val) [inline]
Enables or disables reading of matching projected and filter columns from mismatched Parquet sources.
val | Boolean value indicating whether to read matching projected and filter columns from mismatched Parquet sources
Definition at line 281 of file parquet.hpp.
void cudf::io::parquet_reader_options::enable_convert_strings_to_categories(bool val) [inline]
Enables or disables conversion of strings to categories.
val | Boolean value to enable/disable conversion of string columns to categories
Definition at line 258 of file parquet.hpp.
void cudf::io::parquet_reader_options::enable_use_arrow_schema(bool val) [inline]
Enables or disables use of the arrow schema while reading.
val | Boolean value indicating whether to use the arrow schema
Definition at line 272 of file parquet.hpp.
void cudf::io::parquet_reader_options::enable_use_pandas_metadata(bool val) [inline]
Enables or disables use of pandas metadata while reading.
val | Boolean value indicating whether to use pandas metadata
Definition at line 265 of file parquet.hpp.
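std::optional< std::vector< reader_column_schema > > cudf::io::parquet_reader_options::get_column_schema() const [inline]
Returns optional tree of metadata.
Definition at line 159 of file parquet.hpp.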
auto const & cudf::io::parquet_reader_options::get_columns() const [inline]
Returns names of the columns to be read, if set.
Returns: nullopt if the option is not set
Definition at line 184 of file parquet.hpp.
auto const & cudf::io::parquet_reader_options::get_filter() const [inline]
Returns AST-based filter for predicate pushdown.
Definition at line 198 of file parquet.hpp.
std::optional< size_type > const & cudf::io::parquet_reader_options::get_num_rows() const [inline]
Returns number of rows to read.
Returns: nullopt if the option hasn't been set (in which case the file is read until the end)
Definition at line 177 of file parquet.hpp.
auto const & cudf::io::parquet_reader_options::get_row_groups() const [inline]
Returns list of individual row groups to be read.
Definition at line 191 of file parquet.hpp.
int64_t cudf::io::parquet_reader_options::get_skip_rows() const [inline]
Returns number of rows to skip from the start.
Definition at line 169 of file parquet.hpp.
source_info const & cudf::io::parquet_reader_options::get_source() const [inline]
Returns source info.
data_type cudf::io::parquet_reader_options::get_timestamp_type() const [inline]
Returns timestamp type used to cast timestamp columns.
Definition at line 205 of file parquet.hpp.
bool cudf::io::parquet_reader_options::is_enabled_allow_mismatched_pq_schemas() const [inline]
Returns true/false depending on whether to read matching projected and filter columns from mismatched Parquet sources.
Returns: true if matching projected and filter columns will be read from mismatched Parquet sources
Definition at line 149 of file parquet.hpp.
bool cudf::io::parquet_reader_options::is_enabled_convert_strings_to_categories() const [inline]
Returns true/false depending on whether strings should be converted to categories.
Returns: true if strings should be converted to categories
Definition at line 123 of file parquet.hpp.
bool cudf::io::parquet_reader_options::is_enabled_use_arrow_schema() const [inline]
Returns true/false depending on whether the arrow schema is used while reading.
Returns: true if the arrow schema is used while reading
Definition at line 140 of file parquet.hpp.
bool cudf::io::parquet_reader_options::is_enabled_use_pandas_metadata() const [inline]
Returns true/false depending on whether pandas metadata is used while reading.
Returns: true if pandas metadata is used while reading
Definition at line 133 of file parquet.hpp.
void cudf::io::parquet_reader_options::set_column_schema(std::vector< reader_column_schema > val) [inline]
Sets reader column schema.
val | Tree of schema nodes to enable/disable conversion of binary to string columns. Note that the default is to convert to string columns.
Definition at line 289 of file parquet.hpp.
void cudf::io::parquet_reader_options::set_columns(std::vector< std::string > col_names) [inline]
Sets names of the columns to be read.
col_names | Vector of column names
Definition at line 212 of file parquet.hpp.
void cudf::io::parquet_reader_options::set_filter(ast::expression const &filter) [inline]
Sets AST-based filter for predicate pushdown.
The filter can use cudf::ast::column_name_reference to reference a column by name, even if that column is not among the requested projected columns. To refer to output column indices, use cudf::ast::column_reference.
The examples below assume a Parquet file with columns ["A", "B", "C", ... "X", "Y", "Z"].
Example 1: with or without column projection. A filter that references column "C" by name works whether or not "C" is projected; column "C" need not be present in the output table.
Example 2: without column projection. An index of 1 refers to column "B", because the output contains all columns in order ["A", ..., "Z"].
Example 3: with column projection. An index of 1 refers to column "Z", because the output contains 3 columns in order ["A", "Z", "X"].
filter | AST expression to use as filter
Definition at line 251 of file parquet.hpp.
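The index rule in Examples 2 and 3 of set_filter can be modeled in plain C++. The sketch below does not call libcudf; resolve_output_column is a hypothetical helper, introduced only for illustration, that assumes an index-based reference selects from the projected column list when one is set and from the file's column list otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not part of libcudf): returns the name of the column
// that an index-based reference would select in the output table.
//   file_columns: column names in file order
//   projection:   names passed to set_columns(), or nullopt if unset
inline std::string resolve_output_column(
  std::vector<std::string> const& file_columns,
  std::optional<std::vector<std::string>> const& projection,
  std::size_t index)
{
  // Without a projection the output holds every file column in file order;
  // with one, the output follows the order of the requested names.
  auto const& output = projection.has_value() ? *projection : file_columns;
  assert(index < output.size() && "index must refer to an existing output column");
  return output[index];
}
```

With file columns "A" through "Z", index 1 resolves to "B" when no projection is set and to "Z" when the projection is {"A", "Z", "X"}, matching Examples 2 and 3.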
void cudf::io::parquet_reader_options::set_num_rows(size_type val)
Sets number of rows to read.
val | Number of rows to read after skip
void cudf::io::parquet_reader_options::set_row_groups(std::vector< std::vector< size_type >> row_groups)
Sets vector of individual row groups to read.
row_groups | Vector of row groups to read
void cudf::io::parquet_reader_options::set_skip_rows(int64_t val)
Sets number of rows to skip.
val | Number of rows to skip from start
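How set_skip_rows and set_num_rows combine can be sketched as a row window: rows are skipped from the start, then up to num_rows rows are read, or everything that remains if num_rows is unset. The helper below is hypothetical and does not call libcudf. It assumes size_type is a 32-bit signed integer and that out-of-range values are clamped to the available rows; the library may validate such inputs differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical helper (not part of libcudf): returns {first_row, row_count}
// for a source with total_rows rows.
inline std::pair<std::int64_t, std::int64_t> row_window(
  std::int64_t total_rows,
  std::int64_t skip_rows,
  std::optional<std::int32_t> num_rows)
{
  assert(total_rows >= 0);
  // Assumption: a skip past the end yields an empty window rather than an error.
  auto const first     = std::clamp<std::int64_t>(skip_rows, 0, total_rows);
  auto const remaining = total_rows - first;
  // An unset num_rows means "read until the end".
  auto const count = num_rows.has_value()
                       ? std::clamp<std::int64_t>(*num_rows, 0, remaining)
                       : remaining;
  return {first, count};
}
```

For a 100-row source, skipping 10 rows and requesting 20 gives the window starting at row 10 with 20 rows; leaving num_rows unset gives the 90 rows that remain after the skip.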
void cudf::io::parquet_reader_options::set_timestamp_type(data_type type) [inline]
Sets timestamp_type used to cast timestamp columns.
type | The timestamp data_type to which all timestamp columns need to be cast
Definition at line 313 of file parquet.hpp.