Settings for read_parquet().
#include <parquet.hpp>
Public Member Functions
parquet_reader_options ()=default
Default constructor.
source_info const & | get_source () const
Returns source info.
bool | is_enabled_convert_strings_to_categories () const
Returns whether strings should be converted to categories.
bool | is_enabled_use_pandas_metadata () const
Returns whether pandas metadata is used while reading.
bool | is_enabled_use_arrow_schema () const
Returns whether the arrow schema is used while reading.
std::optional< std::vector< reader_column_schema > > | get_column_schema () const
Returns optional tree of metadata.
int64_t | get_skip_rows () const
Returns number of rows to skip from the start.
std::optional< size_type > const & | get_num_rows () const
Returns number of rows to read.
auto const & | get_columns () const
Returns names of the columns to be read, if set.
auto const & | get_row_groups () const
Returns list of individual row groups to be read.
auto const & | get_filter () const
Returns AST-based filter for predicate pushdown.
data_type | get_timestamp_type () const
Returns timestamp type used to cast timestamp columns.
void | set_columns (std::vector< std::string > col_names)
Sets names of the columns to be read.
void | set_row_groups (std::vector< std::vector< size_type >> row_groups)
Sets vector of individual row groups to read.
void | set_filter (ast::expression const &filter)
Sets AST-based filter for predicate pushdown.
void | enable_convert_strings_to_categories (bool val)
Enables or disables conversion of strings to categories.
void | enable_use_pandas_metadata (bool val)
Enables or disables use of pandas metadata while reading.
void | enable_use_arrow_schema (bool val)
Enables or disables use of the arrow schema while reading.
void | set_column_schema (std::vector< reader_column_schema > val)
Sets reader column schema.
void | set_skip_rows (int64_t val)
Sets number of rows to skip.
void | set_num_rows (size_type val)
Sets number of rows to read.
void | set_timestamp_type (data_type type)
Sets timestamp_type used to cast timestamp columns.
Static Public Member Functions
static parquet_reader_options_builder | builder (source_info src)
Creates a parquet_reader_options_builder which will build parquet_reader_options.
Settings for read_parquet().
Definition at line 57 of file parquet.hpp.
cudf::io::parquet_reader_options::parquet_reader_options () [explicit, default]
Default constructor.
This was added because Cython requires a default constructor to create objects on the stack.
parquet_reader_options_builder cudf::io::parquet_reader_options::builder (source_info src) [static]
Creates a parquet_reader_options_builder which will build parquet_reader_options.
src | Source information to read parquet file
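The builder() entry point follows the fluent builder pattern: a static function returns a builder object, setters are chained on it, and a final call produces the options object. The sketch below illustrates that pattern with the standard library only. It is a stand-in, not libcudf code: `toy_options`, `toy_options_builder`, the plain string used as the source, and the builder method names `columns`, `skip_rows`, `num_rows`, and `build` are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

class toy_options_builder;

// Hypothetical stand-in for parquet_reader_options; not the libcudf class.
class toy_options {
 public:
  toy_options() = default;

  // Mirrors the static builder(src) entry point described above.
  static toy_options_builder builder(std::string source);

  std::string const& get_source() const { return source_; }
  int64_t get_skip_rows() const { return skip_rows_; }
  std::optional<int> const& get_num_rows() const { return num_rows_; }
  std::optional<std::vector<std::string>> const& get_columns() const { return columns_; }

 private:
  friend class toy_options_builder;
  std::string source_;
  int64_t skip_rows_ = 0;
  std::optional<int> num_rows_;                      // unset: read to the end
  std::optional<std::vector<std::string>> columns_;  // unset: read all columns
};

// Hypothetical builder: each setter returns *this so calls can be chained.
class toy_options_builder {
 public:
  explicit toy_options_builder(std::string source) { options_.source_ = std::move(source); }

  toy_options_builder& columns(std::vector<std::string> names) {
    options_.columns_ = std::move(names);
    return *this;
  }
  toy_options_builder& skip_rows(int64_t n) {
    options_.skip_rows_ = n;
    return *this;
  }
  toy_options_builder& num_rows(int n) {
    options_.num_rows_ = n;
    return *this;
  }

  // Hands the accumulated options to the caller.
  toy_options build() { return std::move(options_); }

 private:
  toy_options options_;
};

inline toy_options_builder toy_options::builder(std::string source) {
  return toy_options_builder{std::move(source)};
}
```

With this model, `toy_options::builder("example.parquet").columns({"A", "Z"}).skip_rows(10).build()` yields an options object whose optional settings stay unset unless a setter was called.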
void cudf::io::parquet_reader_options::enable_convert_strings_to_categories (bool val) [inline]
Enables or disables conversion of strings to categories.
val | Boolean value to enable/disable conversion of string columns to categories
Definition at line 245 of file parquet.hpp.
void cudf::io::parquet_reader_options::enable_use_arrow_schema (bool val) [inline]
Enables or disables use of the arrow schema while reading.
val | Boolean value whether to use arrow schema
Definition at line 259 of file parquet.hpp.
void cudf::io::parquet_reader_options::enable_use_pandas_metadata (bool val) [inline]
Enables or disables use of pandas metadata while reading.
val | Boolean value whether to use pandas metadata
Definition at line 252 of file parquet.hpp.
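std::optional< std::vector< reader_column_schema > > cudf::io::parquet_reader_options::get_column_schema () const [inline]
Returns optional tree of metadata.
Definition at line 146 of file parquet.hpp.
auto const & cudf::io::parquet_reader_options::get_columns () const [inline]
Returns names of the columns to be read, or nullopt if the option is not set.
Definition at line 171 of file parquet.hpp.
auto const & cudf::io::parquet_reader_options::get_filter () const [inline]
Returns AST-based filter for predicate pushdown.
Definition at line 185 of file parquet.hpp.
std::optional< size_type > const & cudf::io::parquet_reader_options::get_num_rows () const [inline]
Returns number of rows to read, or nullopt if the option hasn't been set (in which case the file is read until the end).
Definition at line 164 of file parquet.hpp.
auto const & cudf::io::parquet_reader_options::get_row_groups () const [inline]
Returns list of individual row groups to be read.
Definition at line 178 of file parquet.hpp.
int64_t cudf::io::parquet_reader_options::get_skip_rows () const [inline]
Returns number of rows to skip from the start.
Definition at line 156 of file parquet.hpp.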
source_info const & cudf::io::parquet_reader_options::get_source () const [inline]
Returns source info.
data_type cudf::io::parquet_reader_options::get_timestamp_type () const [inline]
Returns timestamp type used to cast timestamp columns.
Definition at line 192 of file parquet.hpp.
bool cudf::io::parquet_reader_options::is_enabled_convert_strings_to_categories () const [inline]
Returns true if strings should be converted to categories, false otherwise.
Definition at line 122 of file parquet.hpp.
bool cudf::io::parquet_reader_options::is_enabled_use_arrow_schema () const [inline]
Returns true if the arrow schema is used while reading, false otherwise.
Definition at line 139 of file parquet.hpp.
bool cudf::io::parquet_reader_options::is_enabled_use_pandas_metadata () const [inline]
Returns true if pandas metadata is used while reading, false otherwise.
Definition at line 132 of file parquet.hpp.
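void cudf::io::parquet_reader_options::set_column_schema (std::vector< reader_column_schema > val) [inline]
Sets reader column schema.
val | Tree of schema nodes to enable/disable conversion of binary to string columns. By default, binary columns are converted to string columns.
Definition at line 267 of file parquet.hpp.
void cudf::io::parquet_reader_options::set_columns (std::vector< std::string > col_names) [inline]
Sets names of the columns to be read.
col_names | Vector of column names
Definition at line 199 of file parquet.hpp.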
void cudf::io::parquet_reader_options::set_filter (ast::expression const &filter) [inline]
Sets AST-based filter for predicate pushdown.
The filter can use cudf::ast::column_name_reference to reference a column by its name, even if that column is not among the requested projected columns. To refer to output column indices, use cudf::ast::column_reference.
For a parquet file with columns ["A", "B", "C", ... "X", "Y", "Z"]:
Example 1: with/without column projection. When the filter references column "C" by name, column "C" need not be present in the output table.
Example 2: without column projection. Index 1 refers to column "B" because the output contains all columns in order ["A", ..., "Z"].
Example 3: with column projection. Index 1 refers to column "Z" because the output contains 3 columns in order ["A", "Z", "X"].
filter | AST expression to use as filter
Definition at line 238 of file parquet.hpp.
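The index rule in Examples 2 and 3 can be modelled with a few lines of standard C++. This is a toy model rather than libcudf code: `resolve_output_index` is a hypothetical helper, and the column lists are plain strings standing in for a parquet schema. It shows only that an index-based reference is resolved against the output columns, which depend on whether a projection was requested.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy model (not libcudf code): returns the name of the column that an
// index-based reference resolves to. Without a projection the output holds
// every file column in file order; with a projection it holds the projected
// columns in the order they were requested.
inline std::string resolve_output_index(
    std::vector<std::string> const& file_columns,
    std::optional<std::vector<std::string>> const& projection,
    std::size_t index) {
  std::vector<std::string> const& output = projection ? *projection : file_columns;
  return output.at(index);  // throws std::out_of_range for an invalid index
}
```

For file columns `{"A", "B", "C", "X", "Y", "Z"}`, index 1 resolves to "B" without a projection and to "Z" with the projection `{"A", "Z", "X"}`. A name-based reference needs no such lookup, which is why it can name a column that is absent from the output.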
void cudf::io::parquet_reader_options::set_num_rows (size_type val)
Sets number of rows to read.
val | Number of rows to read after skip
void cudf::io::parquet_reader_options::set_row_groups (std::vector< std::vector< size_type >> row_groups)
Sets vector of individual row groups to read.
row_groups | Vector of row groups to read
void cudf::io::parquet_reader_options::set_skip_rows (int64_t val)
Sets number of rows to skip.
val | Number of rows to skip from start
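The skip-rows and num-rows settings together select a window of rows: rows are skipped from the start, then the requested number of rows is read, or everything up to the end of the file when the count is unset. The sketch below models that window with the standard library. It is not libcudf code: `row_window` is a hypothetical helper, it assumes `total_rows` is non-negative, and clamping out-of-range values is an assumption of this sketch (the real reader may handle such values differently).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Toy model (not libcudf code): computes the half-open range [first, last)
// of rows that would be read from a file holding total_rows rows.
inline std::pair<int64_t, int64_t> row_window(int64_t total_rows,
                                              int64_t skip_rows,
                                              std::optional<int64_t> num_rows) {
  // Assumption of this sketch: out-of-range values are clamped to the file.
  int64_t const first     = std::clamp<int64_t>(skip_rows, 0, total_rows);
  int64_t const remaining = total_rows - first;
  // An unset num_rows means "read until the end of the file".
  int64_t const count =
      num_rows ? std::clamp<int64_t>(*num_rows, 0, remaining) : remaining;
  return {first, first + count};
}
```

For a 100-row file, skipping 10 rows with no row count gives the range [10, 100), and skipping 10 rows with a count of 20 gives [10, 30).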
void cudf::io::parquet_reader_options::set_timestamp_type (data_type type) [inline]
Sets timestamp_type used to cast timestamp columns.
type | The timestamp data_type to which all timestamp columns need to be cast
Definition at line 291 of file parquet.hpp.
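Casting every timestamp column to one type means values stored at one resolution are re-expressed at another. The sketch below illustrates the idea with `std::chrono` only. It is not libcudf code: `cast_to_milliseconds` is a hypothetical helper, and the truncation toward zero performed by `std::chrono::duration_cast` is a property of this sketch, not a statement about how the reader rounds.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Toy model (not libcudf code): re-expresses a column of microsecond
// timestamps (durations since the epoch) at millisecond resolution.
inline std::vector<std::chrono::milliseconds> cast_to_milliseconds(
    std::vector<std::chrono::microseconds> const& column) {
  std::vector<std::chrono::milliseconds> out;
  out.reserve(column.size());
  for (auto const& ts : column) {
    // duration_cast truncates toward zero when precision is lost.
    out.push_back(std::chrono::duration_cast<std::chrono::milliseconds>(ts));
  }
  return out;
}
```

A value of 1,500,000 microseconds becomes 1,500 milliseconds, while 1,999 microseconds becomes 1 millisecond because the sub-millisecond part is dropped.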