cudf::io::parquet_reader_options Class Reference

Settings for read_parquet().

#include <parquet.hpp>

Public Member Functions

 parquet_reader_options ()=default
 Default constructor.
 
source_info const & get_source () const
 Returns source info.
 
bool is_enabled_convert_strings_to_categories () const
 Returns true if strings should be converted to categories.
 
bool is_enabled_use_pandas_metadata () const
 Returns true if pandas metadata is used while reading.
 
bool is_enabled_use_arrow_schema () const
 Returns true if the arrow schema is used while reading.
 
bool is_enabled_allow_mismatched_pq_schemas () const
 Returns true if matching projected and filter columns are read from mismatched Parquet sources.
 
std::optional< std::vector< reader_column_schema > > get_column_schema () const
 Returns optional tree of metadata.
 
int64_t get_skip_rows () const
 Returns number of rows to skip from the start.
 
std::optional< size_type > const & get_num_rows () const
 Returns number of rows to read.
 
auto const & get_columns () const
 Returns names of the columns to be read, if set.
 
auto const & get_row_groups () const
 Returns list of individual row groups to be read.
 
auto const & get_filter () const
 Returns AST based filter for predicate pushdown.
 
data_type get_timestamp_type () const
 Returns timestamp type used to cast timestamp columns.
 
void set_columns (std::vector< std::string > col_names)
 Sets names of the columns to be read.
 
void set_row_groups (std::vector< std::vector< size_type >> row_groups)
 Sets vector of individual row groups to read.
 
void set_filter (ast::expression const &filter)
 Sets AST based filter for predicate pushdown.
 
void enable_convert_strings_to_categories (bool val)
 Enables or disables conversion of strings to categories.
 
void enable_use_pandas_metadata (bool val)
 Enables or disables use of pandas metadata while reading.
 
void enable_use_arrow_schema (bool val)
 Enables or disables use of the arrow schema while reading.
 
void enable_allow_mismatched_pq_schemas (bool val)
 Enables or disables reading of matching projected and filter columns from mismatched Parquet sources.
 
void set_column_schema (std::vector< reader_column_schema > val)
 Sets reader column schema.
 
void set_skip_rows (int64_t val)
 Sets number of rows to skip.
 
void set_num_rows (size_type val)
 Sets number of rows to read.
 
void set_timestamp_type (data_type type)
 Sets the timestamp type used to cast timestamp columns.
 

Static Public Member Functions

static parquet_reader_options_builder builder (source_info src)
 Creates a parquet_reader_options_builder which will build parquet_reader_options.
 

Detailed Description

Settings for read_parquet().

Definition at line 56 of file parquet.hpp.

Constructor & Destructor Documentation

◆ parquet_reader_options()

cudf::io::parquet_reader_options::parquet_reader_options ( )
explicit default

Default constructor.

This constructor exists because Cython requires a default constructor to create objects on the stack.

Member Function Documentation

◆ builder()

static parquet_reader_options_builder cudf::io::parquet_reader_options::builder ( source_info  src)
static

Creates a parquet_reader_options_builder which will build parquet_reader_options.

Parameters
src  Source information for reading the Parquet file
Returns
Builder to build reader options

◆ enable_allow_mismatched_pq_schemas()

void cudf::io::parquet_reader_options::enable_allow_mismatched_pq_schemas ( bool  val)
inline

Enables or disables reading of matching projected and filter columns from mismatched Parquet sources.

Parameters
val  Boolean value indicating whether to read matching projected and filter columns from mismatched Parquet sources

Definition at line 281 of file parquet.hpp.

◆ enable_convert_strings_to_categories()

void cudf::io::parquet_reader_options::enable_convert_strings_to_categories ( bool  val)
inline

Enables or disables conversion of strings to categories.

Parameters
val  Boolean value to enable/disable conversion of string columns to categories

Definition at line 258 of file parquet.hpp.

◆ enable_use_arrow_schema()

void cudf::io::parquet_reader_options::enable_use_arrow_schema ( bool  val)
inline

Enables or disables use of the arrow schema while reading.

Parameters
val  Boolean value indicating whether to use the arrow schema

Definition at line 272 of file parquet.hpp.

◆ enable_use_pandas_metadata()

void cudf::io::parquet_reader_options::enable_use_pandas_metadata ( bool  val)
inline

Enables or disables use of pandas metadata while reading.

Parameters
val  Boolean value indicating whether to use pandas metadata

Definition at line 265 of file parquet.hpp.

◆ get_column_schema()

std::optional<std::vector<reader_column_schema> > cudf::io::parquet_reader_options::get_column_schema ( ) const
inline

Returns optional tree of metadata.

Returns
Optional vector of reader_column_schema objects

Definition at line 159 of file parquet.hpp.

◆ get_columns()

auto const& cudf::io::parquet_reader_options::get_columns ( ) const
inline

Returns names of the columns to be read, if set.

Returns
Names of the columns to be read; nullopt if the option is not set

Definition at line 184 of file parquet.hpp.

◆ get_filter()

auto const& cudf::io::parquet_reader_options::get_filter ( ) const
inline

Returns AST based filter for predicate pushdown.

Returns
AST expression to use as filter

Definition at line 198 of file parquet.hpp.

◆ get_num_rows()

std::optional<size_type> const& cudf::io::parquet_reader_options::get_num_rows ( ) const
inline

Returns number of rows to read.

Returns
Number of rows to read; nullopt if the option hasn't been set (in which case the file is read until the end)

Definition at line 177 of file parquet.hpp.

◆ get_row_groups()

auto const& cudf::io::parquet_reader_options::get_row_groups ( ) const
inline

Returns list of individual row groups to be read.

Returns
List of individual row groups to be read

Definition at line 191 of file parquet.hpp.

◆ get_skip_rows()

int64_t cudf::io::parquet_reader_options::get_skip_rows ( ) const
inline

Returns number of rows to skip from the start.

Returns
Number of rows to skip from the start

Definition at line 169 of file parquet.hpp.

◆ get_source()

source_info const& cudf::io::parquet_reader_options::get_source ( ) const
inline

Returns source info.

Returns
Source info

Definition at line 115 of file parquet.hpp.

◆ get_timestamp_type()

data_type cudf::io::parquet_reader_options::get_timestamp_type ( ) const
inline

Returns timestamp type used to cast timestamp columns.

Returns
Timestamp type used to cast timestamp columns

Definition at line 205 of file parquet.hpp.

◆ is_enabled_allow_mismatched_pq_schemas()

bool cudf::io::parquet_reader_options::is_enabled_allow_mismatched_pq_schemas ( ) const
inline

Returns true if matching projected and filter columns are read from mismatched Parquet sources.

Returns
true if matching projected and filter columns will be read from mismatched Parquet sources

Definition at line 149 of file parquet.hpp.

◆ is_enabled_convert_strings_to_categories()

bool cudf::io::parquet_reader_options::is_enabled_convert_strings_to_categories ( ) const
inline

Returns true if strings should be converted to categories.

Returns
true if strings should be converted to categories

Definition at line 123 of file parquet.hpp.

◆ is_enabled_use_arrow_schema()

bool cudf::io::parquet_reader_options::is_enabled_use_arrow_schema ( ) const
inline

Returns true if the arrow schema is used while reading.

Returns
true if arrow schema is used while reading

Definition at line 140 of file parquet.hpp.

◆ is_enabled_use_pandas_metadata()

bool cudf::io::parquet_reader_options::is_enabled_use_pandas_metadata ( ) const
inline

Returns true if pandas metadata is used while reading.

Returns
true if pandas metadata is used while reading

Definition at line 133 of file parquet.hpp.

◆ set_column_schema()

void cudf::io::parquet_reader_options::set_column_schema ( std::vector< reader_column_schema >  val)
inline

Sets reader column schema.

Parameters
val  Tree of schema nodes used to enable or disable conversion of binary columns to string columns. By default, binary columns are converted to string columns.

Definition at line 289 of file parquet.hpp.

◆ set_columns()

void cudf::io::parquet_reader_options::set_columns ( std::vector< std::string >  col_names)
inline

Sets names of the columns to be read.

Parameters
col_names  Vector of column names

Definition at line 212 of file parquet.hpp.

◆ set_filter()

void cudf::io::parquet_reader_options::set_filter ( ast::expression const &  filter)
inline

Sets AST based filter for predicate pushdown.

The filter can use cudf::ast::column_name_reference to reference a column by name, even if that column is not among the requested projected columns. To refer to a column by its index in the output table, use cudf::ast::column_reference.

For a Parquet file with columns ["A", "B", "C", ... "X", "Y", "Z"]:

Example 1: with/without column projection

use_columns({"A", "X", "Z"})
.filter(operation(ast_operator::LESS, column_name_reference{"C"}, literal{100}));

Column "C" need not be present in the output table.

Example 2: without column projection

filter(operation(ast_operator::LESS, column_reference{1}, literal{100}));

Here, 1 refers to column "B" because the output contains all columns in order ["A", ..., "Z"].

Example 3: with column projection

use_columns({"A", "Z", "X"})
.filter(operation(ast_operator::LESS, column_reference{1}, literal{100}));

Here, 1 will refer to column "Z" because output will contain 3 columns in order ["A", "Z", "X"].
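
The difference between the two reference kinds in the examples above can be modelled without cudf. The following is a minimal, self-contained sketch: `output_columns`, `resolve_by_index`, and `resolvable_by_name` are hypothetical helpers written for illustration and are not part of the library. It assumes only what the examples state, namely that an index refers to a position in the output table and a name refers to a column of the file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not a cudf function): the output table holds the
// projected columns in the requested order, or every file column in file
// order when no projection is set.
std::vector<std::string> output_columns(
  std::vector<std::string> const& file_columns,
  std::optional<std::vector<std::string>> const& projection)
{
  return projection.has_value() ? *projection : file_columns;
}

// Hypothetical helper: models an index-based reference (the role of
// cudf::ast::column_reference), which is resolved against the output order.
std::string resolve_by_index(std::vector<std::string> const& output, std::size_t index)
{
  assert(index < output.size() && "column index out of range");
  return output[index];
}

// Hypothetical helper: models a name-based reference (the role of
// cudf::ast::column_name_reference), which only needs the column to exist
// in the file, whether or not it is projected.
bool resolvable_by_name(std::vector<std::string> const& file_columns, std::string const& name)
{
  return std::find(file_columns.begin(), file_columns.end(), name) != file_columns.end();
}
```

With file columns "A" through "Z", index 1 resolves to "B" when there is no projection and to "Z" under the projection {"A", "Z", "X"}, while the name "C" stays resolvable in both cases.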

Parameters
filter  AST expression to use as filter

Definition at line 251 of file parquet.hpp.

◆ set_num_rows()

void cudf::io::parquet_reader_options::set_num_rows ( size_type  val)

Sets number of rows to read.

Parameters
val  Number of rows to read after skip

◆ set_row_groups()

void cudf::io::parquet_reader_options::set_row_groups ( std::vector< std::vector< size_type >>  row_groups)

Sets vector of individual row groups to read.

Parameters
row_groups  Vector of row groups to read
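
The nested vector type suggests one inner list of row-group indices per source. This page does not spell that out, so treat it as an assumption in the sketch below. `flatten_row_groups` is a hypothetical helper, not a cudf function, and `std::int32_t` stands in for `size_type`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not a cudf function). Assumption: row_groups[i] lists
// the row-group indices wanted from the i-th source, so the outer vector
// must have one entry per source.
// Returns (source index, row-group index) pairs in read order.
std::vector<std::pair<std::size_t, std::int32_t>> flatten_row_groups(
  std::vector<std::vector<std::int32_t>> const& row_groups, std::size_t num_sources)
{
  assert(row_groups.size() == num_sources && "expected one row-group list per source");
  std::vector<std::pair<std::size_t, std::int32_t>> selected;
  for (std::size_t src = 0; src < row_groups.size(); ++src) {
    for (auto const rg : row_groups[src]) {
      selected.emplace_back(src, rg);
    }
  }
  return selected;
}
```

Under that assumption, `{{0, 2}, {1}}` with two sources selects row groups 0 and 2 from the first source and row group 1 from the second.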

◆ set_skip_rows()

void cudf::io::parquet_reader_options::set_skip_rows ( int64_t  val)

Sets number of rows to skip.

Parameters
val  Number of rows to skip from start
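
How skip_rows and num_rows combine can be sketched as a row-window computation. `selected_row_range` is a hypothetical helper, not part of cudf, and `std::int32_t` stands in for `size_type`. Clamping a window that runs past the end of the source is an assumption made here for illustration. The page itself states only that num_rows counts rows after the skip and that an unset num_rows reads to the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical helper (not a cudf function). Returns the half-open row range
// [first, last) selected by skip_rows and num_rows from a source holding
// total_rows rows. Assumption: a window that runs past the end is clamped.
std::pair<std::int64_t, std::int64_t> selected_row_range(
  std::int64_t total_rows, std::int64_t skip_rows, std::optional<std::int32_t> num_rows)
{
  assert(total_rows >= 0 && "total_rows must be non-negative");
  // Rows before `first` are skipped.
  std::int64_t const first     = std::clamp<std::int64_t>(skip_rows, 0, total_rows);
  std::int64_t const remaining = total_rows - first;
  // An unset num_rows means "read until the end".
  std::int64_t const count =
    num_rows.has_value()
      ? std::clamp<std::int64_t>(static_cast<std::int64_t>(*num_rows), 0, remaining)
      : remaining;
  return {first, first + count};
}
```

For a source of 100 rows, skipping 10 and reading 20 gives the range [10, 30). Leaving num_rows unset gives [10, 100).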

◆ set_timestamp_type()

void cudf::io::parquet_reader_options::set_timestamp_type ( data_type  type)
inline

Sets timestamp_type used to cast timestamp columns.

Parameters
type  The timestamp data_type to which all timestamp columns need to be cast

Definition at line 313 of file parquet.hpp.


The documentation for this class was generated from the following file: