cudf::io::parquet_reader_options Class Reference

Settings for read_parquet().

#include <parquet.hpp>

Public Member Functions

 parquet_reader_options ()=default
 Default constructor.

source_info const & get_source () const
 Returns source info.

bool is_enabled_convert_strings_to_categories () const
 Returns true/false depending on whether strings should be converted to categories or not.

bool is_enabled_use_pandas_metadata () const
 Returns true/false depending on whether to use pandas metadata or not while reading.

bool is_enabled_use_arrow_schema () const
 Returns true/false depending on whether to use arrow schema while reading.

std::optional< std::vector< reader_column_schema > > get_column_schema () const
 Returns optional tree of metadata.

int64_t get_skip_rows () const
 Returns number of rows to skip from the start.

std::optional< size_type > const & get_num_rows () const
 Returns number of rows to read.

auto const & get_columns () const
 Returns names of the columns to be read, if set.

auto const & get_row_groups () const
 Returns list of individual row groups to be read.

auto const & get_filter () const
 Returns AST based filter for predicate pushdown.

data_type get_timestamp_type () const
 Returns timestamp type used to cast timestamp columns.

void set_columns (std::vector< std::string > col_names)
 Sets names of the columns to be read.

void set_row_groups (std::vector< std::vector< size_type >> row_groups)
 Sets vector of individual row groups to read.

void set_filter (ast::expression const &filter)
 Sets AST based filter for predicate pushdown.

void enable_convert_strings_to_categories (bool val)
 Enables or disables conversion of strings to categories.

void enable_use_pandas_metadata (bool val)
 Enables or disables use of pandas metadata while reading.

void enable_use_arrow_schema (bool val)
 Enables or disables use of arrow schema while reading.

void set_column_schema (std::vector< reader_column_schema > val)
 Sets reader column schema.

void set_skip_rows (int64_t val)
 Sets number of rows to skip.

void set_num_rows (size_type val)
 Sets number of rows to read.

void set_timestamp_type (data_type type)
 Sets timestamp_type used to cast timestamp columns.


Static Public Member Functions

static parquet_reader_options_builder builder (source_info src)
 Creates a parquet_reader_options_builder which will build parquet_reader_options.
 

Detailed Description

Settings for read_parquet().

Definition at line 57 of file parquet.hpp.

Constructor & Destructor Documentation

◆ parquet_reader_options()

cudf::io::parquet_reader_options::parquet_reader_options ( )
explicit, default

Default constructor.

This constructor exists because Cython requires a default constructor to create objects on the stack.

Member Function Documentation

◆ builder()

static parquet_reader_options_builder cudf::io::parquet_reader_options::builder ( source_info  src)
static

Creates a parquet_reader_options_builder which will build parquet_reader_options.

Parameters
src: Source information to read parquet file
Returns
Builder to build reader options

◆ enable_convert_strings_to_categories()

void cudf::io::parquet_reader_options::enable_convert_strings_to_categories ( bool  val)
inline

Enables or disables conversion of strings to categories.

Parameters
val: Boolean value to enable/disable conversion of string columns to categories

Definition at line 245 of file parquet.hpp.

◆ enable_use_arrow_schema()

void cudf::io::parquet_reader_options::enable_use_arrow_schema ( bool  val)
inline

Enables or disables use of arrow schema while reading.

Parameters
val: Boolean value indicating whether to use arrow schema

Definition at line 259 of file parquet.hpp.

◆ enable_use_pandas_metadata()

void cudf::io::parquet_reader_options::enable_use_pandas_metadata ( bool  val)
inline

Enables or disables use of pandas metadata while reading.

Parameters
val: Boolean value indicating whether to use pandas metadata

Definition at line 252 of file parquet.hpp.

◆ get_column_schema()

std::optional<std::vector<reader_column_schema> > cudf::io::parquet_reader_options::get_column_schema ( ) const
inline

Returns optional tree of metadata.

Returns
vector of reader_column_schema objects.

Definition at line 146 of file parquet.hpp.

◆ get_columns()

auto const& cudf::io::parquet_reader_options::get_columns ( ) const
inline

Returns names of the columns to be read, if set.

Returns
Names of the columns to be read; nullopt if the option is not set

Definition at line 171 of file parquet.hpp.

◆ get_filter()

auto const& cudf::io::parquet_reader_options::get_filter ( ) const
inline

Returns AST based filter for predicate pushdown.

Returns
AST expression to use as filter

Definition at line 185 of file parquet.hpp.

◆ get_num_rows()

std::optional<size_type> const& cudf::io::parquet_reader_options::get_num_rows ( ) const
inline

Returns number of rows to read.

Returns
Number of rows to read; nullopt if the option hasn't been set (in which case the file is read until the end)

Definition at line 164 of file parquet.hpp.
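
The interplay between the skip count and the optional row count can be illustrated with a small standalone sketch. It does not use cudf: `selected_rows` is a hypothetical helper, it uses `std::int64_t` for both values for simplicity (the class itself uses `int64_t` for the skip count and `size_type` for the row count), and the clamping of out-of-range requests to the file length is an assumption that this page does not confirm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical helper, not part of cudf: computes the half-open row range
// [begin, end) selected by a skip count and an optional row count.
// Assumption: requests that run past the end of the file are clamped to
// total_rows; the reference page does not state what the reader does here.
std::pair<std::int64_t, std::int64_t> selected_rows(std::int64_t total_rows,
                                                    std::int64_t skip_rows,
                                                    std::optional<std::int64_t> num_rows)
{
  assert(total_rows >= 0 && skip_rows >= 0);  // the sketch only handles non-negative counts
  auto const begin = std::min(skip_rows, total_rows);
  // An unset row count (nullopt) means "read until the end of the file".
  auto const end =
    num_rows.has_value() ? std::min(begin + *num_rows, total_rows) : total_rows;
  return {begin, end};
}
```

Under these assumptions, `selected_rows(100, 10, std::nullopt)` covers rows 10 up to (but not including) 100, while `selected_rows(100, 10, 20)` covers rows 10 up to 30.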

◆ get_row_groups()

auto const& cudf::io::parquet_reader_options::get_row_groups ( ) const
inline

Returns list of individual row groups to be read.

Returns
List of individual row groups to be read

Definition at line 178 of file parquet.hpp.

◆ get_skip_rows()

int64_t cudf::io::parquet_reader_options::get_skip_rows ( ) const
inline

Returns number of rows to skip from the start.

Returns
Number of rows to skip from the start

Definition at line 156 of file parquet.hpp.

◆ get_source()

source_info const& cudf::io::parquet_reader_options::get_source ( ) const
inline

Returns source info.

Returns
Source info

Definition at line 114 of file parquet.hpp.

◆ get_timestamp_type()

data_type cudf::io::parquet_reader_options::get_timestamp_type ( ) const
inline

Returns timestamp type used to cast timestamp columns.

Returns
Timestamp type used to cast timestamp columns

Definition at line 192 of file parquet.hpp.

◆ is_enabled_convert_strings_to_categories()

bool cudf::io::parquet_reader_options::is_enabled_convert_strings_to_categories ( ) const
inline

Returns true/false depending on whether strings should be converted to categories or not.

Returns
true if strings should be converted to categories

Definition at line 122 of file parquet.hpp.

◆ is_enabled_use_arrow_schema()

bool cudf::io::parquet_reader_options::is_enabled_use_arrow_schema ( ) const
inline

Returns true/false depending on whether to use arrow schema while reading.

Returns
true if arrow schema is used while reading

Definition at line 139 of file parquet.hpp.

◆ is_enabled_use_pandas_metadata()

bool cudf::io::parquet_reader_options::is_enabled_use_pandas_metadata ( ) const
inline

Returns true/false depending on whether to use pandas metadata or not while reading.

Returns
true if pandas metadata is used while reading

Definition at line 132 of file parquet.hpp.

◆ set_column_schema()

void cudf::io::parquet_reader_options::set_column_schema ( std::vector< reader_column_schema >  val)
inline

Sets reader column schema.

Parameters
val: Tree of schema nodes to enable/disable conversion of binary to string columns. Note that the default is to convert to string columns.

Definition at line 267 of file parquet.hpp.

◆ set_columns()

void cudf::io::parquet_reader_options::set_columns ( std::vector< std::string >  col_names)
inline

Sets names of the columns to be read.

Parameters
col_names: Vector of column names

Definition at line 199 of file parquet.hpp.

◆ set_filter()

void cudf::io::parquet_reader_options::set_filter ( ast::expression const &  filter)
inline

Sets AST based filter for predicate pushdown.

The filter can utilize cudf::ast::column_name_reference to reference a column by its name, even if it's not necessarily present in the requested projected columns. To refer to output column indices, you can use cudf::ast::column_reference.

For a parquet file with columns ["A", "B", "C", ... "X", "Y", "Z"]:

Example 1: with/without column projection

use_columns({"A", "X", "Z"})
.filter(operation(ast_operator::LESS, column_name_reference{"C"}, literal{100}));

Column "C" need not be present in the output table.

Example 2: without column projection

filter(operation(ast_operator::LESS, column_reference{1}, literal{100}));

Here, 1 will refer to column "B" because the output will contain all columns in order ["A", ..., "Z"].

Example 3: with column projection

use_columns({"A", "Z", "X"})
.filter(operation(ast_operator::LESS, column_reference{1}, literal{100}));

Here, 1 will refer to column "Z" because the output will contain 3 columns in order ["A", "Z", "X"].
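
The index rule in Examples 2 and 3 can be restated as a small standalone sketch. It does not use cudf: `resolve_output_column` is a hypothetical helper that only models which column a positional reference denotes, assuming (as the examples describe) that indices count positions in the output table.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper, not part of cudf: returns the name of the column that
// a positional reference (as in column_reference{index}) would denote.
// Assumption taken from the examples: the output table follows the projection
// order when a projection is given, and the file order otherwise.
std::string resolve_output_column(std::vector<std::string> const& file_columns,
                                  std::optional<std::vector<std::string>> const& projection,
                                  std::size_t index)
{
  auto const& output_columns = projection.has_value() ? *projection : file_columns;
  assert(index < output_columns.size());  // the index must name an existing output column
  return output_columns[index];
}
```

With file columns `{"A", "B", "C"}` and no projection, index 1 resolves to "B"; with the projection `{"A", "Z", "X"}`, index 1 resolves to "Z". A name-based reference has no such dependence on the projection, which is why Example 1 can filter on "C" without selecting it.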

Parameters
filter: AST expression to use as filter

Definition at line 238 of file parquet.hpp.

◆ set_num_rows()

void cudf::io::parquet_reader_options::set_num_rows ( size_type  val)

Sets number of rows to read.

Parameters
val: Number of rows to read after skip

◆ set_row_groups()

void cudf::io::parquet_reader_options::set_row_groups ( std::vector< std::vector< size_type >>  row_groups)

Sets vector of individual row groups to read.

Parameters
row_groups: Vector of row groups to read

◆ set_skip_rows()

void cudf::io::parquet_reader_options::set_skip_rows ( int64_t  val)

Sets number of rows to skip.

Parameters
val: Number of rows to skip from start

◆ set_timestamp_type()

void cudf::io::parquet_reader_options::set_timestamp_type ( data_type  type)
inline

Sets timestamp_type used to cast timestamp columns.

Parameters
type: The timestamp data_type to which all timestamp columns need to be cast

Definition at line 291 of file parquet.hpp.

