cudf::io::parquet_reader_options_builder Class Reference

Builds parquet_reader_options to use for read_parquet().

#include <parquet.hpp>

Public Member Functions

 parquet_reader_options_builder ()=default
 Default constructor.
 
 parquet_reader_options_builder (source_info src)
 Constructor from source info.
 
parquet_reader_options_builder & columns (std::vector< std::string > col_names)
 Sets names of the columns to be read.
 
parquet_reader_options_builder & row_groups (std::vector< std::vector< size_type >> row_groups)
 Sets vector of individual row groups to read.
 
parquet_reader_options_builder & filter (ast::expression const &filter)
 Sets AST based filter for predicate pushdown.
 
parquet_reader_options_builder & convert_strings_to_categories (bool val)
 Enables or disables conversion of strings to categories.
 
parquet_reader_options_builder & use_pandas_metadata (bool val)
 Enables or disables use of pandas metadata when reading.
 
parquet_reader_options_builder & use_arrow_schema (bool val)
 Enables or disables use of the Arrow schema when reading.
 
parquet_reader_options_builder & allow_mismatched_pq_schemas (bool val)
 Enables or disables reading of matching projected and filter columns from mismatched Parquet sources.
 
parquet_reader_options_builder & set_column_schema (std::vector< reader_column_schema > val)
 Sets reader metadata.
 
parquet_reader_options_builder & skip_rows (int64_t val)
 Sets number of rows to skip.
 
parquet_reader_options_builder & num_rows (size_type val)
 Sets number of rows to read.
 
parquet_reader_options_builder & timestamp_type (data_type type)
 Sets the timestamp type to which timestamp columns are cast.
 
 operator parquet_reader_options && ()
 Moves the parquet_reader_options member once it's built.
 
parquet_reader_options && build ()
 Moves the parquet_reader_options member once it's built.
 

Detailed Description

Builds parquet_reader_options to use for read_parquet().

Definition at line 319 of file parquet.hpp.

Constructor & Destructor Documentation

◆ parquet_reader_options_builder() [1/2]

cudf::io::parquet_reader_options_builder::parquet_reader_options_builder ( )
default

Default constructor.

This constructor exists because Cython requires a default constructor to create objects on the stack.

◆ parquet_reader_options_builder() [2/2]

cudf::io::parquet_reader_options_builder::parquet_reader_options_builder ( source_info  src)
inline explicit

Constructor from source info.

Parameters
src: The source information used to read the Parquet file

Definition at line 335 of file parquet.hpp.

Member Function Documentation

◆ allow_mismatched_pq_schemas()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::allow_mismatched_pq_schemas ( bool  val)
inline

Enables or disables reading of matching projected and filter columns from mismatched Parquet sources.

Parameters
val: Boolean value indicating whether to read matching projected and filter columns from mismatched Parquet sources
Returns
this for chaining.

Definition at line 416 of file parquet.hpp.

◆ build()

parquet_reader_options&& cudf::io::parquet_reader_options_builder::build ( )
inline

Moves the parquet_reader_options member once it's built.

This function exists because Cython does not support overloading of conversion operators.

Returns
An rvalue reference to the built parquet_reader_options object

Definition at line 482 of file parquet.hpp.
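The chaining setters, the conversion operator, and build() together form a conventional builder pattern. The following is a minimal, self-contained sketch of that shape only. The names demo_options, demo_options_builder, and make_demo_options are hypothetical stand-ins rather than libcudf types, and the sketch makes no claim about how libcudf implements the class internally.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an options object (not a libcudf type).
struct demo_options {
  std::vector<std::string> columns;
  std::int64_t skip_rows = 0;
};

// Hypothetical builder that mirrors the chaining/build() shape described above.
class demo_options_builder {
  demo_options options_;

 public:
  // Each setter stores its argument and returns *this so calls can be chained.
  demo_options_builder& columns(std::vector<std::string> col_names)
  {
    options_.columns = std::move(col_names);
    return *this;
  }

  demo_options_builder& skip_rows(std::int64_t val)
  {
    options_.skip_rows = val;
    return *this;
  }

  // Conversion operator: hands the stored options out as an rvalue.
  operator demo_options&&() { return std::move(options_); }

  // Named equivalent of the conversion operator.
  demo_options&& build() { return std::move(options_); }
};

// Builds options in a single chained expression; the result is moved out of
// the temporary builder before that temporary is destroyed.
inline demo_options make_demo_options()
{
  return demo_options_builder{}.columns({"A", "X"}).skip_rows(10).build();
}
```

In this sketch, both build() and the conversion operator leave the builder's stored options in a moved-from state, so a builder is best treated as single-use once either has been called.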

◆ columns()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::columns ( std::vector< std::string >  col_names)
inline

Sets names of the columns to be read.

Parameters
col_names: Vector of column names
Returns
this for chaining

Definition at line 343 of file parquet.hpp.

◆ convert_strings_to_categories()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::convert_strings_to_categories ( bool  val)
inline

Enables or disables conversion of strings to categories.

Parameters
val: Boolean value to enable/disable conversion of string columns to categories
Returns
this for chaining

Definition at line 377 of file parquet.hpp.

◆ filter()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::filter ( ast::expression const &  filter)
inline

Sets AST based filter for predicate pushdown.

The filter can use cudf::ast::column_name_reference to reference a column by name, even if that column is not among the requested projected columns. To refer to a column by its index in the output table, use cudf::ast::column_reference.

For a Parquet file with columns ["A", "B", "C", ... "X", "Y", "Z"]:

Example 1: with/without column projection

    columns({"A", "X", "Z"})
    .filter(operation(ast_operator::LESS, column_name_reference{"C"}, literal{100}));

Column "C" need not be present in the output table.

Example 2: without column projection

    filter(operation(ast_operator::LESS, column_reference{1}, literal{100}));

Here, 1 refers to column "B" because the output contains all columns in order ["A", ..., "Z"].

Example 3: with column projection

    columns({"A", "Z", "X"})
    .filter(operation(ast_operator::LESS, column_reference{1}, literal{100}));

Here, 1 refers to column "Z" because the output contains 3 columns in order ["A", "Z", "X"].

Parameters
filter: AST expression to use as filter
Returns
this for chaining

Definition at line 365 of file parquet.hpp.
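The index rule for cudf::ast::column_reference described above (indices count positions in the output table, which follows the projection order when one is given) can be sketched in plain C++. The helper resolve_output_column below is hypothetical and not part of libcudf; it only models the documented rule, treating an empty projection as "no projection".

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of libcudf): returns the name of the output
// column that a positional reference `index` designates.
// An empty `projection` models "no column projection requested".
inline std::string resolve_output_column(std::vector<std::string> const& file_columns,
                                         std::vector<std::string> const& projection,
                                         std::size_t index)
{
  // With a projection, the output follows the requested order; otherwise it
  // follows the file's own column order.
  auto const& output = projection.empty() ? file_columns : projection;
  if (index >= output.size()) { throw std::out_of_range("column index out of range"); }
  return output[index];
}
```

With file columns {"A", "B", "C", "X", "Y", "Z"}, index 1 resolves to "B" when no projection is given and to "Z" under the projection {"A", "Z", "X"}, matching Examples 2 and 3.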

◆ num_rows()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::num_rows ( size_type  val)
inline

Sets number of rows to read.

Parameters
val: Number of rows to read after skip
Returns
this for chaining

Definition at line 452 of file parquet.hpp.

◆ row_groups()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::row_groups ( std::vector< std::vector< size_type >>  row_groups)
inline

Sets vector of individual row groups to read.

Parameters
row_groups: Vector of row groups to read
Returns
this for chaining

Definition at line 355 of file parquet.hpp.
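The nested-vector parameter can be read as one inner list of row-group indices per input source. That interpretation is an assumption here, since this page does not spell it out. The sketch below uses only standard C++; size_type is a local stand-in for cudf::size_type, and both helper functions are hypothetical illustrations rather than libcudf functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using size_type = std::int32_t;  // local stand-in for cudf::size_type

// Hypothetical check (not a libcudf function): under the assumption that the
// outer vector holds one entry per source, its size should equal the number
// of sources being read.
inline bool has_one_list_per_source(std::vector<std::vector<size_type>> const& row_groups,
                                    std::size_t num_sources)
{
  return row_groups.size() == num_sources;
}

// Counts how many row groups are selected across all sources.
inline std::size_t count_selected_row_groups(
  std::vector<std::vector<size_type>> const& row_groups)
{
  std::size_t total = 0;
  for (auto const& per_source : row_groups) { total += per_source.size(); }
  return total;
}
```

For example, {{0, 2}, {1}} would select row groups 0 and 2 from the first source and row group 1 from the second, three row groups in total.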

◆ set_column_schema()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::set_column_schema ( std::vector< reader_column_schema >  val)
inline

Sets reader metadata.

Parameters
val: Tree of metadata information
Returns
this for chaining

Definition at line 428 of file parquet.hpp.

◆ skip_rows()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::skip_rows ( int64_t  val)
inline

Sets number of rows to skip.

Parameters
val: Number of rows to skip from start
Returns
this for chaining

Definition at line 440 of file parquet.hpp.
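Taken together, skip_rows() and num_rows() describe a window of rows: skip some rows from the start, then read a bounded number after that. The sketch below computes that window as a half-open range. The helper selected_row_range is hypothetical and not part of libcudf. It assumes non-negative inputs, and clamping the window to the total row count is an assumption of this sketch, not documented behavior.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of libcudf): computes the half-open row range
// [first, last) selected by skipping `skip_rows` rows from the start and then
// reading at most `num_rows` rows. Inputs are assumed to be non-negative, and
// the range is clamped to `total_rows` (an assumption of this sketch).
inline std::pair<std::int64_t, std::int64_t> selected_row_range(std::int64_t total_rows,
                                                                std::int64_t skip_rows,
                                                                std::int64_t num_rows)
{
  std::int64_t const first = std::min(skip_rows, total_rows);
  std::int64_t const last  = std::min(first + num_rows, total_rows);
  return {first, last};
}
```

Under these assumptions, a 100-row source with skip_rows = 10 and num_rows = 20 yields the range [10, 30).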

◆ timestamp_type()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::timestamp_type ( data_type  type)
inline

Sets the timestamp type to which timestamp columns are cast.

Parameters
type: The timestamp data_type to which all timestamp columns need to be cast
Returns
this for chaining

Definition at line 464 of file parquet.hpp.

◆ use_arrow_schema()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::use_arrow_schema ( bool  val)
inline

Enables or disables use of the Arrow schema when reading.

Parameters
val: Boolean value indicating whether to use the Arrow schema
Returns
this for chaining

Definition at line 401 of file parquet.hpp.

◆ use_pandas_metadata()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::use_pandas_metadata ( bool  val)
inline

Enables or disables use of pandas metadata when reading.

Parameters
val: Boolean value indicating whether to use pandas metadata
Returns
this for chaining

Definition at line 389 of file parquet.hpp.

