cudf::io::parquet_reader_options_builder Class Reference

Builds parquet_reader_options to use for read_parquet().

#include <parquet.hpp>

Public Member Functions

 parquet_reader_options_builder ()=default
 Default constructor.
 
 parquet_reader_options_builder (source_info src)
 Constructor from source info.
 
parquet_reader_options_builder & columns (std::vector< std::string > col_names)
 Sets names of the columns to be read.
 
parquet_reader_options_builder & row_groups (std::vector< std::vector< size_type >> row_groups)
 Sets vector of individual row groups to read.
 
parquet_reader_options_builder & filter (ast::expression const &filter)
 Sets AST based filter for predicate pushdown.
 
parquet_reader_options_builder & convert_strings_to_categories (bool val)
 Enables or disables conversion of strings to categories.
 
parquet_reader_options_builder & use_pandas_metadata (bool val)
 Enables or disables use of pandas metadata when reading.
 
parquet_reader_options_builder & use_arrow_schema (bool val)
 Enables or disables use of the arrow schema when reading.
 
parquet_reader_options_builder & allow_mismatched_pq_schemas (bool val)
 Enables or disables reading of matching projected and filter columns from mismatched Parquet sources.
 
parquet_reader_options_builder & set_column_schema (std::vector< reader_column_schema > val)
 Sets reader metadata.
 
parquet_reader_options_builder & skip_rows (int64_t val)
 Sets number of rows to skip.
 
parquet_reader_options_builder & num_rows (size_type val)
 Sets number of rows to read.
 
parquet_reader_options_builder & skip_bytes (size_t val)
 Sets the number of bytes to skip before starting to read row groups.
 
parquet_reader_options_builder & num_bytes (size_t val)
 Sets the number of bytes to read after skipping, which determines where reading of row groups ends.
 
parquet_reader_options_builder & timestamp_type (data_type type)
 Sets the timestamp type to which timestamp columns are cast.
 
parquet_reader_options_builder & use_jit_filter (bool use_jit_filter)
 Enables or disables use of JIT for the filter step.
 
 operator parquet_reader_options && ()
 Moves the parquet_reader_options member once it's built.
 
parquet_reader_options && build ()
 Moves the parquet_reader_options member once it's built.
 

Detailed Description

Builds parquet_reader_options to use for read_parquet().

Definition at line 412 of file parquet.hpp.

Constructor & Destructor Documentation

◆ parquet_reader_options_builder() [1/2]

cudf::io::parquet_reader_options_builder::parquet_reader_options_builder ( )
default

Default constructor.

This has been added since Cython requires a default constructor to create objects on the stack. The hybrid_scan_reader also uses this to construct parquet_reader_options without a source.

◆ parquet_reader_options_builder() [2/2]

cudf::io::parquet_reader_options_builder::parquet_reader_options_builder ( source_info  src)
inline explicit

Constructor from source info.

Parameters
    src    The source information used to read the Parquet file

Definition at line 429 of file parquet.hpp.

Member Function Documentation

◆ allow_mismatched_pq_schemas()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::allow_mismatched_pq_schemas ( bool  val)
inline

Enables or disables reading of matching projected and filter columns from mismatched Parquet sources.

Parameters
    val    Boolean value indicating whether to read matching projected and filter columns from mismatched Parquet sources
Returns
this for chaining

Definition at line 510 of file parquet.hpp.

◆ build()

parquet_reader_options&& cudf::io::parquet_reader_options_builder::build ( )
inline

Moves the parquet_reader_options member once it's built.

This has been added since Cython does not support overloading of conversion operators.

Returns
R-value reference to the built parquet_reader_options object

Definition at line 612 of file parquet.hpp.
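
The relationship between the chaining setters, the conversion operator, and build() can be illustrated with a minimal sketch. The types below (toy_options, toy_options_builder) are hypothetical stand-ins written for this example and are not the libcudf classes; only the shape of the pattern (setters returning a reference to the builder, build() and the conversion operator both returning an r-value reference to the stored options) is taken from the signatures documented on this page.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the libcudf types.
struct toy_options {
  std::vector<std::string> columns;
  long skip_rows = 0;
};

class toy_options_builder {
  toy_options options_;

 public:
  toy_options_builder() = default;

  // Each setter stores its argument and returns *this so calls can be chained.
  toy_options_builder& columns(std::vector<std::string> col_names)
  {
    options_.columns = std::move(col_names);
    return *this;
  }

  toy_options_builder& skip_rows(long val)
  {
    assert(val >= 0);  // assumption of this sketch: negative skips are rejected
    options_.skip_rows = val;
    return *this;
  }

  // Conversion operator: lets a finished chain initialize a toy_options directly.
  operator toy_options&&() { return std::move(options_); }

  // Named equivalent of the conversion operator.
  toy_options&& build() { return std::move(options_); }
};

// Ends the chain with an explicit build() call.
inline toy_options make_with_build()
{
  return toy_options_builder{}.columns({"A", "X"}).skip_rows(10).build();
}

// Ends the chain by relying on the conversion operator instead.
inline toy_options make_with_conversion()
{
  toy_options opts = toy_options_builder{}.columns({"A", "X"}).skip_rows(10);
  return opts;
}
```

Both helper functions produce the same options object. In either form the builder's stored options are moved from, so the builder should not be reused afterwards.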

◆ columns()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::columns ( std::vector< std::string >  col_names)
inline

Sets names of the columns to be read.

Parameters
    col_names    Vector of column names
Returns
this for chaining

Definition at line 437 of file parquet.hpp.

◆ convert_strings_to_categories()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::convert_strings_to_categories ( bool  val)
inline

Enables or disables conversion of strings to categories.

Parameters
    val    Boolean value to enable/disable conversion of string columns to categories
Returns
this for chaining

Definition at line 471 of file parquet.hpp.

◆ filter()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::filter ( ast::expression const &  filter)
inline

Sets AST based filter for predicate pushdown.

The filter can utilize cudf::ast::column_name_reference to reference a column by its name, even if it's not necessarily present in the requested projected columns. To refer to output column indices, you can use cudf::ast::column_reference.

For a Parquet file with columns ["A", "B", "C", ... "X", "Y", "Z"]:

Example 1: with/without column projection

    use_columns({"A", "X", "Z"})
    .filter(operation(ast_operator::LESS, column_name_reference{"C"}, literal{100}));

Column "C" need not be present in the output table.

Example 2: without column projection

    filter(operation(ast_operator::LESS, column_reference{1}, literal{100}));

Here, 1 will refer to column "B" because the output will contain all columns in order ["A", ..., "Z"].

Example 3: with column projection

    use_columns({"A", "Z", "X"})
    .filter(operation(ast_operator::LESS, column_reference{1}, literal{100}));

Here, 1 will refer to column "Z" because the output will contain 3 columns in order ["A", "Z", "X"].

Parameters
    filter    AST expression to use as filter
Returns
this for chaining

Definition at line 459 of file parquet.hpp.
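
The index rule described in Examples 2 and 3 can be sketched as a small standalone function. The helper resolve_column_reference below is hypothetical and is not part of libcudf; it only models the stated rule that a positional column_reference index is interpreted against the output table, which is the projection when one is given and the full file column order otherwise. Treating an empty projection as "no projection" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not a libcudf function: returns the column name that a
// positional column_reference{index} would denote.
// An empty projection is treated as "read all columns" (assumption of this sketch).
inline std::string resolve_column_reference(std::vector<std::string> const& file_columns,
                                            std::vector<std::string> const& projection,
                                            std::size_t index)
{
  // Indices refer to the output table: the projection if present, else file order.
  auto const& output_columns = projection.empty() ? file_columns : projection;
  assert(index < output_columns.size());
  return output_columns[index];
}
```

With file columns ["A", "B", "C", "X", "Y", "Z"], index 1 resolves to "B" without a projection and to "Z" with the projection ["A", "Z", "X"], matching the examples above. A column_name_reference avoids this dependence on output order because it names the column directly.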

◆ num_bytes()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::num_bytes ( size_t  val)
inline

Sets the number of bytes to read after skipping, which determines where reading of row groups ends.

Parameters
    val    Number of bytes, counted after the skipped bytes, at which reading of row groups ends
Returns
this for chaining

Definition at line 570 of file parquet.hpp.

◆ num_rows()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::num_rows ( size_type  val)
inline

Sets number of rows to read.

Parameters
    val    Number of rows to read after skip
Returns
this for chaining

Definition at line 546 of file parquet.hpp.

◆ row_groups()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::row_groups ( std::vector< std::vector< size_type >>  row_groups)
inline

Sets vector of individual row groups to read.

Parameters
    row_groups    Vector of row groups to read
Returns
this for chaining

Definition at line 449 of file parquet.hpp.
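
The nested std::vector< std::vector< size_type >> argument can be read as one list of row group indices per input source. That per-source interpretation is an assumption of this sketch and is not stated in the text above. The helper flatten_row_groups is hypothetical, not a libcudf function, and simply expands such a nested selection into (source index, row group index) pairs to show what the selection denotes under that assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using size_type = std::int32_t;  // stand-in for cudf::size_type in this sketch

// Hypothetical helper: expands a per-source row group selection into
// (source index, row group index) pairs, in source order.
inline std::vector<std::pair<std::size_t, size_type>> flatten_row_groups(
  std::vector<std::vector<size_type>> const& row_groups)
{
  std::vector<std::pair<std::size_t, size_type>> selected;
  for (std::size_t src = 0; src < row_groups.size(); ++src) {
    for (size_type rg : row_groups[src]) {
      assert(rg >= 0);  // row group indices are non-negative
      selected.emplace_back(src, rg);
    }
  }
  return selected;
}
```

For example, the selection {{0, 2}, {1}} would denote row groups 0 and 2 of the first source and row group 1 of the second source.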

◆ set_column_schema()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::set_column_schema ( std::vector< reader_column_schema >  val)
inline

Sets reader metadata.

Parameters
    val    Tree of metadata information
Returns
this for chaining

Definition at line 522 of file parquet.hpp.

◆ skip_bytes()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::skip_bytes ( size_t  val)
inline

Sets the number of bytes to skip before starting to read row groups.

Parameters
    val    Number of bytes to skip before starting to read row groups
Returns
this for chaining

Definition at line 558 of file parquet.hpp.

◆ skip_rows()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::skip_rows ( int64_t  val)
inline

Sets number of rows to skip.

Parameters
    val    Number of rows to skip from start
Returns
this for chaining

Definition at line 534 of file parquet.hpp.
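
How skip_rows() and num_rows() combine can be sketched as simple range arithmetic: rows are skipped from the start, then up to num_rows rows are read. The helper selected_row_range below is hypothetical and is not a libcudf function. Clamping to the total row count and using a negative num_rows to mean "no limit" are assumptions of this sketch; the actual reader may treat out-of-range values differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper: returns the half-open range [first, last) of rows that
// would be read. A negative num_rows means "no limit" (assumption of this sketch).
inline std::pair<std::int64_t, std::int64_t> selected_row_range(std::int64_t total_rows,
                                                                std::int64_t skip_rows,
                                                                std::int64_t num_rows)
{
  assert(total_rows >= 0 && skip_rows >= 0);
  // Skipping past the end leaves an empty range at the end of the data.
  std::int64_t const first     = std::min(skip_rows, total_rows);
  std::int64_t const remaining = total_rows - first;
  // Read num_rows rows after the skip, but never more than what remains.
  std::int64_t const count = (num_rows < 0) ? remaining : std::min(num_rows, remaining);
  return {first, first + count};
}
```

Under these assumptions, a source with 100 rows, skip_rows(10), and num_rows(20) selects rows [10, 30).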

◆ timestamp_type()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::timestamp_type ( data_type  type)
inline

Sets the timestamp type to which timestamp columns are cast.

Parameters
    type    The timestamp data_type to which all timestamp columns need to be cast
Returns
this for chaining

Definition at line 582 of file parquet.hpp.

◆ use_arrow_schema()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::use_arrow_schema ( bool  val)
inline

Enables or disables use of the arrow schema when reading.

Parameters
    val    Boolean value indicating whether to use the arrow schema
Returns
this for chaining

Definition at line 495 of file parquet.hpp.

◆ use_jit_filter()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::use_jit_filter ( bool  use_jit_filter)
inline

Enables or disables use of JIT for the filter step.

Parameters
    use_jit_filter    Boolean value indicating whether to use the JIT filter
Returns
this for chaining

Definition at line 594 of file parquet.hpp.

◆ use_pandas_metadata()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::use_pandas_metadata ( bool  val)
inline

Enables or disables use of pandas metadata when reading.

Parameters
    val    Boolean value indicating whether to use pandas metadata
Returns
this for chaining

Definition at line 483 of file parquet.hpp.


The documentation for this class was generated from the following file:

parquet.hpp