cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT > Class Template Reference

Base class for Parquet options builders.

#include <parquet.hpp>

Public Member Functions

 parquet_writer_options_builder_base ()=default
 Default constructor.
 
BuilderT & metadata (table_input_metadata metadata)
 Sets metadata.
 
BuilderT & key_value_metadata (std::vector< std::map< std::string, std::string >> metadata)
 Sets Key-Value footer metadata.
 
BuilderT & stats_level (statistics_freq sf)
 Sets the level of statistics.
 
BuilderT & compression (compression_type compression)
 Sets compression type.
 
BuilderT & row_group_size_bytes (size_t val)
 Sets the maximum row group size, in bytes.
 
BuilderT & row_group_size_rows (size_type val)
 Sets the maximum number of rows in output row groups.
 
BuilderT & max_page_size_bytes (size_t val)
 Sets the maximum uncompressed page size, in bytes.
 
BuilderT & max_page_size_rows (size_type val)
 Sets the maximum page size, in rows. Counts only top-level rows, ignoring any nesting. Cannot be larger than the row group size in rows, and will be adjusted to match if it is.
 
BuilderT & column_index_truncate_length (int32_t val)
 Sets the desired maximum size in bytes for min and max values in the column index.
 
BuilderT & dictionary_policy (enum dictionary_policy val)
 Sets the policy for dictionary use.
 
BuilderT & max_dictionary_size (size_t val)
 Sets the maximum dictionary size, in bytes.
 
BuilderT & max_page_fragment_size (size_type val)
 Sets the maximum page fragment size, in rows.
 
BuilderT & compression_statistics (std::shared_ptr< writer_compression_statistics > const &comp_stats)
 Sets the pointer to the output compression statistics.
 
BuilderT & int96_timestamps (bool enabled)
 Sets whether int96 timestamps are written or not.
 
BuilderT & utc_timestamps (bool enabled)
 Set to true if timestamps are to be written as UTC.
 
BuilderT & write_arrow_schema (bool enabled)
 Set to true if arrow schema is to be written.
 
BuilderT & write_v2_headers (bool enabled)
 Set to true if V2 page headers are to be written.
 
BuilderT & sorting_columns (std::vector< sorting_column > sorting_columns)
 Sets column sorting metadata.
 
 operator OptionsT && ()
 Moves the options member out once it is built.
 
OptionsT && build ()
 Moves the options member out once it is built.
 

Protected Member Functions

OptionsT & get_options ()
 Return reference to the options object being built.
 
 parquet_writer_options_builder_base (OptionsT options)
 Constructor from options.
 

Detailed Description

template<class BuilderT, class OptionsT>
class cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >

Base class for Parquet options builders.

Definition at line 965 of file parquet.hpp.
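The two template parameters suggest the curiously recurring template pattern (CRTP): BuilderT is the derived builder, so each setter can return BuilderT& and chained calls keep the derived type. The following is a minimal, self-contained sketch of that pattern under this assumption; demo_options, demo_builder_base and demo_builder are hypothetical names, and the code is not cudf's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for OptionsT; not a cudf type.
struct demo_options {
  std::size_t row_group_size_rows = 1000000;
  std::string compression         = "NONE";
  std::string sink_path;  // set only through the derived builder below
};

// Sketch of a CRTP builder base: every setter returns BuilderT&, the derived
// builder type, so chained calls can mix base and derived setters.
template <class BuilderT, class OptionsT>
class demo_builder_base {
 public:
  demo_builder_base() = default;

  BuilderT& row_group_size_rows(std::size_t val)
  {
    options_.row_group_size_rows = val;
    return static_cast<BuilderT&>(*this);
  }

  BuilderT& compression(std::string codec)
  {
    options_.compression = std::move(codec);
    return static_cast<BuilderT&>(*this);
  }

  // Moves the accumulated options out of the builder.
  OptionsT&& build() { return std::move(options_); }

 protected:
  explicit demo_builder_base(OptionsT options) : options_(std::move(options)) {}
  // Lets derived builders modify the options being built.
  OptionsT& get_options() { return options_; }

 private:
  OptionsT options_;
};

// Hypothetical derived builder that adds a setter of its own.
class demo_builder : public demo_builder_base<demo_builder, demo_options> {
 public:
  demo_builder& sink_path(std::string path)
  {
    get_options().sink_path = std::move(path);
    return *this;
  }
};
```

Because compression() returns demo_builder& rather than a reference to the base, the derived setter sink_path() can follow it in the same chain, e.g. b.compression("ZSTD").sink_path("out.parquet").build().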

Constructor & Destructor Documentation

◆ parquet_writer_options_builder_base() [1/2]

template<class BuilderT , class OptionsT >
cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::parquet_writer_options_builder_base ( OptionsT  options)
[explicit, protected]

Constructor from options.

Parameters
options: Options object to build

◆ parquet_writer_options_builder_base() [2/2]

template<class BuilderT , class OptionsT >
cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::parquet_writer_options_builder_base ( )
[explicit, default]

Default constructor.

This was added because Cython requires a default constructor to create objects on the stack.

Member Function Documentation

◆ build()

template<class BuilderT , class OptionsT >
OptionsT&& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::build ( )

Moves the options member out once it is built.

This was added because Cython does not support overloading of conversion operators.

Returns
Rvalue reference to the built options object
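Both the conversion operator and build() hand back an rvalue reference to the builder's options, so either can initialize an options object by move. The sketch below, using hypothetical types opts_t and opts_builder (not cudf types), shows the two paths side by side. After either one, the builder's options are in a moved-from (valid but unspecified) state, so the builder should not be reused.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical options type; not a cudf type.
struct opts_t {
  std::vector<std::string> sorting_cols;
};

// Hypothetical builder exposing both extraction paths described above.
class opts_builder {
 public:
  opts_builder& add_sorting_col(std::string c)
  {
    options_.sorting_cols.push_back(std::move(c));
    return *this;
  }

  // Implicit conversion: `opts_t o = builder;` moves the options out.
  operator opts_t&&() { return std::move(options_); }

  // Named equivalent for callers (e.g. Cython) that cannot use conversion operators.
  opts_t&& build() { return std::move(options_); }

 private:
  opts_t options_;
};
```

In C++ the implicit form reads naturally at the end of a chain (opts_t a = b.add_sorting_col("x");), while build() gives bindings such as Cython an ordinary method to call.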

◆ column_index_truncate_length()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::column_index_truncate_length ( int32_t  val)

Sets the desired maximum size in bytes for min and max values in the column index.

Values exceeding this limit will be truncated, but modified such that they remain valid lower and upper bounds. This applies only to variable-length types, such as strings. Maximum values will not be truncated if there is no suitable truncation that results in a valid upper bound.

Default value is 64.

Parameters
val: Length to which min/max values will be truncated, with 0 indicating no truncation
Returns
this for chaining
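One common way to keep truncated values usable as bounds, sketched here assuming bytewise unsigned comparison: a prefix of the minimum is never greater than the minimum, while the maximum's prefix must be bumped by incrementing its last byte that is not 0xFF. If every byte of the prefix is 0xFF, no shorter valid upper bound exists and the value is kept as is, matching the behavior described above. truncate_min and truncate_max are hypothetical helpers, not cudf functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Lower bound: any prefix compares <= the original, so a plain cut is valid.
std::string truncate_min(std::string const& v, std::size_t len)
{
  if (len == 0 || v.size() <= len) return v;  // 0 means "no truncation"
  return v.substr(0, len);
}

// Upper bound: cut to `len` bytes, then increment the last byte that is not 0xFF,
// dropping trailing 0xFF bytes. If every byte is 0xFF, no truncated value can be
// a valid upper bound, so the original value is returned untruncated.
std::string truncate_max(std::string const& v, std::size_t len)
{
  if (len == 0 || v.size() <= len) return v;
  std::string t = v.substr(0, len);
  while (!t.empty()) {
    auto const last = static_cast<unsigned char>(t.back());
    if (last != 0xFF) {
      t.back() = static_cast<char>(last + 1);
      return t;
    }
    t.pop_back();
  }
  return v;
}
```

With a limit of 3, "abcdef" yields a minimum of "abc" and a maximum of "abd"; both bracket the original value under std::string comparison, which compares characters as unsigned char.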

◆ compression()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::compression ( compression_type  compression)

Sets compression type.

Parameters
compression: The compression type to use
Returns
this for chaining

◆ compression_statistics()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::compression_statistics ( std::shared_ptr< writer_compression_statistics > const &  comp_stats)

Sets the pointer to the output compression statistics.

Parameters
comp_stats: Pointer to compression statistics to be filled once the writer is done
Returns
this for chaining

◆ dictionary_policy()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::dictionary_policy ( enum dictionary_policy  val)

Sets the policy for dictionary use.

Certain compression algorithms (e.g., Zstandard) limit how large a buffer can be compressed. In some circumstances, the dictionary can grow beyond this limit, which prevents the column from being compressed. This setting controls how the writer behaves in these circumstances. A setting of dictionary_policy::ADAPTIVE disables dictionary encoding for columns where the dictionary exceeds the limit. A setting of dictionary_policy::NEVER disables dictionary encoding globally. A setting of dictionary_policy::ALWAYS allows dictionary encoding even if it results in compression being disabled for columns that would otherwise be compressed.

The default value is dictionary_policy::ADAPTIVE.

Parameters
val: Policy for dictionary use
Returns
this for chaining
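The three policies can be modeled as a per-column-chunk decision. In this sketch, dict_policy, chunk_plan and plan_chunk are hypothetical names; the ADAPTIVE threshold is assumed to be the max_dictionary_size value, and the codec's buffer limit is treated as a known input. It illustrates the documented trade-off and is not cudf's internal logic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the three documented policies; not the cudf enum itself.
enum class dict_policy { ADAPTIVE, NEVER, ALWAYS };

struct chunk_plan {
  bool dictionary_encoded;  // chunk uses dictionary encoding
  bool compressible;        // false if its dictionary exceeds what the codec can compress
};

// Per-column-chunk decision under each policy.
//   dict_bytes        size the chunk's dictionary would have
//   max_dict_bytes    threshold applied by ADAPTIVE (assumed: max_dictionary_size)
//   codec_limit_bytes largest buffer the codec can compress (assumed known)
chunk_plan plan_chunk(dict_policy policy,
                      std::size_t dict_bytes,
                      std::size_t max_dict_bytes,
                      std::size_t codec_limit_bytes)
{
  bool use_dict = false;  // NEVER leaves this false
  if (policy == dict_policy::ALWAYS) {
    use_dict = true;
  } else if (policy == dict_policy::ADAPTIVE) {
    use_dict = dict_bytes <= max_dict_bytes;
  }
  // Without a dictionary there is no oversized dictionary to block compression.
  return {use_dict, !use_dict || dict_bytes <= codec_limit_bytes};
}
```

For a 2 MiB dictionary against 1 MiB limits, ADAPTIVE drops the dictionary and stays compressible, while ALWAYS keeps the dictionary and gives up compression for that chunk.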

◆ get_options()

template<class BuilderT , class OptionsT >
OptionsT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::get_options ( )
[inline, protected]

Return reference to the options object being built.

Returns
the options object

Definition at line 974 of file parquet.hpp.

◆ int96_timestamps()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::int96_timestamps ( bool  enabled)

Sets whether int96 timestamps are written or not.

Parameters
enabled: Boolean value to enable/disable int96 timestamps
Returns
this for chaining
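This page does not describe the INT96 encoding itself. For orientation, the sketch below shows the conventional legacy INT96 timestamp layout used by Impala, Hive and Spark: 8 little-endian bytes of nanoseconds since midnight followed by a 4-byte little-endian Julian day number, where 2440588 is the Julian day of 1970-01-01. It assumes cudf follows this layout; to_int96 is a hypothetical helper, not a cudf function.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::int64_t julian_day_of_unix_epoch = 2440588;  // 1970-01-01
constexpr std::int64_t nanos_per_day            = 86'400'000'000'000;

// Encodes nanoseconds since the Unix epoch into the 12-byte INT96 layout.
std::array<std::uint8_t, 12> to_int96(std::int64_t unix_nanos)
{
  // Floor division so that pre-1970 timestamps land on the previous day.
  std::int64_t days  = unix_nanos / nanos_per_day;
  std::int64_t nanos = unix_nanos % nanos_per_day;
  if (nanos < 0) {
    nanos += nanos_per_day;
    --days;
  }
  auto const nanos_u = static_cast<std::uint64_t>(nanos);
  auto const jd      = static_cast<std::uint32_t>(days + julian_day_of_unix_epoch);

  std::array<std::uint8_t, 12> out{};
  for (int i = 0; i < 8; ++i) out[i] = static_cast<std::uint8_t>(nanos_u >> (8 * i));  // time of day
  for (int i = 0; i < 4; ++i) out[8 + i] = static_cast<std::uint8_t>(jd >> (8 * i));   // Julian day
  return out;
}
```

The epoch itself encodes as eight zero bytes followed by 0x8C 0x3D 0x25 0x00, the little-endian form of 2440588.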

◆ key_value_metadata()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::key_value_metadata ( std::vector< std::map< std::string, std::string >>  metadata)

Sets Key-Value footer metadata.

Parameters
metadata: Key-Value footer metadata
Returns
this for chaining

◆ max_dictionary_size()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::max_dictionary_size ( size_t  val)

Sets the maximum dictionary size, in bytes.

Disables dictionary encoding for any column chunk where the dictionary will exceed this limit. Only used when the dictionary_policy is set to 'ADAPTIVE'.

Default value is 1048576 (1MiB).

Parameters
val: Maximum dictionary size, in bytes
Returns
this for chaining
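The default of 1048576 bytes is 2^20, i.e. 1 MiB. Reading "exceed" literally, a dictionary of exactly the limit keeps its encoding; the hypothetical helper below applies that reading to each column chunk under the ADAPTIVE policy. It is an illustration under that assumption, not cudf's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Documented default for max_dictionary_size: 1048576 bytes (1 MiB).
constexpr std::size_t default_max_dictionary_size = std::size_t{1} << 20;
static_assert(default_max_dictionary_size == 1048576, "1 MiB");

// Hypothetical helper: for each column chunk's dictionary size, report whether the
// chunk keeps dictionary encoding. Only sizes strictly above the limit lose it.
std::vector<bool> keeps_dictionary(std::vector<std::size_t> const& dict_sizes,
                                   std::size_t limit = default_max_dictionary_size)
{
  std::vector<bool> out;
  out.reserve(dict_sizes.size());
  for (auto s : dict_sizes) out.push_back(s <= limit);
  return out;
}
```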

◆ max_page_fragment_size()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::max_page_fragment_size ( size_type  val)

Sets the maximum page fragment size, in rows.

Files with nested schemas or very long strings may need a page fragment size smaller than the default value of 5000 to ensure a single fragment will not exceed the desired maximum page size in bytes.

Parameters
val: Maximum page fragment size, in rows
Returns
this for chaining
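To see why long strings call for smaller fragments, the sketch below assumes, as the description implies, that pages are assembled from whole fragments, so one fragment larger than the page-size target forces an oversized page. largest_fragment_bytes is a hypothetical helper, not part of cudf.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Groups rows into fragments of at most `fragment_rows` rows and returns the byte
// size of the largest fragment. Under the whole-fragment assumption, a result above
// the page-size target means at least one page must exceed that target.
std::size_t largest_fragment_bytes(std::vector<std::size_t> const& row_bytes,
                                   std::size_t fragment_rows)
{
  if (fragment_rows == 0) return 0;  // guard against an endless loop
  std::size_t largest = 0;
  for (std::size_t start = 0; start < row_bytes.size(); start += fragment_rows) {
    std::size_t const end = std::min(start + fragment_rows, row_bytes.size());
    std::size_t sum = 0;
    for (std::size_t i = start; i < end; ++i) sum += row_bytes[i];
    largest = std::max(largest, sum);
  }
  return largest;
}
```

For instance, with 10000 rows of 1024 bytes each, 5000-row fragments are 5120000 bytes, well above a hypothetical 1 MiB page target, whereas 500-row fragments are 512000 bytes and fit.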

◆ max_page_size_bytes()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::max_page_size_bytes ( size_t  val)

Sets the maximum uncompressed page size, in bytes.

Serves as a hint to the writer, and can be exceeded under certain circumstances. Cannot be larger than the row group size in bytes, and will be adjusted to match if it is.

Parameters
val: Maximum page size, in bytes
Returns
this for chaining
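A minimal sketch of how a byte limit can act as a hint, assuming a simple greedy packer: the limit is first clamped to the row group size as documented, rows are then packed until the next row would exceed it, and a single row larger than the limit still forms its own page. page_sizes is a hypothetical helper; cudf's actual page-splitting logic is not shown on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the byte size of each page produced by greedy packing.
std::vector<std::size_t> page_sizes(std::vector<std::size_t> const& row_bytes,
                                    std::size_t page_limit,
                                    std::size_t row_group_limit)
{
  // A page limit larger than the row group limit is lowered to match it.
  std::size_t const limit = std::min(page_limit, row_group_limit);
  std::vector<std::size_t> pages;
  std::size_t current = 0;
  for (auto b : row_bytes) {
    // Close the current page if this row would push it past the limit.
    if (current > 0 && current + b > limit) {
      pages.push_back(current);
      current = 0;
    }
    current += b;  // an oversized row still lands in a page of its own
  }
  if (current > 0) pages.push_back(current);
  return pages;
}
```

For example, rows of 100, 100, 100, 500 and 50 bytes with a 250-byte limit yield pages of 200, 100, 500 and 50 bytes; the 500-byte page exceeds the hint.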

◆ max_page_size_rows()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::max_page_size_rows ( size_type  val)

Sets the maximum page size, in rows. Counts only top-level rows, ignoring any nesting. Cannot be larger than the row group size in rows, and will be adjusted to match if it is.

Parameters
val: Maximum number of rows per page
Returns
this for chaining
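Because only top-level rows are counted, pages of a nested column may hold very different numbers of leaf values. The hypothetical helper below cuts pages every N top-level rows, after clamping N to the row group size in rows as documented, and reports the leaf values per page. It models only the row limit, not the byte limit, and is not cudf's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// `leaves_per_row[i]` is the number of leaf values in top-level row i of a nested
// (e.g. list) column. Returns the number of leaf values in each page.
std::vector<std::size_t> leaves_per_page(std::vector<std::size_t> const& leaves_per_row,
                                         std::size_t max_rows,
                                         std::size_t row_group_rows)
{
  // Clamp to the row group size; keep at least one row per page.
  std::size_t const rows_per_page = std::max<std::size_t>(1, std::min(max_rows, row_group_rows));
  std::vector<std::size_t> pages;
  for (std::size_t start = 0; start < leaves_per_row.size(); start += rows_per_page) {
    std::size_t const end = std::min(start + rows_per_page, leaves_per_row.size());
    std::size_t leaves = 0;
    for (std::size_t i = start; i < end; ++i) leaves += leaves_per_row[i];
    pages.push_back(leaves);
  }
  return pages;
}
```

With leaf counts {3, 0, 5, 1, 2} and two top-level rows per page, the pages hold 3, 6 and 2 leaf values.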

◆ metadata()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::metadata ( table_input_metadata  metadata)

Sets metadata.

Parameters
metadata: Associated metadata
Returns
this for chaining

◆ row_group_size_bytes()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::row_group_size_bytes ( size_t  val)

Sets the maximum row group size, in bytes.

Parameters
val: Maximum row group size, in bytes
Returns
this for chaining

◆ row_group_size_rows()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::row_group_size_rows ( size_type  val)

Sets the maximum number of rows in output row groups.

Parameters
val: Maximum number of rows
Returns
this for chaining
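Since row groups are bounded both in bytes (row_group_size_bytes) and in rows (row_group_size_rows), a row group must close as soon as either limit would be crossed. The greedy sketch below illustrates that interaction; split_row_groups is a hypothetical helper and not cudf's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the number of rows in each row group. A group is closed before adding a
// row if it already holds `max_rows` rows or the row would push it past `max_bytes`.
std::vector<std::size_t> split_row_groups(std::vector<std::size_t> const& row_bytes,
                                          std::size_t max_rows,
                                          std::size_t max_bytes)
{
  std::vector<std::size_t> groups;
  std::size_t rows  = 0;
  std::size_t bytes = 0;
  for (auto b : row_bytes) {
    if (rows > 0 && (rows == max_rows || bytes + b > max_bytes)) {
      groups.push_back(rows);
      rows  = 0;
      bytes = 0;
    }
    ++rows;
    bytes += b;
  }
  if (rows > 0) groups.push_back(rows);
  return groups;
}
```

Ten 100-byte rows with a 4-row limit and a 1000-byte limit give row groups of 4, 4 and 2 rows; lowering the byte limit to 250 gives five groups of 2 rows.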

◆ sorting_columns()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::sorting_columns ( std::vector< sorting_column > sorting_columns)

Sets column sorting metadata.

Parameters
sorting_columns: Column sort order metadata
Returns
this for chaining

◆ stats_level()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::stats_level ( statistics_freq  sf)

Sets the level of statistics.

Parameters
sf: Level of statistics requested in the output file
Returns
this for chaining

◆ utc_timestamps()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::utc_timestamps ( bool  enabled)

Set to true if timestamps are to be written as UTC.

Parameters
enabled: Boolean value to enable/disable writing of timestamps as UTC
Returns
this for chaining

◆ write_arrow_schema()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::write_arrow_schema ( bool  enabled)

Set to true if arrow schema is to be written.

Parameters
enabled: Boolean value to enable/disable writing of the Arrow schema
Returns
this for chaining

◆ write_v2_headers()

template<class BuilderT , class OptionsT >
BuilderT& cudf::io::parquet_writer_options_builder_base< BuilderT, OptionsT >::write_v2_headers ( bool  enabled)

Set to true if V2 page headers are to be written.

Parameters
enabled: Boolean value to enable/disable writing of V2 page headers
Returns
this for chaining
