IO interfaces. More...
Classes | |
| class | avro_reader_options |
Settings to use for read_avro(). More... | |
| class | avro_reader_options_builder |
Builder to build options for read_avro(). More... | |
| class | csv_reader_options |
Settings to use for read_csv(). More... | |
| class | csv_reader_options_builder |
Builder to build options for read_csv(). More... | |
| class | csv_writer_options |
Settings to use for write_csv(). More... | |
| class | csv_writer_options_builder |
Builder to build options for write_csv(). More... | |
| class | data_sink |
| Interface class for storing the output data from the writers. More... | |
| class | datasource |
| Interface class for providing input data to the readers. More... | |
| struct | schema_element |
Allows specifying the target types for nested JSON data via json_reader_options' set_dtypes method. More... | |
| class | json_reader_options |
Input arguments to the read_json interface. More... | |
| class | json_reader_options_builder |
Builds settings to use for read_json(). More... | |
| class | json_writer_options |
Settings to use for write_json(). More... | |
| class | json_writer_options_builder |
Builder to build options for write_json(). More... | |
| class | orc_reader_options |
Settings to use for read_orc(). More... | |
| class | orc_reader_options_builder |
Builds settings to use for read_orc(). More... | |
| class | chunked_orc_reader |
| The chunked ORC reader class to read an ORC file iteratively into a series of tables, chunk by chunk. More... | |
| class | orc_writer_options |
Settings to use for write_orc(). More... | |
| class | orc_writer_options_builder |
Builds settings to use for write_orc(). More... | |
| class | chunked_orc_writer_options |
Settings to use for write_orc_chunked(). More... | |
| class | chunked_orc_writer_options_builder |
Builds settings to use for write_orc_chunked(). More... | |
| class | orc_chunked_writer |
| Chunked ORC writer class that writes an ORC file in chunked/stream form. More... | |
| struct | raw_orc_statistics |
| Holds column names and buffers containing raw file-level and stripe-level statistics. More... | |
| struct | minmax_statistics |
| Base class for column statistics that include optional minimum and maximum. More... | |
| struct | sum_statistics |
| Base class for column statistics that include an optional sum. More... | |
| struct | integer_statistics |
| Statistics for integral columns. More... | |
| struct | double_statistics |
| Statistics for floating point columns. More... | |
| struct | string_statistics |
| Statistics for string columns. More... | |
| struct | bucket_statistics |
| Statistics for boolean columns. More... | |
| struct | decimal_statistics |
| Statistics for decimal columns. More... | |
| struct | timestamp_statistics |
| Statistics for timestamp columns. More... | |
| struct | column_statistics |
| Contains per-column ORC statistics. More... | |
| struct | parsed_orc_statistics |
| Holds column names and parsed file-level and stripe-level statistics. More... | |
| struct | orc_column_schema |
| Schema of an ORC column, including the nested columns. More... | |
| struct | orc_schema |
| Schema of an ORC file. More... | |
| class | orc_metadata |
| Information about content of an ORC file. More... | |
| class | parquet_reader_options |
Settings for read_parquet(). More... | |
| class | parquet_reader_options_builder |
Builds parquet_reader_options to use for read_parquet(). More... | |
| class | chunked_parquet_reader |
| The chunked Parquet reader class to read a Parquet file iteratively into a series of tables, chunk by chunk. More... | |
| struct | sorting_column |
| Struct used to describe column sorting metadata. More... | |
| class | parquet_writer_options_base |
Base settings for write_parquet() and chunked_parquet_writer. More... | |
| class | parquet_writer_options_builder_base |
| Base class for Parquet options builders. More... | |
| class | parquet_writer_options |
Settings for write_parquet(). More... | |
| class | parquet_writer_options_builder |
Class to build parquet_writer_options. More... | |
| class | chunked_parquet_writer_options |
Settings for chunked_parquet_writer. More... | |
| class | chunked_parquet_writer_options_builder |
Class to build chunked_parquet_writer_options. More... | |
| class | chunked_parquet_writer |
| Chunked Parquet writer class to handle options and write tables in chunks. More... | |
| struct | parquet_column_schema |
| Schema of a Parquet column, including the nested columns. More... | |
| struct | parquet_schema |
| Schema of a Parquet file. More... | |
| class | parquet_metadata |
| Information about content of a Parquet file. More... | |
| class | writer_compression_statistics |
| Statistics about compression performed by a writer. More... | |
| struct | column_name_info |
| Detailed name (and optionally nullability) information for output columns. More... | |
| struct | table_metadata |
| Table metadata returned by IO readers. More... | |
| struct | table_with_metadata |
| Table with table metadata used by IO readers to return the metadata by value. More... | |
| struct | source_info |
| Source information for read interfaces. More... | |
| struct | sink_info |
| Destination information for write interfaces. More... | |
| class | column_in_metadata |
| Metadata for a column. More... | |
| class | table_input_metadata |
| Metadata for a table. More... | |
| struct | partition_info |
| Information used while writing partitioned datasets. More... | |
| class | reader_column_schema |
| Schema element for the reader. More... | |
Typedefs | |
| using | no_statistics = std::monostate |
| Monostate type alias for the statistics variant. | |
| using | date_statistics = minmax_statistics< int32_t > |
| Statistics for date(time) columns. | |
| using | binary_statistics = sum_statistics< int64_t > |
| Statistics for binary columns. More... | |
| using | statistics_type = std::variant< no_statistics, integer_statistics, double_statistics, string_statistics, bucket_statistics, decimal_statistics, date_statistics, binary_statistics, timestamp_statistics > |
| Variant type for ORC type-specific column statistics. More... | |
| using | parquet_chunked_writer = chunked_parquet_writer |
Deprecated type alias for chunked_parquet_writer. More... | |
Enumerations | |
| enum class | json_recovery_mode_t { FAIL , RECOVER_WITH_NULL } |
| Control the error recovery behavior of the json parser. More... | |
| enum class | compression_type : int32_t { NONE , AUTO , SNAPPY , GZIP , BZIP2 , BROTLI , ZIP , XZ , ZLIB , LZ4 , LZO , ZSTD } |
| Compression algorithms. More... | |
| enum class | io_type : int32_t { FILEPATH , HOST_BUFFER , DEVICE_BUFFER , VOID , USER_IMPLEMENTED } |
| Data source or destination types. More... | |
| enum class | quote_style : int32_t { MINIMAL , ALL , NONNUMERIC , NONE } |
| Behavior when handling quotations in field data. More... | |
| enum | statistics_freq : int32_t { STATISTICS_NONE = 0 , STATISTICS_ROWGROUP = 1 , STATISTICS_PAGE = 2 , STATISTICS_COLUMN = 3 } |
| Column statistics granularity type for Parquet/ORC writers. More... | |
| enum class | column_encoding : int32_t { USE_DEFAULT = -1 , DICTIONARY , PLAIN , DELTA_BINARY_PACKED , DELTA_LENGTH_BYTE_ARRAY , DELTA_BYTE_ARRAY , BYTE_STREAM_SPLIT , DIRECT , DIRECT_V2 , DICTIONARY_V2 } |
Valid encodings for use with column_in_metadata::set_encoding(). More... | |
| enum | dictionary_policy : int32_t { NEVER = 0 , ADAPTIVE = 1 , ALWAYS = 2 } |
| Control use of dictionary encoding for the Parquet writer. More... | |
Functions | |
| table_with_metadata | read_avro (avro_reader_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) |
| Reads an Avro dataset into a set of columns. More... | |
| table_with_metadata | read_csv (csv_reader_options options, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) |
| Reads a CSV dataset into a set of columns. More... | |
| void | write_csv (csv_writer_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream()) |
| Writes a set of columns to CSV format. More... | |
| constexpr bool | is_supported_write_csv (data_type type) |
| Checks if a cudf::data_type is supported for CSV writing. More... | |
| table_with_metadata | read_json (json_reader_options options, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) |
| Reads a JSON dataset into a set of columns. More... | |
| void | write_json (json_writer_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream()) |
| Writes a set of columns to JSON format. More... | |
| constexpr bool | is_supported_write_json (data_type type) |
| Checks if a cudf::data_type is supported for JSON writing. More... | |
| bool | is_supported_read_orc (compression_type compression) |
| Check if the compression type is supported for reading ORC files. More... | |
| bool | is_supported_write_orc (compression_type compression) |
| Check if the compression type is supported for writing ORC files. More... | |
| table_with_metadata | read_orc (orc_reader_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) |
| Reads an ORC dataset into a set of columns. More... | |
| void | write_orc (orc_writer_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream()) |
| Writes a set of columns to ORC format. More... | |
| raw_orc_statistics | read_raw_orc_statistics (source_info const &src_info, rmm::cuda_stream_view stream=cudf::get_default_stream()) |
| Reads file-level and stripe-level statistics of ORC dataset. More... | |
| parsed_orc_statistics | read_parsed_orc_statistics (source_info const &src_info, rmm::cuda_stream_view stream=cudf::get_default_stream()) |
| Reads file-level and stripe-level statistics of ORC dataset. More... | |
| orc_metadata | read_orc_metadata (source_info const &src_info, rmm::cuda_stream_view stream=cudf::get_default_stream()) |
| Reads metadata of ORC dataset. More... | |
| bool | is_supported_read_parquet (compression_type compression) |
| Check if the compression type is supported for reading Parquet files. More... | |
| bool | is_supported_write_parquet (compression_type compression) |
| Check if the compression type is supported for writing Parquet files. More... | |
| table_with_metadata | read_parquet (parquet_reader_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) |
| Reads a Parquet dataset into a set of columns. More... | |
| std::unique_ptr< std::vector< uint8_t > > | write_parquet (parquet_writer_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream()) |
| Writes a set of columns to Parquet format. More... | |
| std::unique_ptr< std::vector< uint8_t > > | merge_row_group_metadata (std::vector< std::unique_ptr< std::vector< uint8_t >>> const &metadata_list) |
| Merges multiple raw metadata blobs that were previously created by write_parquet into a single metadata blob. More... | |
| parquet_metadata | read_parquet_metadata (source_info const &src_info) |
| Reads metadata of a Parquet dataset. More... | |
| template<typename T > | |
| constexpr auto | is_byte_like_type () |
Returns true if the type is byte-like, meaning it is reasonable to pass as a pointer to bytes. More... | |
Variables | |
| constexpr size_t | default_stripe_size_bytes = 64 * 1024 * 1024 |
| 64 MB default ORC stripe size | |
| constexpr size_type | default_stripe_size_rows = 1000000 |
| 1M rows default ORC stripe size, in rows | |
| constexpr size_type | default_row_index_stride = 10000 |
| 10K rows default ORC row index stride | |
| constexpr size_t | default_row_group_size_bytes |
| Infinite bytes per row group. More... | |
| constexpr size_type | default_row_group_size_rows = 1'000'000 |
| 1 million rows per row group | |
| constexpr size_t | default_max_page_size_bytes = 512 * 1024 |
| 512KB per page | |
| constexpr size_type | default_max_page_size_rows = 20000 |
| 20k rows per page | |
| constexpr int32_t | default_column_index_truncate_length = 64 |
| truncate to 64 bytes | |
| constexpr size_t | default_max_dictionary_size = 1024 * 1024 |
| 1MB dictionary size | |
| constexpr size_type | default_max_page_fragment_size = 5000 |
| 5000 rows per page fragment | |