Files | |
file | avro.hpp |
file | csv.hpp |
file | io/json.hpp |
file | orc.hpp |
file | parquet.hpp |
file | byte_range_info.hpp |
file | data_chunk_source.hpp |
file | multibyte_split.hpp |
Classes | |
class | cudf::io::avro_reader_options |
Settings to use for read_avro() . More... | |
class | cudf::io::avro_reader_options_builder |
Builder to build options for read_avro() . More... | |
class | cudf::io::csv_reader_options |
Settings to use for read_csv() . More... | |
class | cudf::io::csv_reader_options_builder |
Builder to build options for read_csv() . More... | |
struct | cudf::io::schema_element |
Allows specifying the target types for nested JSON data via json_reader_options' set_dtypes method. More... | |
class | cudf::io::json_reader_options |
Input arguments to the read_json interface. More... | |
class | cudf::io::json_reader_options_builder |
Builds settings to use for read_json() . More... | |
class | cudf::io::orc_reader_options |
Settings to use for read_orc() . More... | |
class | cudf::io::orc_reader_options_builder |
Builds settings to use for read_orc() . More... | |
class | cudf::io::chunked_orc_reader |
The chunked orc reader class to read an ORC file iteratively into a series of tables, chunk by chunk. More... | |
class | cudf::io::parquet_reader_options |
Settings for read_parquet() . More... | |
class | cudf::io::parquet_reader_options_builder |
Builds parquet_reader_options to use for read_parquet() . More... | |
class | cudf::io::chunked_parquet_reader |
The chunked parquet reader class to read a Parquet file iteratively into a series of tables, chunk by chunk. More... | |
class | cudf::io::text::byte_range_info |
stores offset and size used to indicate a byte range More... | |
class | cudf::io::text::device_data_chunk |
A contract guaranteeing stream-ordered memory access to the underlying device data. More... | |
class | cudf::io::text::data_chunk_reader |
a reader capable of producing views over device memory. More... | |
class | cudf::io::text::data_chunk_source |
a data source capable of creating a reader which can produce views of the data source in device memory. More... | |
struct | cudf::io::text::parse_options |
Parsing options for multibyte_split. More... | |
Enumerations | |
enum class | cudf::io::json_recovery_mode_t { cudf::io::FAIL , cudf::io::RECOVER_WITH_NULL } |
Control the error recovery behavior of the json parser. More... | |
Variables | |
constexpr size_t | cudf::io::default_stripe_size_bytes = 64 * 1024 * 1024 |
64MB default orc stripe size | |
constexpr size_type | cudf::io::default_stripe_size_rows = 1000000 |
1M rows default orc stripe rows | |
constexpr size_type | cudf::io::default_row_index_stride = 10000 |
10K rows default orc row index stride | |
constexpr size_t | cudf::io::default_row_group_size_bytes |
Infinite bytes per row group. More... | |
constexpr size_type | cudf::io::default_row_group_size_rows = 1'000'000 |
1 million rows per row group | |
constexpr size_t | cudf::io::default_max_page_size_bytes = 512 * 1024 |
512KB per page | |
constexpr size_type | cudf::io::default_max_page_size_rows = 20000 |
20k rows per page | |
constexpr int32_t | cudf::io::default_column_index_truncate_length = 64 |
truncate to 64 bytes | |
constexpr size_t | cudf::io::default_max_dictionary_size = 1024 * 1024 |
1MB dictionary size | |
constexpr size_type | cudf::io::default_max_page_fragment_size = 5000 |
5000 rows per page fragment | |
enum class cudf::io::json_recovery_mode_t | strong |
Control the error recovery behavior of the json parser.
Enumerator | |
---|---|
FAIL | Does not recover from an error when encountering an invalid format. |
RECOVER_WITH_NULL | Recovers from an error, replacing invalid records with null. |
Definition at line 61 of file io/json.hpp.
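The two recovery policies can be illustrated with a small CPU-only sketch. It is not cuDF code: `recovery_mode`, `parse_record`, and `parse_all` are hypothetical names, the "parser" only accepts short runs of digits, and signalling failure with an exception in the FAIL case is an assumption made for the illustration.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical local mirror of the two enumerators described above.
enum class recovery_mode { FAIL, RECOVER_WITH_NULL };

// Toy record parser: a record is valid only if it is 1 to 9 decimal digits.
std::optional<int> parse_record(std::string const& record)
{
  if (record.empty() || record.size() > 9) { return std::nullopt; }
  int value = 0;
  for (char c : record) {
    if (c < '0' || c > '9') { return std::nullopt; }
    value = value * 10 + (c - '0');
  }
  return value;
}

// Parses every record. An invalid record either aborts the whole parse (FAIL)
// or is kept as a null entry (RECOVER_WITH_NULL).
std::vector<std::optional<int>> parse_all(std::vector<std::string> const& records,
                                          recovery_mode mode)
{
  std::vector<std::optional<int>> out;
  out.reserve(records.size());
  for (auto const& record : records) {
    auto parsed = parse_record(record);
    if (!parsed && mode == recovery_mode::FAIL) {
      throw std::runtime_error("invalid record: " + record);
    }
    out.push_back(parsed);  // std::nullopt stands in for a null row
  }
  return out;
}
```

With RECOVER_WITH_NULL the output keeps one entry per input record, so row positions still line up with the input.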
byte_range_info cudf::io::text::create_byte_range_info_max | ( | ) |
Create a byte_range_info which represents as much of a file as possible. Specifically, [0, numeric_limits<int64_t>::max()).
std::vector<byte_range_info> cudf::io::text::create_byte_range_infos_consecutive | ( | int64_t | total_bytes, |
int64_t | range_count | ||
) |
Create a collection of consecutive ranges between [0, total_bytes).
Each range will be the same size except if total_bytes is not evenly divisible by range_count, in which case the last range size will be the remainder.
total_bytes | total number of bytes in all ranges |
range_count | total number of ranges in which to divide bytes |
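The splitting rule can be sketched on the CPU as follows. This is an illustration consistent with the description above, not the library's implementation: `byte_range` and `split_consecutive` are hypothetical names, and the choice of rounding the common range size up (so that the last range receives whatever is left) is an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a byte range: an offset plus a size.
struct byte_range {
  std::int64_t offset;
  std::int64_t size;
};

// Splits [0, total_bytes) into range_count consecutive ranges.
// Assumption: the common size is ceil(total_bytes / range_count), so the
// last range holds the remainder when the division is not exact.
std::vector<byte_range> split_consecutive(std::int64_t total_bytes, std::int64_t range_count)
{
  std::vector<byte_range> ranges;
  if (total_bytes <= 0 || range_count <= 0) { return ranges; }
  ranges.reserve(static_cast<std::size_t>(range_count));
  std::int64_t const range_size = (total_bytes + range_count - 1) / range_count;
  for (std::int64_t i = 0; i < range_count; ++i) {
    std::int64_t const offset = i * range_size;
    // Clamp so that trailing ranges past the end of the data are empty.
    std::int64_t const size =
      std::max<std::int64_t>(0, std::min(range_size, total_bytes - offset));
    ranges.push_back({offset, size});
  }
  return ranges;
}
```

For example, splitting 10 bytes into 3 ranges under this assumption gives sizes 4, 4, and 2 at offsets 0, 4, and 8.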
std::unique_ptr<cudf::column> cudf::io::text::multibyte_split | ( | data_chunk_source const & | source, |
std::string const & | delimiter, | ||
parse_options | options = {} , |
||
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Splits the source text into a strings column using a multiple byte delimiter.
Providing a byte range allows multibyte_split to read a file partially, only returning the offsets of delimiters which begin within the range. If thinking in terms of "records", where each delimiter dictates the end of a record, all records which begin within the byte range provided will be returned, including any record which may begin in the range but end outside of the range. Records which begin outside of the range will be ignored, even if those records end inside the range.
source | The source string |
delimiter | UTF-8 encoded string for which to find offsets in the source |
options | the parsing options to use (including byte range) |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Memory resource to use for the device memory allocation |
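The "records" view of the byte range can be made concrete with a host-side sketch. It does not call multibyte_split and runs on a std::string rather than device memory. `split_records_in_range` is a hypothetical helper, and keeping the trailing delimiter on each record is an assumption of the sketch.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Returns the records of `text` that BEGIN inside
// [range_offset, range_offset + range_size). A record runs up to and
// including its terminating delimiter (or to the end of the text).
std::vector<std::string> split_records_in_range(std::string const& text,
                                                std::string const& delimiter,
                                                std::int64_t range_offset,
                                                std::int64_t range_size)
{
  std::vector<std::string> records;
  if (delimiter.empty()) { return records; }
  std::size_t begin = 0;
  while (begin < text.size()) {
    std::size_t const found = text.find(delimiter, begin);
    std::size_t const end =
      (found == std::string::npos) ? text.size() : found + delimiter.size();
    auto const b = static_cast<std::int64_t>(begin);
    // Only the start position decides membership; the record may end
    // outside the range and is still returned whole.
    if (b >= range_offset && b - range_offset < range_size) {
      records.push_back(text.substr(begin, end - begin));
    }
    begin = end;
  }
  return records;
}
```

On the text "aa::bb::cc" with delimiter "::", the range [0, 5) yields "aa::" and "bb::", even though the second record ends at byte 8. The range [5, 10) yields only "cc", because "bb::" begins at byte 4, outside that range.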
table_with_metadata cudf::io::read_avro | ( | avro_reader_options const & | options, |
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Reads an Avro dataset into a set of columns.
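To read a dataset from a file, wrap the file path in a source_info, build an avro_reader_options from it with avro_reader_options::builder(), and pass the options to read_avro().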
options | Settings for controlling reading behavior |
mr | Device memory resource used to allocate device memory of the table in the returned table_with_metadata |
table_with_metadata cudf::io::read_csv | ( | csv_reader_options | options, |
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Reads a CSV dataset into a set of columns.
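To read a dataset from a file, wrap the file path in a source_info, build a csv_reader_options from it with csv_reader_options::builder(), and pass the options to read_csv().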
options | Settings for controlling reading behavior |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate device memory of the table in the returned table_with_metadata |
table_with_metadata cudf::io::read_json | ( | json_reader_options | options, |
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Reads a JSON dataset into a set of columns.
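To read a dataset from a file, wrap the file path in a source_info, build a json_reader_options from it with json_reader_options::builder(), and pass the options to read_json().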
options | Settings for controlling reading behavior |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate device memory of the table in the returned table_with_metadata. |
table_with_metadata cudf::io::read_orc | ( | orc_reader_options const & | options, |
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Reads an ORC dataset into a set of columns.
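To read a dataset from a file, wrap the file path in a source_info, build an orc_reader_options from it with orc_reader_options::builder(), and pass the options to read_orc().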
options | Settings for controlling reading behavior |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate device memory of the table in the returned table_with_metadata. |
orc_metadata cudf::io::read_orc_metadata | ( | source_info const & | src_info, |
rmm::cuda_stream_view | stream = cudf::get_default_stream() |
||
) |
Reads metadata of ORC dataset.
src_info | Dataset source |
stream | CUDA stream used for device memory operations and kernel launches |
table_with_metadata cudf::io::read_parquet | ( | parquet_reader_options const & | options, |
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Reads a Parquet dataset into a set of columns.
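To read a dataset from a file, wrap the file path in a source_info, build a parquet_reader_options from it with parquet_reader_options::builder(), and pass the options to read_parquet().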
options | Settings for controlling reading behavior |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate device memory of the table in the returned table_with_metadata |
parquet_metadata cudf::io::read_parquet_metadata | ( | source_info const & | src_info | ) |
Reads metadata of parquet dataset.
src_info | Dataset source |
parsed_orc_statistics cudf::io::read_parsed_orc_statistics | ( | source_info const & | src_info, |
rmm::cuda_stream_view | stream = cudf::get_default_stream() |
||
) |
Reads file-level and stripe-level statistics of ORC dataset.
src_info | Dataset source |
stream | CUDA stream used for device memory operations and kernel launches |
raw_orc_statistics cudf::io::read_raw_orc_statistics | ( | source_info const & | src_info, |
rmm::cuda_stream_view | stream = cudf::get_default_stream() |
||
) |
Reads file-level and stripe-level statistics of ORC dataset.
To read the statistics of a dataset from a file, wrap the file path in a source_info and pass it to read_raw_orc_statistics().
src_info | Dataset source |
stream | CUDA stream used for device memory operations and kernel launches |
constexpr size_t cudf::io::default_row_group_size_bytes | constexpr |
Infinite bytes per row group.
Definition at line 42 of file parquet.hpp.