Parquet#
- class pylibcudf.io.parquet.ChunkedParquetReader(ParquetReaderOptions options, size_t chunk_read_limit=0, size_t pass_read_limit=1024000000)#
Reads chunks of a Parquet file into a TableWithMetadata.
For details, see chunked_parquet_reader.
- Parameters:
- options : ParquetReaderOptions
Settings for controlling reading behavior
- chunk_read_limit : size_t, default 0
Limit on the total number of bytes to be returned per read, or 0 if there is no limit.
- pass_read_limit : size_t, default 1024000000
Limit on the amount of memory used for reading and decompressing data, or 0 if there is no limit.
Methods
- has_next(self): Returns True if there is another chunk in the Parquet file to be read.
- read_chunk(self): Read the next chunk into a TableWithMetadata.
- has_next(self) → bool #
Returns True if there is another chunk in the Parquet file to be read.
- Returns:
- True if we have not finished reading the file.
- read_chunk(self) → TableWithMetadata #
Read the next chunk into a TableWithMetadata.
- Returns:
- TableWithMetadata
The Table and its corresponding metadata (column names) that were read in.
- class pylibcudf.io.parquet.ParquetReaderOptions#
The settings to use for read_parquet.
For details, see cudf::io::parquet_reader_options.
Methods
- builder(SourceInfo source): Create a ParquetReaderOptionsBuilder object.
- set_columns(self, list col_names): Sets names of the columns to be read.
- set_filter(self, Expression filter): Sets AST-based filter for predicate pushdown.
- set_num_rows(self, size_type nrows): Sets number of rows to read.
- set_row_groups(self, list row_groups): Sets list of individual row groups to read.
- set_skip_rows(self, int64_t skip_rows): Sets number of rows to skip.
- static builder(SourceInfo source)#
Create a ParquetReaderOptionsBuilder object.
For details, see cudf::io::parquet_reader_options::builder().
- Parameters:
- source : SourceInfo
The source to read the Parquet file from.
- Returns:
- ParquetReaderOptionsBuilder
Builder to build ParquetReaderOptions
- set_columns(self, list col_names) → void #
Sets names of the columns to be read.
- Parameters:
- col_names : list
List of column names
- Returns:
- None
- set_filter(self, Expression filter) → void #
Sets AST-based filter for predicate pushdown.
- Parameters:
- filter : Expression
AST expression to use as filter
- Returns:
- None
- set_num_rows(self, size_type nrows) → void #
Sets number of rows to read.
- Parameters:
- nrows : size_type
Number of rows to read after skipping
- Returns:
- None
- set_row_groups(self, list row_groups) → void #
Sets list of individual row groups to read.
- Parameters:
- row_groups : list
List of row groups to read
- Returns:
- None
- set_skip_rows(self, int64_t skip_rows) → void #
Sets number of rows to skip.
- Parameters:
- skip_rows : int64_t
Number of rows to skip from the start
- Returns:
- None
- pylibcudf.io.parquet.read_parquet(ParquetReaderOptions options)#
Read from Parquet format.
The source to read from and options are encapsulated by the options object.
For details, see read_parquet().
- Parameters:
- options : ParquetReaderOptions
Settings for controlling reading behavior
- Returns:
- TableWithMetadata
The Table and its corresponding metadata (column names) that were read in.
- pylibcudf.io.parquet.write_parquet(ParquetWriterOptions options) → memoryview #
Writes a set of columns to Parquet format.
- Parameters:
- options : ParquetWriterOptions
Settings for controlling writing behavior
- Returns:
- memoryview
A blob that contains the file metadata (the Parquet FileMetaData Thrift message) if requested in parquet_writer_options, or an empty blob otherwise.