Parquet
- class pylibcudf.io.parquet.ChunkedParquetReader(SourceInfo source_info, list columns=None, list row_groups=None, bool use_pandas_metadata=True, bool convert_strings_to_categories=False, int64_t skip_rows=0, size_type nrows=-1, size_t chunk_read_limit=0, size_t pass_read_limit=1024000000, bool allow_mismatched_pq_schemas=False)
Reads chunks of a Parquet file into a TableWithMetadata.
For details, see chunked_parquet_reader.
- Parameters:
- source_info : SourceInfo
The SourceInfo object to read the Parquet file from.
- columns : list, default None
The names of the columns to be read.
- row_groups : list[list[size_type]], default None
List of row groups to be read.
- use_pandas_metadata : bool, default True
If True, return metadata about the index column in the per-file user metadata of the TableWithMetadata.
- convert_strings_to_categories : bool, default False
Whether to convert string columns to the category type.
- skip_rows : int64_t, default 0
The number of rows to skip from the start of the file.
- nrows : size_type, default -1
The number of rows to read. By default, read the entire file.
- chunk_read_limit : size_t, default 0
Limit on the total number of bytes returned per read, or 0 if there is no limit.
- pass_read_limit : size_t, default 1024000000
Limit on the amount of memory used for reading and decompressing data, or 0 if there is no limit.
- allow_mismatched_pq_schemas : bool, default False
Whether to read (matching) columns specified in columns from input files with otherwise mismatched schemas.
Methods
has_next(self)
Returns True if there is another chunk in the Parquet file to be read.
read_chunk(self)
Read the next chunk into a TableWithMetadata.
- has_next(self) → bool
Returns True if there is another chunk in the Parquet file to be read.
- Returns:
- True if we have not finished reading the file.
- read_chunk(self) → TableWithMetadata
Read the next chunk into a TableWithMetadata.
- Returns:
- TableWithMetadata
The Table and its corresponding metadata (column names) that were read in.
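The chunked reading loop above can be sketched as follows. This is a minimal illustration, not a verified snippet: it assumes pylibcudf is installed with a CUDA GPU available, that SourceInfo accepts a list of file paths, that TableWithMetadata exposes the table via a tbl attribute, and that pylibcudf.concatenate.concatenate can rejoin the chunks; check these names against your installed version.

```python
# Sketch: stream a large Parquet file in bounded-size chunks instead of
# materializing the whole file at once. Requires a CUDA GPU + pylibcudf.
import pylibcudf as plc

# "data.parquet" is a placeholder path for illustration.
source = plc.io.SourceInfo(["data.parquet"])
reader = plc.io.parquet.ChunkedParquetReader(
    source,
    chunk_read_limit=256_000_000,   # cap each returned chunk at ~256 MB
    pass_read_limit=1_024_000_000,  # cap memory per read/decompress pass
)

tables = []
while reader.has_next():
    chunk = reader.read_chunk()  # a TableWithMetadata for this chunk
    tables.append(chunk.tbl)     # keep the underlying pylibcudf Table

# Optionally reassemble the chunks into one table
# (assumes pylibcudf.concatenate.concatenate exists in your version).
result = plc.concatenate.concatenate(tables)
```

Tuning chunk_read_limit trades peak device memory for the number of read_chunk calls; with both limits left at their defaults the reader still bounds decompression memory via pass_read_limit.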
- pylibcudf.io.parquet.read_parquet(SourceInfo source_info, list columns=None, list row_groups=None, Expression filters=None, bool convert_strings_to_categories=False, bool use_pandas_metadata=True, int64_t skip_rows=0, size_type nrows=-1, bool allow_mismatched_pq_schemas=False)
Reads a Parquet file into a TableWithMetadata.
For details, see read_parquet().
- Parameters:
- source_info : SourceInfo
The SourceInfo object to read the Parquet file from.
- columns : list, default None
The string names of the columns to be read.
- row_groups : list[list[size_type]], default None
List of row groups to be read.
- filters : Expression, default None
An AST pylibcudf.expressions.Expression to use for predicate pushdown.
- convert_strings_to_categories : bool, default False
Whether to convert string columns to the category type.
- use_pandas_metadata : bool, default True
If True, return metadata about the index column in the per-file user metadata of the TableWithMetadata.
- skip_rows : int64_t, default 0
The number of rows to skip from the start of the file.
- nrows : size_type, default -1
The number of rows to read. By default, read the entire file.
- allow_mismatched_pq_schemas : bool, default False
If True, enable reading (matching) columns specified in columns from input files with otherwise mismatched schemas.
- Returns:
- TableWithMetadata
The Table and its corresponding metadata (column names) that were read in.
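A single-call read with column and row selection can be sketched as below. As with the chunked example, this is an unverified illustration: it assumes pylibcudf is installed with a CUDA GPU, that SourceInfo takes a list of paths, and that TableWithMetadata exposes tbl and a column_names() method; the file path and column names are hypothetical.

```python
# Sketch: read a subset of a Parquet file in one call.
# Requires a CUDA GPU + pylibcudf.
import pylibcudf as plc

source = plc.io.SourceInfo(["data.parquet"])  # placeholder path

result = plc.io.parquet.read_parquet(
    source,
    columns=["a", "b"],  # hypothetical column names to project
    skip_rows=100,       # skip the first 100 rows of the file
    nrows=1000,          # then read at most 1000 rows
)

table = result.tbl             # the pylibcudf Table that was read
names = result.column_names()  # names of the columns read in
```

Passing a filters Expression in addition to columns pushes the predicate down into the reader, so row groups whose statistics cannot satisfy the filter are skipped rather than decoded.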