Parquet
- class pylibcudf.io.parquet.ChunkedParquetReader(SourceInfo source_info, list columns=None, list row_groups=None, bool use_pandas_metadata=True, bool convert_strings_to_categories=False, int64_t skip_rows=0, size_type nrows=-1, size_t chunk_read_limit=0, size_t pass_read_limit=1024000000, bool allow_mismatched_pq_schemas=False)
Reads chunks of a Parquet file into a TableWithMetadata.
For details, see chunked_parquet_reader.
- Parameters:
- source_info : SourceInfo
The SourceInfo object to read the Parquet file from.
- columns : list, default None
The names of the columns to be read.
- row_groups : list[list[size_type]], default None
List of row groups to be read.
- use_pandas_metadata : bool, default True
If True, return metadata about the index column in the per-file user metadata of the TableWithMetadata.
- convert_strings_to_categories : bool, default False
Whether to convert string columns to the category type.
- skip_rows : int64_t, default 0
The number of rows to skip from the start of the file.
- nrows : size_type, default -1
The number of rows to read. By default, read the entire file.
- chunk_read_limit : size_t, default 0
Limit on the total number of bytes returned per read, or 0 if there is no limit.
- pass_read_limit : size_t, default 1024000000
Limit on the amount of memory used for reading and decompressing data, or 0 if there is no limit.
- allow_mismatched_pq_schemas : bool, default False
Whether to read (matching) columns specified in columns from input files with otherwise mismatched schemas.
Methods
has_next(self)
Returns True if there is another chunk in the Parquet file to be read.
read_chunk(self)
Read the next chunk into a TableWithMetadata.
- has_next(self) → bool
Returns True if there is another chunk in the Parquet file to be read.
- Returns:
- True if we have not finished reading the file.
- read_chunk(self) → TableWithMetadata
Read the next chunk into a TableWithMetadata.
- Returns:
- TableWithMetadata
The Table and its corresponding metadata (column names) that were read in.
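The chunked reading loop above can be sketched as follows. This is a minimal illustration, not a verified snippet: it assumes pylibcudf is installed with a CUDA GPU available, that SourceInfo accepts a list of file paths, that TableWithMetadata exposes the table via a tbl attribute, and that pylibcudf.concatenate.concatenate can rejoin the chunks; check these names against your installed version.

```python
# Sketch: stream a large Parquet file in bounded-size chunks instead of
# materializing the whole file at once. Requires a CUDA GPU + pylibcudf.
import pylibcudf as plc

# "data.parquet" is a placeholder path for illustration.
source = plc.io.SourceInfo(["data.parquet"])
reader = plc.io.parquet.ChunkedParquetReader(
    source,
    chunk_read_limit=256_000_000,   # cap each returned chunk at ~256 MB
    pass_read_limit=1_024_000_000,  # cap memory per read/decompress pass
)

tables = []
while reader.has_next():
    chunk = reader.read_chunk()  # a TableWithMetadata for this chunk
    tables.append(chunk.tbl)     # keep the underlying pylibcudf Table

# Optionally reassemble the chunks into one table
# (assumes pylibcudf.concatenate.concatenate exists in your version).
result = plc.concatenate.concatenate(tables)
```

Tuning chunk_read_limit trades peak device memory for the number of read_chunk calls; with both limits left at their defaults the reader still bounds decompression memory via pass_read_limit.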
- pylibcudf.io.parquet.read_parquet(SourceInfo source_info, list columns=None, list row_groups=None, Expression filters=None, bool convert_strings_to_categories=False, bool use_pandas_metadata=True, int64_t skip_rows=0, size_type nrows=-1, bool allow_mismatched_pq_schemas=False)
Reads a Parquet file into a TableWithMetadata.
For details, see read_parquet().
- Parameters:
- source_info : SourceInfo
The SourceInfo object to read the Parquet file from.
- columns : list, default None
The string names of the columns to be read.
- row_groups : list[list[size_type]], default None
List of row groups to be read.
- filters : Expression, default None
An AST pylibcudf.expressions.Expression to use for predicate pushdown.
- convert_strings_to_categories : bool, default False
Whether to convert string columns to the category type.
- use_pandas_metadata : bool, default True
If True, return metadata about the index column in the per-file user metadata of the TableWithMetadata.
- skip_rows : int64_t, default 0
The number of rows to skip from the start of the file.
- nrows : size_type, default -1
The number of rows to read. By default, read the entire file.
- allow_mismatched_pq_schemas : bool, default False
If True, enable reading (matching) columns specified in columns from input files with otherwise mismatched schemas.
- Returns:
- TableWithMetadata
The Table and its corresponding metadata (column names) that were read in.
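A single-call read with column and row selection can be sketched as below. As with the chunked example, this is an unverified illustration: it assumes pylibcudf is installed with a CUDA GPU, that SourceInfo takes a list of paths, and that TableWithMetadata exposes tbl and a column_names() method; the file path and column names are hypothetical.

```python
# Sketch: read a subset of a Parquet file in one call.
# Requires a CUDA GPU + pylibcudf.
import pylibcudf as plc

source = plc.io.SourceInfo(["data.parquet"])  # placeholder path

result = plc.io.parquet.read_parquet(
    source,
    columns=["a", "b"],  # hypothetical column names to project
    skip_rows=100,       # skip the first 100 rows of the file
    nrows=1000,          # then read at most 1000 rows
)

table = result.tbl             # the pylibcudf Table that was read
names = result.column_names()  # names of the columns read in
```

Passing a filters Expression in addition to columns pushes the predicate down into the reader, so row groups whose statistics cannot satisfy the filter are skipped rather than decoded.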