Parquet

class pylibcudf.io.parquet.ChunkedParquetReader(ParquetReaderOptions options, size_t chunk_read_limit=0, size_t pass_read_limit=1024000000)

Reads chunks of a Parquet file into a TableWithMetadata.

For details, see cudf::io::chunked_parquet_reader.

Parameters:

options (ParquetReaderOptions): Settings for controlling reading behavior.
chunk_read_limit (size_t, default 0): Limit on the total number of bytes to be returned per read, or 0 if there is no limit.
pass_read_limit (size_t, default 1024000000): Limit on the amount of memory used for reading and decompressing data, or 0 if there is no limit.

Methods

`has_next`(self)	Returns True if there is another chunk in the Parquet file to be read.
`read_chunk`(self)	Read the next chunk into a `TableWithMetadata`.

has_next(self) → bool

Returns True if there is another chunk in the Parquet file to be read.

Returns:

True if we have not finished reading the file.

read_chunk(self) → TableWithMetadata

Read the next chunk into a TableWithMetadata.

Returns:

TableWithMetadata: The Table and its corresponding metadata (column names) that were read in.
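The two methods above are meant to be used together in a loop. A minimal sketch of that pattern follows. The helper name `iter_chunks` is hypothetical (it is not part of pylibcudf), and it relies only on the documented `has_next()` and `read_chunk()` methods, so it works with any object that exposes them.

```python
def iter_chunks(reader):
    """Yield chunks from a reader that exposes has_next() and read_chunk().

    Hypothetical helper: it stops as soon as has_next() returns False,
    so a reader that reports no chunks yields nothing.
    """
    while reader.has_next():
        # Each call returns the next chunk (a TableWithMetadata for a
        # ChunkedParquetReader).
        yield reader.read_chunk()
```

With a real reader this could be driven as `for chunk in iter_chunks(ChunkedParquetReader(options, chunk_read_limit=256_000_000)): ...`, where `options` is a ParquetReaderOptions object and the 256 MB limit is an arbitrary illustrative value.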

class pylibcudf.io.parquet.ParquetReaderOptions

The settings to use for read_parquet. For details, see cudf::io::parquet_reader_options.

Methods

`builder`(SourceInfo source)	Create a ParquetReaderOptionsBuilder object.
`set_columns`(self, list col_names)	Sets names of the columns to be read.
`set_filter`(self, Expression filter)	Sets AST based filter for predicate pushdown.
`set_num_rows`(self, size_type nrows)	Sets number of rows to read.
`set_row_groups`(self, list row_groups)	Sets list of individual row groups to read.
`set_skip_rows`(self, int64_t skip_rows)	Sets number of rows to skip.

static builder(SourceInfo source)

Create a ParquetReaderOptionsBuilder object.

For details, see cudf::io::parquet_reader_options::builder().

Parameters:

source (SourceInfo): The source to read the Parquet file from.

Returns:

ParquetReaderOptionsBuilder: Builder used to construct a ParquetReaderOptions object.

set_columns(self, list col_names) → void

Sets names of the columns to be read.

Parameters:

col_names (list): List of column names.

Returns:

None

set_filter(self, Expression filter) → void

Sets AST based filter for predicate pushdown.

Parameters:

filter (Expression): AST expression to use as the filter.

Returns:

None

set_num_rows(self, size_type nrows) → void

Sets number of rows to read.

Parameters:

nrows (size_type): Number of rows to read after the skipped rows.

Returns:

None

set_row_groups(self, list row_groups) → void

Sets list of individual row groups to read.

Parameters:

row_groups (list): List of row groups to read.

Returns:

None

set_skip_rows(self, int64_t skip_rows) → void

Sets number of rows to skip.

Parameters:

skip_rows (int64_t): Number of rows to skip from the start.

Returns:

None
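Because each setter mutates the options object and returns None, several settings are applied as a sequence of calls. The sketch below wraps three of the documented setters in a hypothetical helper, `apply_read_settings`, which is not part of pylibcudf. It calls a setter only when the corresponding argument is supplied, and it deliberately leaves out `set_row_groups` and `set_filter`.

```python
def apply_read_settings(options, columns=None, skip_rows=None, num_rows=None):
    """Apply optional settings to a ParquetReaderOptions-like object.

    Hypothetical helper: `options` only needs the documented setters
    set_columns, set_skip_rows and set_num_rows.
    """
    if columns is not None:
        # set_columns is documented as taking a list of column names.
        options.set_columns(list(columns))
    if skip_rows is not None:
        # Rows to skip from the start of the file.
        options.set_skip_rows(skip_rows)
    if num_rows is not None:
        # Rows to read after the skipped rows.
        options.set_num_rows(num_rows)
    # Return the same object so the call can be chained into a reader.
    return options
```

For example, `apply_read_settings(options, columns=["a", "b"], skip_rows=100, num_rows=50)` would request columns `a` and `b` for 50 rows starting after the first 100; the column names here are placeholders.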

pylibcudf.io.parquet.read_parquet(ParquetReaderOptions options)

Read from Parquet format.

The source to read from and options are encapsulated by the options object.

For details, see cudf::io::read_parquet().

Parameters:

options (ParquetReaderOptions): Settings for controlling reading behavior.
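A minimal end-to-end sketch of a whole-file read follows. It makes several assumptions that this page does not state: that `pylibcudf.io.SourceInfo` accepts a list of file paths, that the ParquetReaderOptionsBuilder returned by `builder()` has a `build()` method, and that a CUDA-capable GPU is available at run time. The function name `read_parquet_file` is hypothetical.

```python
def read_parquet_file(path):
    """Read a whole Parquet file with default options (hypothetical helper)."""
    # Imported inside the function because pylibcudf needs a RAPIDS
    # installation and a GPU; defining this helper does not.
    import pylibcudf as plc

    # Assumption: SourceInfo takes a list of paths.
    source = plc.io.SourceInfo([path])
    # Assumption: the builder exposes build() to produce ParquetReaderOptions.
    options = plc.io.parquet.ParquetReaderOptions.builder(source).build()
    # The source and all settings travel inside the options object.
    return plc.io.parquet.read_parquet(options)
```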

pylibcudf.io.parquet.write_parquet(ParquetWriterOptions options) → memoryview

Writes a set of columns to Parquet format.

Parameters:

options (ParquetWriterOptions): Settings for controlling writing behavior.

Returns:

memoryview: A blob that contains the file metadata (parquet FileMetadata thrift message) if requested in parquet_writer_options (empty blob otherwise).
