Parquet#

class pylibcudf.io.parquet.ChunkedParquetReader(ParquetReaderOptions options, size_t chunk_read_limit=0, size_t pass_read_limit=1024000000)#

Reads chunks of a Parquet file into a TableWithMetadata.

For details, see chunked_parquet_reader.

Parameters:
options : ParquetReaderOptions

Settings for controlling reading behavior

chunk_read_limit : size_t, default 0

Limit on the total number of bytes to be returned per read, or 0 if there is no limit.

pass_read_limit : size_t, default 1024000000

Limit on the amount of memory used for reading and decompressing data, or 0 if there is no limit.

Methods

has_next(self)

Returns True if there is another chunk in the Parquet file to be read.

read_chunk(self)

Read the next chunk into a TableWithMetadata.

has_next(self) → bool#

Returns True if there is another chunk in the Parquet file to be read.

Returns:
True if we have not finished reading the file.

read_chunk(self) → TableWithMetadata#

Read the next chunk into a TableWithMetadata.

Returns:
TableWithMetadata

The Table and its corresponding metadata (column names) that were read in.
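The intended consumption pattern is a has_next()/read_chunk() loop. Since ChunkedParquetReader itself needs a CUDA-capable GPU, the control flow can be sketched with a hypothetical pure-Python stand-in (FakeChunkedReader and its list chunks are illustrative only; the real reader yields TableWithMetadata objects):

```python
# Sketch of the has_next()/read_chunk() loop used with ChunkedParquetReader.
# FakeChunkedReader is a hypothetical stand-in that serves pre-split chunks
# the way the real reader serves TableWithMetadata chunks of a Parquet file.

class FakeChunkedReader:
    def __init__(self, chunks):
        self._chunks = list(chunks)
        self._pos = 0

    def has_next(self):
        # True if there is another chunk to be read.
        return self._pos < len(self._chunks)

    def read_chunk(self):
        # Return the next chunk; the real reader returns a TableWithMetadata.
        chunk = self._chunks[self._pos]
        self._pos += 1
        return chunk

reader = FakeChunkedReader([[1, 2], [3, 4], [5]])
tables = []
while reader.has_next():          # same loop shape as with the real reader
    tables.append(reader.read_chunk())

print(tables)
```

With the real reader, the per-chunk tables would typically be concatenated afterwards to reassemble the full table, keeping peak memory bounded by chunk_read_limit and pass_read_limit rather than by the file size.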

class pylibcudf.io.parquet.ParquetReaderOptions#

The settings to use for read_parquet. For details, see cudf::io::parquet_reader_options.

Methods

builder(SourceInfo source)

Create a ParquetReaderOptionsBuilder object

set_columns(self, list col_names)

Sets names of the columns to be read.

set_filter(self, Expression filter)

Sets AST based filter for predicate pushdown.

set_num_rows(self, size_type nrows)

Sets number of rows to read.

set_row_groups(self, list row_groups)

Sets list of individual row groups to read.

set_skip_rows(self, int64_t skip_rows)

Sets number of rows to skip.

static builder(SourceInfo source)#

Create a ParquetReaderOptionsBuilder object

For details, see cudf::io::parquet_reader_options::builder()

Parameters:
source : SourceInfo

The source to read the Parquet file from.

Returns:
ParquetReaderOptionsBuilder

Builder to build ParquetReaderOptions

set_columns(self, list col_names) → void#

Sets names of the columns to be read.

Parameters:
col_names : list

List of column names

Returns:
None

set_filter(self, Expression filter) → void#

Sets AST based filter for predicate pushdown.

Parameters:
filter : Expression

AST expression to use as filter

Returns:
None

set_num_rows(self, size_type nrows) → void#

Sets number of rows to read.

Parameters:
nrows : size_type

Number of rows to read after skip

Returns:
None

set_row_groups(self, list row_groups) → void#

Sets list of individual row groups to read.

Parameters:
row_groups : list

List of row groups to read

Returns:
None

set_skip_rows(self, int64_t skip_rows) → void#

Sets number of rows to skip.

Parameters:
skip_rows : int64_t

Number of rows to skip from start

Returns:
None
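ParquetReaderOptions is configured either through its static builder or through the set_* methods above. The shape of that builder pattern can be sketched with a hypothetical pure-Python miniature (FakeParquetReaderOptions and FakeBuilder are stand-ins; the real classes are Cython wrappers over cudf::io types and need a GPU, and whether the real ParquetReaderOptionsBuilder exposes these exact chainable setters is not documented on this page):

```python
# Miniature stand-in for the ParquetReaderOptions builder pattern:
# builder methods return self so calls can chain, and build() yields the
# finished options object, which the set_* methods can still mutate.

class FakeParquetReaderOptions:
    def __init__(self):
        self.columns = None     # None: read all columns
        self.num_rows = -1      # -1: read all rows after the skip
        self.skip_rows = 0

    @staticmethod
    def builder(source):
        # Mirrors ParquetReaderOptions.builder(SourceInfo source).
        return FakeBuilder(source)

    def set_columns(self, col_names):
        self.columns = list(col_names)

    def set_num_rows(self, nrows):
        self.num_rows = nrows

    def set_skip_rows(self, skip_rows):
        self.skip_rows = skip_rows

class FakeBuilder:
    def __init__(self, source):
        self._source = source
        self._opts = FakeParquetReaderOptions()

    def columns(self, col_names):
        self._opts.set_columns(col_names)
        return self              # chainable

    def build(self):
        return self._opts

opts = FakeParquetReaderOptions.builder("example.parquet").columns(["a", "b"]).build()
opts.set_skip_rows(10)           # setters also work on the built object
opts.set_num_rows(100)
print(opts.columns, opts.skip_rows, opts.num_rows)
```

The point of the split is that the builder collects the source and common settings up front, while the set_* methods let individual options (row groups, filters, row windows) be adjusted on the finished object before it is passed to read_parquet or ChunkedParquetReader.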

pylibcudf.io.parquet.read_parquet(ParquetReaderOptions options)#

Read from Parquet format.

The source to read from and the reading options are encapsulated by the options object.

For details, see read_parquet().

Parameters:
options : ParquetReaderOptions

Settings for controlling reading behavior

pylibcudf.io.parquet.write_parquet(ParquetWriterOptions options) → memoryview#

Writes a set of columns to Parquet format.

Parameters:
options : ParquetWriterOptions

Settings for controlling writing behavior

Returns:
memoryview

A blob containing the file metadata (the Parquet FileMetadata Thrift message) if requested in the writer options, or an empty blob otherwise.
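The returned memoryview is an ordinary Python buffer, so the metadata blob can be checked for emptiness and persisted with stdlib tools alone. A sketch (the blob bytes here are fabricated stand-ins; a real blob would be a serialized Parquet FileMetadata Thrift message):

```python
import os
import tempfile

# Hypothetical stand-in for write_parquet's return value: a memoryview over
# the serialized footer metadata (it would be empty if not requested in the
# writer options).
metadata_blob = memoryview(b"PAR1-footer-bytes")

path = None
if metadata_blob.nbytes > 0:
    # Persist the blob, e.g. for later aggregation into a _metadata
    # sidecar file for a multi-file dataset.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(metadata_blob)   # file objects accept memoryviews directly
        path = f.name

size_on_disk = os.path.getsize(path)
print(size_on_disk)  # → 17, the length of the fabricated blob
```

Checking nbytes first matters because the blob is empty whenever metadata collection was not requested, and writing a zero-length sidecar file is rarely useful.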