Parquet Metadata

class pylibcudf.io.parquet_metadata.ParquetColumnSchema

Schema of a parquet column, including the nested columns.

Parameters:
parquet_column_schema

Methods

child(self, int idx)

Returns schema of the child with the given index.

children(self)

Returns schemas of all child columns.

name(self)

Returns the parquet column name, which can be empty.

num_children(self)

Returns the number of child columns.

child(self, int idx) → ParquetColumnSchema

Returns schema of the child with the given index.

Parameters:
idx : int

Index of the child column.

Returns:
ParquetColumnSchema

Child schema

children(self) → list

Returns schemas of all child columns.

Returns:
list[ParquetColumnSchema]

Child schemas.

name(self) → str

Returns the parquet column name, which can be empty.

Returns:
str

Column name

num_children(self) → int

Returns the number of child columns.

Returns:
int

Children count
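The four accessors above are enough to walk a nested schema. The following is a minimal sketch, not part of pylibcudf: `iter_column_paths` is a hypothetical helper, and `FakeColumnSchema` is a hypothetical stand-in that mimics the documented methods so the example runs without a GPU. A real `ParquetColumnSchema` should be usable in its place, assuming it behaves as documented here.

```python
def iter_column_paths(col, prefix=""):
    """Yield a dotted path for every named column under `col`, depth first.

    `col` is any object exposing name() and children() as documented for
    ParquetColumnSchema. Columns with an empty name (such as the root)
    contribute nothing to the path.
    """
    name = col.name()
    if name:
        path = f"{prefix}.{name}" if prefix else name
        yield path
    else:
        path = prefix
    for child in col.children():
        yield from iter_column_paths(child, path)


class FakeColumnSchema:
    """Hypothetical stand-in with the same methods as ParquetColumnSchema."""

    def __init__(self, name, children=()):
        self._name = name
        self._children = list(children)

    def name(self):
        return self._name

    def num_children(self):
        return len(self._children)

    def child(self, idx):
        return self._children[idx]

    def children(self):
        return list(self._children)


# A root struct with an empty name, one flat column and one nested struct.
root = FakeColumnSchema(
    "",
    [
        FakeColumnSchema("id"),
        FakeColumnSchema("point", [FakeColumnSchema("x"), FakeColumnSchema("y")]),
    ],
)
paths = list(iter_column_paths(root))
print(paths)
```

The walk uses `children()` rather than `num_children()` plus `child(idx)`; either pairing should visit the same columns in the same order.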

class pylibcudf.io.parquet_metadata.ParquetMetadata

Information about content of a parquet file.

Parameters:
parquet_metadata

Methods

metadata(self)

Returns the key-value metadata in the file footer.

num_rowgroups(self)

Returns the number of row groups in the file.

num_rows(self)

Returns the number of rows of the root column.

rowgroup_metadata(self)

Returns the row group metadata in the file footer.

schema(self)

Returns the parquet schema.

metadata(self) → dict

Returns the key-value metadata in the file footer.

Returns:
dict[bytes, bytes]

Key-value metadata as a dict.
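Because the footer metadata is returned as `dict[bytes, bytes]`, callers usually need to decode it before display. The sketch below assumes the keys and values are UTF-8 text, which the file format does not guarantee; `decode_key_value_metadata` is a hypothetical helper and the sample dict is made up for illustration.

```python
def decode_key_value_metadata(raw):
    """Decode a dict[bytes, bytes] into dict[str, str].

    Assumes UTF-8; undecodable bytes are replaced rather than raising,
    since footer values may hold arbitrary binary payloads.
    """
    return {
        key.decode("utf-8", errors="replace"): value.decode("utf-8", errors="replace")
        for key, value in raw.items()
    }


# Illustrative input shaped like the return value of ParquetMetadata.metadata().
sample = {b"writer": b"example", b"note": b"caf\xc3\xa9"}
decoded = decode_key_value_metadata(sample)
print(decoded)
```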

num_rowgroups(self) → int

Returns the number of row groups in the file.

Returns:
int

Number of row groups.

num_rows(self) → int

Returns the number of rows of the root column.

Returns:
int

Number of rows

rowgroup_metadata(self) → list

Returns the row group metadata in the file footer.

Returns:
list[dict[str, int]]

List of row group metadata, one dict per row group.
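Since each row group is described by a `dict[str, int]`, per-file totals can be computed with a plain sum. This is only a sketch: the key names `"num_rows"` and `"total_byte_size"` used in the sample data are assumptions not stated on this page, and `total_over_row_groups` is a hypothetical helper. Inspect the keys of an actual result before relying on them.

```python
def total_over_row_groups(row_groups, key):
    """Sum one integer field across a list of row group dicts.

    Row groups that lack `key` contribute 0 instead of raising.
    """
    return sum(group.get(key, 0) for group in row_groups)


# Illustrative data shaped like the return value of rowgroup_metadata();
# the key names here are assumed, not taken from the documentation.
sample_row_groups = [
    {"num_rows": 100, "total_byte_size": 4096},
    {"num_rows": 50, "total_byte_size": 2048},
]
total_rows = total_over_row_groups(sample_row_groups, "num_rows")
total_bytes = total_over_row_groups(sample_row_groups, "total_byte_size")
print(total_rows, total_bytes)
```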

schema(self) → ParquetSchema

Returns the parquet schema.

Returns:
ParquetSchema

Parquet schema

class pylibcudf.io.parquet_metadata.ParquetSchema

Schema of a parquet file.

Parameters:
parquet_schema

Methods

root(self)

Returns the schema of the struct column that contains all columns as fields.

root(self) → ParquetColumnSchema

Returns the schema of the struct column that contains all columns as fields.

Returns:
ParquetColumnSchema

Root column schema

pylibcudf.io.parquet_metadata.read_parquet_metadata(SourceInfo src_info) → ParquetMetadata

Reads the metadata of a parquet dataset.

Parameters:
src_info : SourceInfo

Dataset source.

Returns:
ParquetMetadata

Parquet metadata containing the parquet schema, number of rows, number of row groups, and key-value metadata.
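Putting the pieces together, a file can be summarised from its footer alone. The sketch below rests on two assumptions not stated on this page: that `SourceInfo` is importable from `pylibcudf.io.types`, and that it accepts a list of file paths. `summarize` and `summarize_file` are hypothetical helper names. The pylibcudf imports are deferred to call time so the helpers can be defined on a machine where the library is not installed.

```python
def summarize(meta):
    """Collect headline figures from a ParquetMetadata-like object."""
    root = meta.schema().root()
    return {
        "num_rows": meta.num_rows(),
        "num_rowgroups": meta.num_rowgroups(),
        "top_level_columns": [col.name() for col in root.children()],
        "num_footer_keys": len(meta.metadata()),
    }


def summarize_file(path):
    """Read the footer of the parquet file at `path` and summarise it."""
    # Deferred imports: pylibcudf is only needed when this function runs.
    from pylibcudf.io.parquet_metadata import read_parquet_metadata
    from pylibcudf.io.types import SourceInfo  # assumed import location

    # Assumption: SourceInfo accepts a list of file paths.
    return summarize(read_parquet_metadata(SourceInfo([path])))
```

With pylibcudf installed, `summarize_file("example.parquet")` would return the summary dict for that file; the filename is a placeholder.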