Parquet Metadata
- class pylibcudf.io.parquet_metadata.ParquetColumnSchema
Schema of a parquet column, including the nested columns.
- Parameters:
- parquet_column_schema
Methods
- child(self, int idx): Returns schema of the child with the given index.
- children(self): Returns schemas of all child columns.
- name(self): Returns parquet column name; can be empty.
- num_children(self): Returns the number of child columns.
- child(self, int idx) → ParquetColumnSchema
Returns schema of the child with the given index.
- Parameters:
- idx : int
Child index
- Returns:
- ParquetColumnSchema
Child schema
- children(self) → list
Returns schemas of all child columns.
- Returns:
- list[ParquetColumnSchema]
Child schemas.
- name(self) → unicode
Returns parquet column name; can be empty.
- Returns:
- str
Column name
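Because each column schema exposes its name and its children, a nested schema can be walked recursively. The following is a minimal sketch, not part of the library: `flatten_schema` is a hypothetical helper that relies only on the `name()` and `children()` methods documented above, so it works with any object that provides them.

```python
def flatten_schema(column, prefix=""):
    """Return dotted paths for every leaf column under `column`.

    `column` is assumed to expose name() and children() as documented
    for ParquetColumnSchema; flatten_schema itself is a hypothetical helper.
    """
    name = column.name()
    # The name can be empty, so only add a path segment when it is not.
    path = f"{prefix}.{name}" if prefix and name else (prefix or name)
    kids = column.children()
    if not kids:
        # No children: this is a leaf, so its path is the result.
        return [path]
    paths = []
    for kid in kids:
        paths.extend(flatten_schema(kid, path))
    return paths
```

Passing the root column schema of a file yields one path per leaf column, with struct fields joined by dots.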
- class pylibcudf.io.parquet_metadata.ParquetMetadata
Information about content of a parquet file.
- Parameters:
- parquet_metadata
Methods
- metadata(self): Returns the key-value metadata in the file footer.
- num_rowgroups(self): Returns the number of rowgroups in the file.
- num_rows(self): Returns the number of rows of the root column.
- rowgroup_metadata(self): Returns the row group metadata in the file footer.
- schema(self): Returns the parquet schema.
- metadata(self) → dict
Returns the key-value metadata in the file footer.
- Returns:
- dict[str, str]
Key value metadata as a map.
- num_rowgroups(self) → int
Returns the number of rowgroups in the file.
- Returns:
- int
Number of row groups.
- rowgroup_metadata(self) → list
Returns the row group metadata in the file footer.
- Returns:
- list[dict[str, int]]
List of row group metadata, with one dict per row group.
- schema(self) → ParquetSchema
Returns the parquet schema.
- Returns:
- ParquetSchema
Parquet schema
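Since `rowgroup_metadata()` returns a `list[dict[str, int]]`, per-file totals can be computed with plain Python. The sketch below is illustrative only: `total_per_key` is a hypothetical helper, and it makes no assumption about which keys the dicts contain.

```python
from collections import Counter


def total_per_key(rowgroups):
    """Sum each integer field across a list of row group dicts.

    `rowgroups` is assumed to have the list[dict[str, int]] shape
    documented for rowgroup_metadata(); total_per_key is hypothetical.
    """
    totals = Counter()
    for rowgroup in rowgroups:
        # Counter.update adds the values key by key instead of replacing them.
        totals.update(rowgroup)
    return dict(totals)
```

Inspecting a single entry of the returned list is the safest way to see which keys a given file provides before aggregating them.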
- class pylibcudf.io.parquet_metadata.ParquetSchema
Schema of a parquet file.
- Parameters:
- parquet_schema
Methods
- root(self): Returns the schema of the struct column that contains all columns as fields.
- root(self) → ParquetColumnSchema
Returns the schema of the struct column that contains all columns as fields.
- Returns:
- ParquetColumnSchema
Root column schema
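The root is described as a struct column whose fields are the file's columns, so the top-level column names are the names of the root's children. A minimal sketch, assuming only the `root()`, `children()`, and `name()` methods documented on this page (`top_level_columns` is a hypothetical helper):

```python
def top_level_columns(schema):
    """List the names of the fields directly under the schema root.

    `schema` is assumed to behave like ParquetSchema: root() returns an
    object with children(), and each child has name().
    """
    return [child.name() for child in schema.root().children()]
```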
- pylibcudf.io.parquet_metadata.read_parquet_metadata(SourceInfo src_info) → ParquetMetadata
Reads the metadata of a parquet dataset.
- Parameters:
- src_info : SourceInfo
Dataset source.
- Returns:
- ParquetMetadata
Parquet metadata, including the parquet schema, number of rows, number of row groups, and key-value metadata.
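A typical use is to read the metadata once and pull out the counts and footer keys. The following is a sketch under stated assumptions rather than a definitive recipe: `load_metadata` and `summarize` are hypothetical helpers, and the code assumes that `SourceInfo` is reachable as `pylibcudf.io.SourceInfo` and accepts a list of file paths. The import is deferred so that `summarize` can be used on any object with the documented methods.

```python
def load_metadata(path):
    """Read parquet metadata for one file path (requires pylibcudf)."""
    import pylibcudf as plc  # deferred so summarize() works without pylibcudf

    # Assumption: SourceInfo is exposed as plc.io.SourceInfo and takes a list of paths.
    source = plc.io.SourceInfo([path])
    return plc.io.parquet_metadata.read_parquet_metadata(source)


def summarize(meta):
    """Collect the basic counts and footer keys from a ParquetMetadata-like object."""
    return {
        "num_rows": meta.num_rows(),
        "num_rowgroups": meta.num_rowgroups(),
        # metadata() is documented as dict[str, str]; only the keys are kept here.
        "footer_keys": sorted(meta.metadata()),
    }
```

With pylibcudf installed, `summarize(load_metadata("example.parquet"))` would return a small dict describing the file, where `example.parquet` is a placeholder path.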