cudf.io.parquet.read_parquet_metadata

cudf.io.parquet.read_parquet_metadata(filepath_or_buffer)

Read a Parquet file’s metadata and schema

Parameters:
path : string or path object

Path of file to be read

Returns:
Total number of rows
Number of row groups
List of column names
Number of columns
List of metadata of row groups
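
The five return values arrive as a plain tuple in the order listed above, so the position of each value matters when unpacking. As a minimal sketch, a caller might wrap the tuple in a named container to make later code easier to read. `ParquetMetadata` and the placeholder values below are hypothetical and not part of cudf; the element type of the row group metadata list is not described here, so it is left as `Any`.

```python
from typing import Any, List, NamedTuple


class ParquetMetadata(NamedTuple):
    """Hypothetical wrapper; field order mirrors the documented return order."""
    num_rows: int
    num_row_groups: int
    names: List[str]
    num_columns: int
    row_group_metadata: List[Any]


# Placeholder tuple standing in for the result of read_parquet_metadata(...).
# The values are made up for illustration only.
raw = (3, 1, ["num1", "datetime", "text"], 3, [{}])

meta = ParquetMetadata(*raw)
print(meta.num_rows, meta.num_row_groups, meta.names)
```

With a real file, `raw` would be replaced by the value returned from the function call.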

Examples

>>> import cudf
>>> num_rows, num_row_groups, names, num_columns, row_group_metadata = cudf.io.read_parquet_metadata(filename)
>>> df = [cudf.read_parquet(filename, row_groups=[i]) for i in range(num_row_groups)]
>>> df = cudf.concat(df)
>>> df
  num1                datetime text
0  123 2018-11-13T12:00:00.000 5451
1  456 2018-11-14T12:35:01.000 5784
2  789 2018-11-15T18:02:59.000 6117
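
The example above reads one row group per call. For files with many row groups, it may be preferable to read several at a time. The following is a rough sketch of a helper that splits the row group indices into batches; `batch_row_groups` is a hypothetical name, not a cudf function, and it uses only the row group count returned by the metadata call.

```python
def batch_row_groups(num_row_groups, batch_size):
    """Split range(num_row_groups) into consecutive lists of at most batch_size indices."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        list(range(start, min(start + batch_size, num_row_groups)))
        for start in range(0, num_row_groups, batch_size)
    ]


# Illustrative values: 5 row groups read in batches of 2.
batches = batch_row_groups(5, 2)
print(batches)  # [[0, 1], [2, 3], [4]]
```

Each batch could then be passed as the `row_groups` argument of `cudf.read_parquet`, assuming that argument accepts a list of indices in the installed version, and the resulting frames concatenated with `cudf.concat` as in the example.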