cudf.io.parquet.read_parquet_metadata
- cudf.io.parquet.read_parquet_metadata(filepath_or_buffer) -> tuple[int, int, list[Hashable], int, list[dict[str, int]]]
Read the metadata and schema of one or more Parquet files.
- Parameters:
- paths : List of strings or path objects
Path of the file(s) to be read. This is the single positional argument, named filepath_or_buffer in the signature.
- Returns:
- Total number of rows
- Total number of row groups
- List of column names
- Number of columns
- List of metadata of row groups
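The five return values arrive as a plain positional tuple, so it can help to give them names. The following is a minimal sketch, not part of cuDF: `ParquetMetadata` and `wrap_metadata` are hypothetical helpers, and the sample tuple (including the `"num_rows"` dictionary key) is a stand-in for illustration rather than real output.

```python
from collections.abc import Hashable
from typing import NamedTuple


class ParquetMetadata(NamedTuple):
    """Hypothetical wrapper that names the five positional return values."""

    num_rows: int
    num_row_groups: int
    names: list[Hashable]
    num_columns: int
    row_group_metadata: list[dict[str, int]]


def wrap_metadata(result: tuple) -> ParquetMetadata:
    # Unpack in the order documented under "Returns".
    return ParquetMetadata(*result)


# Stand-in values for illustration only. Real values would come from
# cudf.io.parquet.read_parquet_metadata(...); the dict key is an assumption.
sample = (3, 1, ["num1", "datetime", "text"], 3, [{"num_rows": 3}])
meta = wrap_metadata(sample)

print(meta.num_rows, meta.num_row_groups, meta.names)
```

Because a `NamedTuple` is still a tuple, the wrapped value can be unpacked positionally exactly like the original return value.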
See also
Examples
>>> import cudf
>>> num_rows, num_row_groups, names, num_columns, row_group_metadata = cudf.io.read_parquet_metadata(filename)
>>> df = [cudf.read_parquet(filename, row_groups=[i]) for i in range(num_row_groups)]
>>> df = cudf.concat(df)
>>> df
  num1                datetime text
0  123 2018-11-13T12:00:00.000 5451
1  456 2018-11-14T12:35:01.000 5784
2  789 2018-11-15T18:02:59.000 6117
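Reading one row group per call can mean many small reads. As a hedged variation on the example above, the sketch below groups row-group indices into batches before reading. `row_group_batches` and `read_in_batches` are hypothetical helpers, not cuDF functions, and the sketch assumes that `cudf.read_parquet` accepts a list of indices through its `row_groups` keyword. The batching helper is pure Python; only `read_in_batches` needs cuDF and a GPU.

```python
from collections.abc import Iterator


def row_group_batches(num_row_groups: int, batch_size: int) -> Iterator[list[int]]:
    """Yield consecutive lists of row-group indices, at most batch_size each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, num_row_groups, batch_size):
        yield list(range(start, min(start + batch_size, num_row_groups)))


def read_in_batches(filename, batch_size=2):
    """Read one Parquet file a few row groups at a time (needs cudf and a GPU)."""
    import cudf  # imported lazily so the helper above runs without a GPU

    # Only the row-group count is needed from the metadata tuple here.
    _, num_row_groups, _, _, _ = cudf.io.read_parquet_metadata(filename)
    parts = [
        # Assumption: row_groups takes a list of row-group indices.
        cudf.read_parquet(filename, row_groups=batch)
        for batch in row_group_batches(num_row_groups, batch_size)
    ]
    return cudf.concat(parts)


print(list(row_group_batches(5, 2)))  # → [[0, 1], [2, 3], [4]]
```

This sketch does not handle a file with zero row groups, in which case `parts` would be empty and the concatenation step would need a separate code path.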