class cudf.io.parquet.ParquetDatasetWriter(path, partition_cols, index=None, compression='snappy', statistics='ROWGROUP', max_file_size=None, file_name_prefix=None, storage_options=None)

Write a parquet file or dataset incrementally


pathstr

A local directory path or S3 URL. Will be used as the root directory path while writing a partitioned dataset.


partition_colslist

Column names by which to partition the dataset. Columns are partitioned in the order they are given.

indexbool, default None

If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, index(es) other than RangeIndex will be saved as columns.

compression{‘snappy’, None}, default ‘snappy’

Name of the compression to use. Use None for no compression.

statistics{‘ROWGROUP’, ‘PAGE’, ‘COLUMN’, ‘NONE’}, default ‘ROWGROUP’

Level at which column statistics should be included in file.

max_file_sizeint or str, default None

A file size that cannot be exceeded by the writer. If the input is an int, it is interpreted as bytes. Size can also be a str in the form "10 MB", "1 GB", etc. If this parameter is used, it is mandatory to also pass file_name_prefix.
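A minimal sketch of how a size string such as "10 MB" maps to a byte count. This is a hypothetical helper for illustration only, not cuDF's actual parser, and it assumes decimal (SI) units:

```python
def size_to_bytes(size):
    """Convert an int (bytes) or a string like '10 MB' to a byte count.

    Hypothetical illustration; cuDF's internal parsing and unit
    convention (SI vs binary) may differ.
    """
    if isinstance(size, int):
        return size
    units = {"KB": 1000, "MB": 1000**2, "GB": 1000**3}
    value, unit = size.split()
    return int(float(value) * units[unit.upper()])
```

With this convention, `size_to_bytes("10 MB")` yields 10,000,000 bytes.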


file_name_prefixstr

This is a prefix to file names generated only when max_file_size is specified.

storage_optionsdict, optional, default None

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with "s3://" or "gcs://") the key-value pairs are forwarded to fsspec. Please see fsspec and urllib for more details.
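As an illustration of the fsspec forwarding, a storage_options dict for an s3:// URL might carry credentials like the following. The key names ("key", "secret", "anon") are those accepted by s3fs, fsspec's S3 backend; other backends accept different options:

```python
# Hedged sketch: these key-value pairs would be forwarded by fsspec to
# s3fs when the path starts with "s3://". Placeholder credential values.
storage_options = {
    "key": "<access-key-id>",      # AWS access key ID
    "secret": "<secret-access-key>",  # AWS secret access key
    "anon": False,                 # authenticated (not anonymous) access
}
```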


Using a context manager

>>> df1 = cudf.DataFrame({"a": [1, 1, 2, 2, 1], "b": [9, 8, 7, 6, 5]})
>>> df2 = cudf.DataFrame({"a": [1, 3, 3, 1, 3], "b": [4, 3, 2, 1, 0]})
>>> with ParquetDatasetWriter("./dataset", partition_cols=["a"]) as cw:
...     cw.write_table(df1)
...     cw.write_table(df2)

By manually calling close()

>>> cw = ParquetDatasetWriter("./dataset", partition_cols=["a"])
>>> cw.write_table(df1)
>>> cw.write_table(df2)
>>> cw.close()

Both methods will generate the same directory structure
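A sketch of that layout, assuming the standard Hive partitioning scheme (one `a=<value>` subdirectory per distinct partition value; file names are placeholders):

```
./dataset/
├── a=1/
│   └── <filename>.parquet
├── a=2/
│   └── <filename>.parquet
└── a=3/
    └── <filename>.parquet
```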




close([return_metadata])

Close all open files and optionally return footer metadata as a binary blob.


write_table(df)

Write a dataframe to the file/dataset.