cudf.DataFrame.to_orc

DataFrame.to_orc(fname, compression='snappy', statistics='ROWGROUP', stripe_size_bytes=None, stripe_size_rows=None, row_index_stride=None, cols_as_map_type=None, storage_options=None, index=None)

Write a DataFrame to the ORC format.

Parameters:
fname: str

File path or object where the ORC dataset will be stored.

compression: {‘snappy’, ‘ZSTD’, ‘ZLIB’, ‘LZ4’, None}, default ‘snappy’

Name of the compression to use; case insensitive. Use None for no compression.

statistics: str {“ROWGROUP”, “STRIPE”, None}, default “ROWGROUP”

The granularity at which column statistics are written to the file.

stripe_size_bytes: integer or None, default None

Maximum size, in bytes, of each stripe of the output. If None, 67108864 (64 MiB) will be used.

stripe_size_rows: integer or None, default None

Maximum number of rows of each stripe of the output. If None, 1000000 will be used.

row_index_stride: integer or None, default None

Row index stride (maximum number of rows in each row group). If None, 10000 will be used.
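To make the interaction between stripe_size_rows and row_index_stride concrete, the following is a rough sketch of the resulting layout arithmetic. The helper estimate_orc_layout is hypothetical and not part of cuDF. It assumes the writer fills every stripe up to the row limit and that the stripe_size_bytes limit is never reached first, so the real file may contain more stripes than it reports.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return (a + b - 1) // b


def estimate_orc_layout(num_rows, stripe_size_rows=1_000_000, row_index_stride=10_000):
    """Estimate (stripes, row_groups) when only the row limits apply.

    Hypothetical helper: the defaults mirror the documented fallbacks used
    when the corresponding to_orc arguments are None.
    """
    if num_rows < 0 or stripe_size_rows <= 0 or row_index_stride <= 0:
        raise ValueError("num_rows must be >= 0 and both limits must be positive")

    stripes = ceil_div(num_rows, stripe_size_rows)
    row_groups = 0
    remaining = num_rows
    while remaining > 0:
        # Assumption: each stripe is filled to the row limit before a new one starts.
        rows_in_stripe = min(remaining, stripe_size_rows)
        row_groups += ceil_div(rows_in_stripe, row_index_stride)
        remaining -= rows_in_stripe
    return stripes, row_groups


print(estimate_orc_layout(2_500_000))  # (3, 250)
```

With the defaults, 2,500,000 rows split into stripes of 1,000,000, 1,000,000 and 500,000 rows, which hold 100, 100 and 50 row groups respectively.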

cols_as_map_type: list of column names or None, default None

A list of column names which should be written as map type in the ORC file. Note that this option only affects columns of ListDtype. Names of other column types will be ignored.

storage_options: dict, optional, default None

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. those starting with “s3://” or “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details.

index: bool, default None

If True, include the dataframe’s index(es) in the file output. If False, the index(es) are not written to the file. If None, the index(es) are saved as with True, except that any RangeIndex is stored as a range in the file metadata rather than as values, which takes less space and is faster. Other indexes are included as columns in the file output.
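As an illustration of the index argument, here is a minimal sketch that writes the same frame once with index=False and once with index=True. The function name, column names and file names are made up for the example, and cudf is imported inside the function because it needs an NVIDIA GPU with cuDF installed.

```python
import os


def write_with_and_without_index(directory):
    """Write one frame twice and return (no_index_path, with_index_path)."""
    import cudf  # imported lazily: requires a GPU environment with cuDF

    df = cudf.DataFrame(
        {"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]},
        index=["x", "y", "z"],  # not a RangeIndex, so it must be stored as values
    )

    no_index_path = os.path.join(directory, "no_index.orc")
    with_index_path = os.path.join(directory, "with_index.orc")

    df.to_orc(no_index_path, index=False)   # only columns "a" and "b" are written
    df.to_orc(with_index_path, index=True)  # the index is written alongside them
    return no_index_path, with_index_path
```

Calling write_with_and_without_index on a temporary directory and reading both files back with cudf.read_orc is one way to inspect the difference.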

See also

cudf.read_orc
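Examples

A minimal round-trip sketch, assuming a GPU environment with cuDF installed. The function name, the column names and the choice of statistics="STRIPE" are illustrative only. Every argument used appears in the parameter list above.

```python
def orc_roundtrip(path):
    """Write a small frame to `path` as ORC and read it back with cudf.read_orc."""
    import cudf  # imported lazily: requires a GPU environment with cuDF

    df = cudf.DataFrame({"id": [1, 2, 3], "score": [0.1, 0.2, 0.3]})

    # Snappy is the documented default; it is spelled out here for clarity.
    df.to_orc(path, compression="snappy", statistics="STRIPE")

    return cudf.read_orc(path)
```

For example, orc_roundtrip("example.orc") should return a frame with the columns id and score and three rows.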