I/O#

I/O Utility Classes#

pylibcudf.io.types.ColumnEncoding#

See also cudf::column_encoding.

Enum members

  • USE_DEFAULT

  • DICTIONARY

  • PLAIN

  • DELTA_BINARY_PACKED

  • DELTA_LENGTH_BYTE_ARRAY

  • DELTA_BYTE_ARRAY

  • BYTE_STREAM_SPLIT

  • DIRECT

  • DIRECT_V2

  • DICTIONARY_V2

class pylibcudf.io.types.ColumnInMetadata#

Metadata for a column

Methods

child(self, size_type i)

Get reference to a child of this column.

get_name(self)

Get the name of this column.

set_decimal_precision(self, uint8_t precision)

Set the decimal precision of this column.

set_encoding(self, column_encoding encoding)

Set the encoding to use for this column.

set_int96_timestamps(self, bool req)

Specifies whether this timestamp column should be encoded using the deprecated int96.

set_list_column_as_map(self)

Specify that this list column should be encoded as a map in the written file.

set_name(self, unicode name)

Set the name of this column.

set_nullability(self, bool nullable)

Set the nullability of this column.

set_output_as_binary(self, bool binary)

Specifies whether this column should be written as binary or string data.

set_skip_compression(self, bool skip)

Specifies whether this column should not be compressed regardless of the compression.

set_type_length(self, int32_t type_length)

Sets the length of fixed length data.

child(self, size_type i) → ColumnInMetadata#

Get reference to a child of this column.

Parameters:
i : int

Index of the child to get.

Returns:
ColumnInMetadata

get_name(self) → unicode#

Get the name of this column.

Returns:
str

The name of this column

set_decimal_precision(self, uint8_t precision) → ColumnInMetadata#

Set the decimal precision of this column. Only valid if this column is a decimal (fixed-point) type.

Parameters:
precision : int

The integer precision to set for this decimal column

Returns:
Self

set_encoding(self, column_encoding encoding) → ColumnInMetadata#

Set the encoding to use for this column.

Parameters:
encoding : ColumnEncoding

The encoding to use

Returns:
ColumnInMetadata

set_int96_timestamps(self, bool req) → ColumnInMetadata#

Specifies whether this timestamp column should be encoded using the deprecated int96.

Parameters:
req : bool

True = use int96 physical type. False = use int64 physical type.

Returns:
Self

set_list_column_as_map(self) → ColumnInMetadata#

Specify that this list column should be encoded as a map in the written file.

Returns:
Self

set_name(self, unicode name) → ColumnInMetadata#

Set the name of this column.

Parameters:
name : str

Name of the column

Returns:
Self

set_nullability(self, bool nullable) → ColumnInMetadata#

Set the nullability of this column.

Parameters:
nullable : bool

Whether this column is nullable

Returns:
Self

set_output_as_binary(self, bool binary) → ColumnInMetadata#

Specifies whether this column should be written as binary or string data.

Parameters:
binary : bool

True = use binary data type. False = use string data type

Returns:
Self

set_skip_compression(self, bool skip) → ColumnInMetadata#

Specifies whether this column should not be compressed regardless of the compression.

Parameters:
skip : bool

If True, do not compress this column.

Returns:
Self

set_type_length(self, int32_t type_length) → ColumnInMetadata#

Sets the length of fixed length data.

Parameters:
type_length : int

Size of the data type in bytes

Returns:
Self
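
Because most of the setters above are documented as returning Self, calls can presumably be chained. The sketch below shows that pattern with a stand-in class; ColumnMetaSketch and its settings dict are hypothetical and exist only for illustration. It is not the pylibcudf class and does not touch the GPU.

```python
class ColumnMetaSketch:
    """Stand-in for illustration only; not pylibcudf's ColumnInMetadata."""

    def __init__(self):
        # Hypothetical storage for whatever the setters record.
        self.settings = {}

    def set_name(self, name):
        self.settings["name"] = name
        return self  # returning self is what makes chaining possible

    def set_nullability(self, nullable):
        self.settings["nullable"] = nullable
        return self


# Each call returns the same object, so the setters compose left to right.
meta = ColumnMetaSketch().set_name("price").set_nullability(False)
print(meta.settings)
```
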
pylibcudf.io.types.CompressionType#

See also cudf::compression_type.

Enum members

  • NONE

  • AUTO

  • SNAPPY

  • GZIP

  • BZIP2

  • BROTLI

  • ZIP

  • XZ

  • ZLIB

  • LZ4

  • LZO

  • ZSTD

pylibcudf.io.types.DictionaryPolicy#

See also cudf::dictionary_policy.

Enum members

  • NEVER

  • ADAPTIVE

  • ALWAYS

pylibcudf.io.types.JSONRecoveryMode#

See also cudf::json_recovery_mode_t.

Enum members

  • FAIL

  • RECOVER_WITH_NULL

class pylibcudf.io.types.PartitionInfo(size_type start_row, size_type num_rows)#

Information used while writing partitioned datasets.

Parameters:
start_row : int

The start row of the partition.

num_rows : int

The number of rows in the partition.
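
As a rough illustration of how the two parameters relate, the sketch below derives (start_row, num_rows) pairs from a list of per-partition row counts. The helper name partition_bounds is an assumption made for this example and is not part of pylibcudf. The code is plain Python.

```python
from itertools import accumulate


def partition_bounds(sizes):
    """Turn per-partition row counts into (start_row, num_rows) pairs."""
    if any(n < 0 for n in sizes):
        raise ValueError("partition sizes must be non-negative")
    # Each partition starts where the rows of all earlier partitions end.
    starts = [0, *accumulate(sizes)][:-1]
    return list(zip(starts, sizes))


bounds = partition_bounds([3, 0, 5])
print(bounds)  # [(0, 3), (3, 0), (3, 5)]
```

Each resulting pair lines up with the (start_row, num_rows) arguments of the PartitionInfo constructor documented above.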

pylibcudf.io.types.QuoteStyle#

See also cudf::quote_style.

Enum members

  • MINIMAL

  • ALL

  • NONNUMERIC

  • NONE

class pylibcudf.io.types.SinkInfo(list sinks)#

A class containing details about destinations (sinks) to write data to.

For more details, see cudf::io::sink_info.

Parameters:
sinks : list of str, PathLike, or io.IOBase instances

A list of sinks to write data to. Each sink can be:

  • A string representing a filename.

  • A PathLike object.

  • An instance of a Python I/O class that is a subclass of io.IOBase (e.g., io.BytesIO, io.StringIO).

The list must be homogeneous in type unless all sinks are instances of subclasses of io.IOBase. Mixing different types of sinks (that are not all io.IOBase instances) will raise a ValueError.
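
The homogeneity rule can be approximated in plain Python as follows. This is only a sketch of the rule as stated above, not SinkInfo's actual validation code; the function check_sinks and its return labels are hypothetical.

```python
import io
import os
import pathlib


def check_sinks(sinks):
    """Return a label for the sink kind, or raise ValueError on a forbidden mix."""
    # Note: an empty list passes the first check trivially; the text above
    # does not say how the real class treats an empty list.
    if all(isinstance(s, io.IOBase) for s in sinks):
        return "io"        # different io.IOBase subclasses may be mixed
    if all(isinstance(s, str) for s in sinks):
        return "str"
    if all(isinstance(s, os.PathLike) for s in sinks):
        return "pathlike"
    raise ValueError("sinks must all be str, all PathLike, or all io.IOBase")


print(check_sinks([io.BytesIO(), io.StringIO()]))   # io
print(check_sinks([pathlib.PurePosixPath("out")]))  # pathlike
```

Under this reading, a list such as ["out.parquet", io.BytesIO()] would be rejected, because it mixes a filename with an I/O object.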

class pylibcudf.io.types.SourceInfo(list sources)#

A class containing details on a source to read from.

For details, see cudf::io::source_info.

Parameters:
sources : List[Union[str, os.PathLike, bytes, io.BytesIO, DataSource]]

A homogeneous list of sources to read from.

Mixing different types of sources will raise a ValueError.

pylibcudf.io.types.StatisticsFreq#

See also cudf::statistics_freq.

Enum members

  • STATISTICS_NONE

  • STATISTICS_ROWGROUP

  • STATISTICS_PAGE

  • STATISTICS_COLUMN

class pylibcudf.io.types.TableInputMetadata(Table table)#

Metadata for a table

Parameters:
table : Table

The Table to construct metadata for

Attributes

column_metadata

class pylibcudf.io.types.TableWithMetadata(Table tbl, list column_names)#

A container holding a table and its associated metadata (e.g. column names)

For details, see cudf::io::table_with_metadata.

Parameters:
tbl : Table

The input table.

column_names : list

A list of tuples, one per column, each containing the column's name and a list of its child columns in the same format, e.g. [("id", []), ("name", [("first", []), ("last", [])])]
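
To make the nested (name, children) format concrete, the sketch below walks such a list and produces a dotted path for every column. The helper flatten_names is hypothetical and written in plain Python; it is not how pylibcudf processes column_names internally.

```python
def flatten_names(column_names, prefix=""):
    """Yield a dotted path for every column in the nested (name, children) format."""
    for name, children in column_names:
        path = f"{prefix}.{name}" if prefix else name
        yield path
        # Children use the same (name, children) format, so recurse.
        yield from flatten_names(children, path)


names = [("id", []), ("name", [("first", []), ("last", [])])]
print(list(flatten_names(names)))
# ['id', 'name', 'name.first', 'name.last']
```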

Attributes

child_names

Return a dictionary mapping the names of columns with children to the names of their child columns

columns

Return a list containing the columns of the table

per_file_user_data

Return a list of dicts, one for each file read, each containing file-format-specific metadata.

tbl

tbl: pylibcudf.table.Table

Methods

column_names(self[, include_children])

Return a list containing the column names of the table

child_names#

Return a dictionary mapping the names of columns with children to the names of their child columns

column_names(self, include_children=False)#

Return a list containing the column names of the table

columns#

Return a list containing the columns of the table

per_file_user_data#

Return a list of dicts, one for each file read, each containing file-format-specific metadata.

tbl#

tbl: pylibcudf.table.Table

I/O Functions#