I/O
I/O Utility Classes
- pylibcudf.io.types.ColumnEncoding
See also cudf::column_encoding.
Enum members
USE_DEFAULT
DICTIONARY
PLAIN
DELTA_BINARY_PACKED
DELTA_LENGTH_BYTE_ARRAY
DELTA_BYTE_ARRAY
BYTE_STREAM_SPLIT
DIRECT
DIRECT_V2
DICTIONARY_V2
- class pylibcudf.io.types.ColumnInMetadata
Metadata for a column.
Methods
child(self, size_type i): Get a reference to a child of this column.
get_name(self): Get the name of this column.
set_decimal_precision(self, uint8_t precision): Set the decimal precision of this column.
set_encoding(self, column_encoding encoding): Set the encoding to use for this column.
set_int96_timestamps(self, bool req): Specifies whether this timestamp column should be encoded using the deprecated int96 physical type.
set_list_column_as_map(self): Specify that this list column should be encoded as a map in the written file.
set_name(self, unicode name): Set the name of this column.
set_nullability(self, bool nullable): Set the nullability of this column.
set_output_as_binary(self, bool binary): Specifies whether this column should be written as binary or string data.
set_skip_compression(self, bool skip): Specifies whether this column should be left uncompressed regardless of the compression codec specified for the file.
set_type_length(self, int32_t type_length): Set the length of fixed-length data.
- child(self, size_type i) -> ColumnInMetadata
Get a reference to a child of this column.
- Parameters:
- i : int
Index of the child to get.
- Returns:
- ColumnInMetadata
- get_name(self) -> unicode
Get the name of this column.
- Returns:
- str
The name of this column.
- set_decimal_precision(self, uint8_t precision) -> ColumnInMetadata
Set the decimal precision of this column. Only valid if this column is a decimal (fixed-point) type.
- Parameters:
- precision : int
The integer precision to set for this decimal column.
- Returns:
- Self
- set_encoding(self, column_encoding encoding) -> ColumnInMetadata
Set the encoding to use for this column.
- Parameters:
- encoding : ColumnEncoding
The encoding to use.
- Returns:
- Self
- set_int96_timestamps(self, bool req) -> ColumnInMetadata
Specifies whether this timestamp column should be encoded using the deprecated int96 physical type.
- Parameters:
- req : bool
True = use int96 physical type. False = use int64 physical type.
- Returns:
- Self
- set_list_column_as_map(self) -> ColumnInMetadata
Specify that this list column should be encoded as a map in the written file.
- Returns:
- Self
- set_name(self, unicode name) -> ColumnInMetadata
Set the name of this column.
- Parameters:
- name : str
Name of the column.
- Returns:
- Self
- set_nullability(self, bool nullable) -> ColumnInMetadata
Set the nullability of this column.
- Parameters:
- nullable : bool
Whether this column is nullable.
- Returns:
- Self
- set_output_as_binary(self, bool binary) -> ColumnInMetadata
Specifies whether this column should be written as binary or string data.
- Parameters:
- binary : bool
True = use binary data type. False = use string data type.
- Returns:
- Self
- set_skip_compression(self, bool skip) -> ColumnInMetadata
Specifies whether this column should be left uncompressed regardless of the compression codec specified for the file.
- Parameters:
- skip : bool
If True, do not compress this column.
- Returns:
- Self
- set_type_length(self, int32_t type_length) -> ColumnInMetadata
Set the length of fixed-length data.
- Parameters:
- type_length : int
Size of the data type in bytes.
- Returns:
- Self
- pylibcudf.io.types.CompressionType
See also cudf::compression_type.
Enum members
NONE
AUTO
SNAPPY
GZIP
BZIP2
BROTLI
ZIP
XZ
ZLIB
LZ4
LZO
ZSTD
- pylibcudf.io.types.DictionaryPolicy
See also cudf::dictionary_policy.
Enum members
NEVER
ADAPTIVE
ALWAYS
- pylibcudf.io.types.JSONRecoveryMode
See also cudf::json_recovery_mode_t.
Enum members
FAIL
RECOVER_WITH_NULL
- class pylibcudf.io.types.PartitionInfo(size_type start_row, size_type num_rows)
Information used while writing partitioned datasets.
- Parameters:
- start_row : int
The start row of the partition.
- num_rows : int
The number of rows in the partition.
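As a rough illustration of how the two constructor arguments relate, the sketch below splits a row count into contiguous (start_row, num_rows) pairs. The helper name partition_bounds and the even-split policy are assumptions made for this example; they are not part of pylibcudf.

```python
def partition_bounds(total_rows, num_partitions):
    """Split total_rows into contiguous (start_row, num_rows) pairs.

    Hypothetical helper: rows are divided as evenly as possible, with any
    remainder going to the earliest partitions.
    """
    base, extra = divmod(total_rows, num_partitions)
    bounds = []
    start = 0
    for i in range(num_partitions):
        size = base + (1 if i < extra else 0)
        bounds.append((start, size))
        start += size  # the next partition begins where this one ends
    return bounds


bounds = partition_bounds(10, 3)
print(bounds)  # [(0, 4), (4, 3), (7, 3)]
```

Each pair could then be passed positionally as PartitionInfo(start_row, num_rows), following the signature above.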
- pylibcudf.io.types.QuoteStyle
See also cudf::quote_style.
Enum members
MINIMAL
ALL
NONNUMERIC
NONE
- class pylibcudf.io.types.SinkInfo(list sinks)
A class containing details about destinations (sinks) to write data to.
For more details, see cudf::io::sink_info.
- Parameters:
- sinks : list of str, PathLike, or io.IOBase instances
A list of sinks to write data to. Each sink can be:
- A string representing a filename.
- A PathLike object.
- An instance of a Python I/O class that is a subclass of io.IOBase (e.g., io.BytesIO, io.StringIO).
The list must be homogeneous in type unless all sinks are instances of subclasses of io.IOBase. Mixing different types of sinks (that are not all io.IOBase instances) will raise a ValueError.
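The mixing rule can be sketched in plain Python as follows. This is an illustration of the rule as worded above, not SinkInfo's actual implementation; the function name check_sinks is hypothetical.

```python
import io
import os


def check_sinks(sinks):
    """Raise ValueError if `sinks` mixes kinds in a way the rule forbids.

    Hypothetical helper that mirrors the documented rule only.
    """
    # Different io.IOBase subclasses may be mixed freely.
    if all(isinstance(s, io.IOBase) for s in sinks):
        return
    # Otherwise the list must be all filenames or all PathLike objects.
    for kind in (str, os.PathLike):
        if all(isinstance(s, kind) for s in sinks):
            return
    raise ValueError(
        "sinks must all be str, all PathLike, or all io.IOBase instances"
    )


check_sinks(["a.parquet", "b.parquet"])      # homogeneous filenames: accepted
check_sinks([io.BytesIO(), io.StringIO()])   # mixed io.IOBase subclasses: accepted
```

Under this reading, a list such as ["a.parquet", io.BytesIO()] is rejected with a ValueError.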
- class pylibcudf.io.types.SourceInfo(list sources)
A class containing details on a source to read from.
For details, see cudf::io::source_info.
- Parameters:
- sources : List[Union[str, os.PathLike, bytes, io.BytesIO, DataSource]]
A homogeneous list of sources to read from.
Mixing different types of sources will raise a ValueError.
- pylibcudf.io.types.StatisticsFreq
See also cudf::statistics_freq.
Enum members
STATISTICS_NONE
STATISTICS_ROWGROUP
STATISTICS_PAGE
STATISTICS_COLUMN
- class pylibcudf.io.types.TableInputMetadata(Table table)
Metadata for a table.
- Parameters:
- table : Table
The Table to construct metadata for.
Attributes
column_metadata
- class pylibcudf.io.types.TableWithMetadata(Table tbl, list column_names)
A container holding a table and its associated metadata (e.g. column names).
For details, see cudf::io::table_with_metadata.
- Parameters:
- tbl : Table
The input table.
- column_names : list
A list of tuples, each containing the name of a column and the names of its child columns (in the same format), e.g. [("id", []), ("name", [("first", []), ("last", [])])].
Attributes
child_names: Return a dictionary mapping the names of columns with children to the names of their child columns.
columns: Return a list containing the columns of the table.
per_file_user_data: Return a list with one dict of file-format-specific metadata for each file read.
tbl: pylibcudf.table.Table
Methods
column_names(self[, include_children]): Return a list containing the column names of the table.
- child_names
Return a dictionary mapping the names of columns with children to the names of their child columns.
- column_names(self, include_children=False)
Return a list containing the column names of the table.
- columns
Return a list containing the columns of the table.
- per_file_user_data
Return a list with one dict of file-format-specific metadata for each file read.
- tbl
tbl: pylibcudf.table.Table
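To make the nested column_names format concrete, here is a small plain-Python sketch that walks the (name, children) tuples from the example above and prints dotted paths. The helper flatten_names and the dotted-path convention are assumptions for illustration; pylibcudf does not provide this function.

```python
def flatten_names(column_names, prefix=""):
    """Flatten nested (name, children) tuples into dotted column paths.

    Hypothetical helper; the dotted-path form is chosen for readability.
    """
    flat = []
    for name, children in column_names:
        path = f"{prefix}.{name}" if prefix else name
        flat.append(path)
        # Children use the same (name, children) format, so recurse.
        flat.extend(flatten_names(children, path))
    return flat


names = [("id", []), ("name", [("first", []), ("last", [])])]
print(flatten_names(names))  # ['id', 'name', 'name.first', 'name.last']
```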