ORC
- class pylibcudf.io.orc.OrcColumnStatistics
Attributes
has_null
number_of_values
Methods
get(self, item[, default])
- get(self, item, default=None)
- class pylibcudf.io.orc.OrcReaderOptions
The settings to use for read_orc.
For details, see cudf::io::orc_reader_options
Methods
builder(SourceInfo source): Create an OrcReaderOptionsBuilder object.
set_columns(self, list col_names): Sets names of the columns to read.
set_decimal128_columns(self, list val): Set columns that should be read as 128-bit Decimal.
set_num_rows(self, int64_t nrows): Sets number of rows to read.
set_skip_rows(self, int64_t skip_rows): Sets number of rows to skip from the start.
set_source(self, SourceInfo src): Set a new source info location.
set_stripes(self, list stripes): Sets list of stripes to read for each input source.
set_timestamp_type(self, DataType type_): Sets the timestamp type to which timestamp columns will be cast.
- static builder(SourceInfo source)
Create an OrcReaderOptionsBuilder object.
For details, see cudf::io::orc_reader_options::builder()
- Parameters:
- source: SourceInfo
The source to read the ORC file from.
- Returns:
- OrcReaderOptionsBuilder
Builder to build OrcReaderOptions
- set_columns(self, list col_names) void
Sets names of the columns to read.
- Parameters:
- col_names: list[str]
List of column names
- Returns:
- None
- set_decimal128_columns(self, list val) void
Set columns that should be read as 128-bit Decimal.
- Parameters:
- val: list[str]
List of fully qualified column names
- Returns:
- None
- set_num_rows(self, int64_t nrows) void
Sets number of rows to read.
- Parameters:
- nrows: int64_t
Number of rows to read
- Returns:
- None
- set_skip_rows(self, int64_t skip_rows) void
Sets number of rows to skip from the start.
- Parameters:
- skip_rows: int64_t
Number of rows to skip
- Returns:
- None
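How the two row limits combine can be sketched in plain Python. The helper `row_window` below is hypothetical and not part of pylibcudf. It assumes that rows are skipped first and that `nrows` then caps how many of the remaining rows are read; that ordering is an assumption worth checking against the libcudf documentation.

```python
def row_window(total_rows, skip_rows=0, nrows=None):
    """Return (first_row, row_count) for the rows a read would cover.

    Assumption: skip_rows is applied first, then nrows caps what remains.
    """
    if skip_rows < 0 or (nrows is not None and nrows < 0):
        raise ValueError("skip_rows and nrows must be non-negative")
    # Skipping past the end of the file leaves nothing to read.
    first = min(skip_rows, total_rows)
    remaining = total_rows - first
    count = remaining if nrows is None else min(nrows, remaining)
    return first, count
```

Under that assumption, `row_window(100, skip_rows=90, nrows=20)` describes a read that starts at row 90 and covers the 10 rows that are left.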
- set_source(self, SourceInfo src) void
Set a new source info location.
- Parameters:
- src: SourceInfo
New source information, replacing existing information.
- Returns:
- None
- set_stripes(self, list stripes) void
Sets list of stripes to read for each input source.
- Parameters:
- stripes: list[list[size_type]]
List of lists, one inner list per input source, giving the stripes to read from that source
- Returns:
- None
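The list-of-lists shape expected by `set_stripes` can be built from a per-file selection. The helper `stripes_for_sources` is a hypothetical sketch, not a pylibcudf function. It assumes the inner lists must follow the same order as the paths given to the source, and it refuses to guess a selection for a file that was left out.

```python
def stripes_for_sources(paths, selection):
    """Build one inner list of stripe indices per input source.

    paths: source files, in the order they are passed to the reader.
    selection: dict mapping each path to the stripe indices wanted from it.
    """
    missing = [p for p in paths if p not in selection]
    if missing:
        raise KeyError(f"no stripe selection for: {missing}")
    stripes = []
    for path in paths:
        indices = [int(i) for i in selection[path]]
        if any(i < 0 for i in indices):
            raise ValueError(f"negative stripe index for {path}")
        stripes.append(indices)
    return stripes


paths = ["a.orc", "b.orc"]  # hypothetical file names
stripes = stripes_for_sources(paths, {"b.orc": [1], "a.orc": [0, 2]})
# The result could then be passed on: options.set_stripes(stripes)
```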
- set_timestamp_type(self, DataType type_) void
Sets the timestamp type to which timestamp columns will be cast.
- Parameters:
- type_: DataType
Type of timestamp
- Returns:
- None
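A DataType for a given timestamp resolution can be looked up by unit. This is a minimal sketch: the unit strings and both helper names are hypothetical, and it assumes `pylibcudf.TypeId` exposes the `TIMESTAMP_*` members named in the table and that `pylibcudf.DataType` accepts a `TypeId`.

```python
# Names of pylibcudf.TypeId members for each timestamp resolution (assumed).
_TIMESTAMP_TYPE_IDS = {
    "s": "TIMESTAMP_SECONDS",
    "ms": "TIMESTAMP_MILLISECONDS",
    "us": "TIMESTAMP_MICROSECONDS",
    "ns": "TIMESTAMP_NANOSECONDS",
}


def timestamp_type_name(unit):
    """Translate a unit string such as "ms" into a TypeId member name."""
    try:
        return _TIMESTAMP_TYPE_IDS[unit]
    except KeyError:
        raise ValueError(f"unsupported timestamp unit: {unit!r}") from None


def set_timestamp_unit(options, unit):
    """Cast timestamp columns to the given unit on an OrcReaderOptions."""
    import pylibcudf as plc  # imported lazily; needs a CUDA environment

    type_id = getattr(plc.TypeId, timestamp_type_name(unit))
    options.set_timestamp_type(plc.DataType(type_id))
    return options
```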
- class pylibcudf.io.orc.ParsedOrcStatistics
Holds column names and parsed file-level and stripe-level statistics.
For details, see cudf::io::parsed_orc_statistics
Attributes
column_names
file_stats
stripes_stats
- pylibcudf.io.orc.is_supported_read_orc(compression_type compression) bool
Check if the compression type is supported for reading ORC files.
For details, see is_supported_read_orc().
- Parameters:
- compression: CompressionType
The compression type to check
- Returns:
- bool
True if the compression type is supported for reading ORC files
- pylibcudf.io.orc.is_supported_write_orc(compression_type compression) bool
Check if the compression type is supported for writing ORC files.
For details, see is_supported_write_orc().
- Parameters:
- compression: CompressionType
The compression type to check
- Returns:
- bool
True if the compression type is supported for writing ORC files
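One way to use these checks is to walk a preference list and keep the first codec that is supported. The sketch below is hedged: `first_supported` and `pick_write_compression` are hypothetical helpers, and the member names `ZSTD`, `SNAPPY`, and `NONE` on `pylibcudf.io.types.CompressionType` are assumed rather than taken from this page.

```python
def first_supported(candidates, is_supported):
    """Return the first candidate accepted by is_supported, or None."""
    for candidate in candidates:
        if is_supported(candidate):
            return candidate
    return None


def pick_write_compression(names=("ZSTD", "SNAPPY", "NONE")):
    """Pick the first writable compression type from a preference list."""
    import pylibcudf as plc  # imported lazily; needs a CUDA environment

    compression_type = plc.io.types.CompressionType
    candidates = [getattr(compression_type, name) for name in names]
    return first_supported(candidates, plc.io.orc.is_supported_write_orc)
```

Passing the predicate in as an argument keeps the selection logic independent of the GPU library, so the same helper works with `is_supported_read_orc`.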
- pylibcudf.io.orc.read_orc(OrcReaderOptions options, Stream stream=None, DeviceMemoryResource mr=None) TableWithMetadata
Read from ORC format.
The source to read from and options are encapsulated by the options object.
For details, see read_orc().
- Parameters:
- options: OrcReaderOptions
Settings for controlling reading behavior
- stream: Stream | None
CUDA stream used for device memory operations and kernel launches
- mr: DeviceMemoryResource, optional
Device memory resource used to allocate the returned table’s device memory.
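A read typically goes from a SourceInfo, through the builder, to `read_orc`. The following is a sketch under stated assumptions, not a definitive recipe: `configure_reader` and `read_orc_file` are hypothetical helpers, `pylibcudf.io.SourceInfo` is assumed to accept a list of paths, and the builder is assumed to expose a `build()` method that returns the OrcReaderOptions.

```python
def configure_reader(options, columns=None, skip_rows=None, nrows=None):
    """Call only the setters for which a value was supplied."""
    if columns is not None:
        options.set_columns(list(columns))
    if skip_rows is not None:
        options.set_skip_rows(skip_rows)
    if nrows is not None:
        options.set_num_rows(nrows)
    return options


def read_orc_file(path, **settings):
    """Read one ORC file; requires pylibcudf and a CUDA-capable GPU."""
    import pylibcudf as plc  # imported lazily; needs a CUDA environment

    source = plc.io.SourceInfo([path])
    # Assumption: the builder exposes build(), mirroring the libcudf builder.
    options = plc.io.orc.OrcReaderOptions.builder(source).build()
    return plc.io.orc.read_orc(configure_reader(options, **settings))
```

Because `configure_reader` only relies on the setter names documented above, it can be exercised with any stand-in object that records those calls.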
- pylibcudf.io.orc.read_parsed_orc_statistics(SourceInfo source_info, Stream stream=None) ParsedOrcStatistics
Read ORC statistics from a source.
- Parameters:
- source_info: SourceInfo
The source to read statistics from.
- stream: Stream | None
CUDA stream used for device memory operations and kernel launches.
- Returns:
- ParsedOrcStatistics
The parsed ORC statistics.
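The parsed statistics can be folded into a per-column summary. This sketch assumes that `file_stats` holds one OrcColumnStatistics per entry of `column_names`, in the same order; both helpers are hypothetical, and the valid keys for `get` are not listed on this page, so the key is left as a parameter.

```python
def summarize_file_stats(parsed):
    """Map each column name to (number_of_values, has_null).

    Assumption: parsed.file_stats is ordered like parsed.column_names.
    """
    return {
        name: (stats.number_of_values, stats.has_null)
        for name, stats in zip(parsed.column_names, parsed.file_stats)
    }


def file_stat(parsed, column, key, default=None):
    """Look up one file-level statistic for a column via get()."""
    index = list(parsed.column_names).index(column)
    return parsed.file_stats[index].get(key, default)


def read_summary(path):
    """Read and summarize statistics; requires pylibcudf and a GPU."""
    import pylibcudf as plc  # imported lazily; needs a CUDA environment

    source = plc.io.SourceInfo([path])
    return summarize_file_stats(plc.io.orc.read_parsed_orc_statistics(source))
```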
- pylibcudf.io.orc.write_orc(OrcWriterOptions options, Stream stream=None) void
Write to ORC format.
The table to write, output paths, and options are encapsulated by the options object.
For details, see write_orc().
- Parameters:
- options: OrcWriterOptions
Settings for controlling writing behavior
- stream: Stream | None
CUDA stream used for device memory operations and kernel launches
- Returns:
- None