ORC

class pylibcudf.io.orc.OrcColumnStatistics

Attributes

has_null

number_of_values

Methods

get(self, item[, default])

get(self, item, default=None)

class pylibcudf.io.orc.OrcReaderOptions

The settings to use for read_orc

For details, see cudf::io::orc_reader_options

Methods

builder(SourceInfo source)

Create an OrcReaderOptionsBuilder object.

set_columns(self, list col_names)

Sets the names of the columns to read.

set_decimal128_columns(self, list val)

Set columns that should be read as 128-bit Decimal.

set_num_rows(self, int64_t nrows)

Sets the number of rows to read.

set_skip_rows(self, int64_t skip_rows)

Sets number of rows to skip from the start.

set_source(self, SourceInfo src)

Set a new source info location.

set_stripes(self, list stripes)

Sets list of stripes to read for each input source.

set_timestamp_type(self, DataType type_)

Sets the timestamp type to which timestamp columns will be cast.

static builder(SourceInfo source)

Create an OrcReaderOptionsBuilder object.

For details, see cudf::io::orc_reader_options::builder()

Parameters:
source: SourceInfo

The source to read the ORC file from.

Returns:
OrcReaderOptionsBuilder

Builder to build OrcReaderOptions

set_columns(self, list col_names) -> void

Sets the names of the columns to read.

Parameters:
col_names: list[str]

List of column names

Returns:
None

set_decimal128_columns(self, list val) -> void

Set columns that should be read as 128-bit Decimal.

Parameters:
val: list[str]

List of fully qualified column names

Returns:
None

set_num_rows(self, int64_t nrows) -> void

Sets the number of rows to read.

Parameters:
nrows: int64_t

Number of rows to read

Returns:
None

set_skip_rows(self, int64_t skip_rows) -> void

Sets number of rows to skip from the start.

Parameters:
skip_rows: int64_t

Number of rows to skip

Returns:
None

set_source(self, SourceInfo src) -> void

Set a new source info location.

Parameters:
src: SourceInfo

New source information, replacing existing information.

Returns:
None

set_stripes(self, list stripes) -> void

Sets list of stripes to read for each input source.

Parameters:
stripes: list[list[size_type]]

List of lists: one inner list of stripe indices to read for each input source

Returns:
None
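
To make the list-of-lists shape concrete, here is a minimal sketch. It assumes the inner list at position i holds the stripe indices to read from input source i. The helper `stripes_per_source` is hypothetical and not part of pylibcudf; it only builds the plain Python structure that `set_stripes` accepts.

```python
def stripes_per_source(num_sources, stripe_indices):
    """Return a list of lists selecting the same stripes from every source.

    Hypothetical helper for illustration only; it is not part of pylibcudf.
    """
    if num_sources < 1:
        raise ValueError("num_sources must be at least 1")
    if any(i < 0 for i in stripe_indices):
        raise ValueError("stripe indices must be non-negative")
    # Build one independent inner list per input source.
    return [list(stripe_indices) for _ in range(num_sources)]


# Two input sources, reading stripes 0 and 2 from each of them.
selection = stripes_per_source(2, [0, 2])

# With an existing OrcReaderOptions object named `options`, the selection
# would then be applied as:
#     options.set_stripes(selection)
```
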
set_timestamp_type(self, DataType type_) -> void

Sets the timestamp type to which timestamp columns will be cast.

Parameters:
type_: DataType

Type of timestamp

Returns:
None
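
The setters above can be combined as in the following sketch. The function name `configure_orc_options` and its keyword arguments are hypothetical; only the `set_*` calls come from this page. The function touches nothing beyond those methods, so it works on any OrcReaderOptions instance.

```python
def configure_orc_options(options, columns=None, skip_rows=0, num_rows=None,
                          timestamp_type=None):
    """Apply optional read settings to an OrcReaderOptions-like object.

    Hypothetical convenience wrapper; each setter is only called when the
    corresponding argument was supplied.
    """
    if columns is not None:
        options.set_columns(list(columns))          # names of columns to read
    if skip_rows > 0:
        options.set_skip_rows(skip_rows)            # rows to skip from the start
    if num_rows is not None:
        options.set_num_rows(num_rows)              # maximum rows to read
    if timestamp_type is not None:
        options.set_timestamp_type(timestamp_type)  # a pylibcudf DataType
    return options


# Assumed usage (requires pylibcudf and a GPU, so it is left as a comment):
#     import pylibcudf as plc
#     source = plc.io.SourceInfo(["data.orc"])      # "data.orc" is a placeholder
#     options = plc.io.orc.OrcReaderOptions.builder(source).build()
#     configure_orc_options(options, columns=["a", "b"], num_rows=1000)
```

The `build()` call on the builder is an assumption, since the builder class itself is not documented in this section.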
class pylibcudf.io.orc.ParsedOrcStatistics

Holds column names and parsed file-level and stripe-level statistics.

For details, see cudf::io::parsed_orc_statistics

Attributes

column_names

file_stats

stripes_stats

pylibcudf.io.orc.is_supported_read_orc(compression_type compression) -> bool

Check if the compression type is supported for reading ORC files.

For details, see is_supported_read_orc().

Parameters:
compression: CompressionType

The compression type to check

Returns:
bool

True if the compression type is supported for reading ORC files

pylibcudf.io.orc.is_supported_write_orc(compression_type compression) -> bool

Check if the compression type is supported for writing ORC files.

For details, see is_supported_write_orc().

Parameters:
compression: CompressionType

The compression type to check

Returns:
bool

True if the compression type is supported for writing ORC files
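
As a rough sketch, both checks can be run for a single compression type before choosing a codec. The helper name `orc_compression_support` is hypothetical. The code assumes the enum is reachable as `pylibcudf.io.types.CompressionType` and that its members can be looked up by name (for example `SNAPPY`). pylibcudf is imported inside the function so the definition itself loads without a GPU.

```python
def orc_compression_support(name):
    """Return (readable, writable) for a compression name such as "SNAPPY".

    Hypothetical helper; calling it requires pylibcudf and a CUDA-capable
    environment.
    """
    import pylibcudf as plc  # imported lazily, only when the check is run

    # Assumption: CompressionType lives in pylibcudf.io.types and exposes
    # its members as attributes.
    compression = getattr(plc.io.types.CompressionType, name.upper())
    return (
        plc.io.orc.is_supported_read_orc(compression),
        plc.io.orc.is_supported_write_orc(compression),
    )
```

A caller might use the second element of the result to fall back to another codec when writing is not supported.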

pylibcudf.io.orc.read_orc(OrcReaderOptions options, Stream stream=None, DeviceMemoryResource mr=None) -> TableWithMetadata

Read from ORC format.

The source to read from and options are encapsulated by the options object.

For details, see read_orc().

Parameters:
options: OrcReaderOptions

Settings for controlling reading behavior

stream: Stream | None

CUDA stream used for device memory operations and kernel launches

mr: DeviceMemoryResource, optional

Device memory resource used to allocate the returned table’s device memory.
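
A minimal end-to-end read might look like the sketch below. The function name `read_orc_columns` is hypothetical. The code assumes that `SourceInfo` accepts a list of file paths, that the builder exposes `build()`, and that the returned TableWithMetadata provides `tbl` and `column_names()`; none of these are documented in this section.

```python
def read_orc_columns(path, columns=None):
    """Read an ORC file and return (table, column_names).

    Hypothetical helper; calling it requires pylibcudf and a GPU.
    """
    import pylibcudf as plc  # imported lazily, only when a read is requested

    source = plc.io.SourceInfo([path])  # assumption: a list of file paths
    options = plc.io.orc.OrcReaderOptions.builder(source).build()
    if columns is not None:
        options.set_columns(list(columns))  # restrict to the named columns

    result = plc.io.orc.read_orc(options)  # default stream and memory resource
    # Assumption: TableWithMetadata exposes `tbl` and `column_names()`.
    return result.tbl, result.column_names()
```
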

pylibcudf.io.orc.read_parsed_orc_statistics(SourceInfo source_info, Stream stream=None) -> ParsedOrcStatistics

Read ORC statistics from a source.

Parameters:
source_info: SourceInfo

The source to read statistics from.

stream: Stream | None

CUDA stream used for device memory operations and kernel launches.

Returns:
ParsedOrcStatistics

The parsed ORC statistics.
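
The following sketch shows one way to inspect the file-level statistics. The function name `summarize_orc_file_stats` is hypothetical. The code assumes that `file_stats` is a sequence aligned with `column_names`, that each entry is an OrcColumnStatistics exposing `number_of_values` and `has_null`, and that `SourceInfo` accepts a list of file paths.

```python
def summarize_orc_file_stats(path):
    """Map each column name to (number_of_values, has_null) at file level.

    Hypothetical helper; calling it requires pylibcudf and a GPU.
    """
    import pylibcudf as plc  # imported lazily, only when statistics are read

    parsed = plc.io.orc.read_parsed_orc_statistics(plc.io.SourceInfo([path]))

    summary = {}
    # Assumption: file_stats[i] describes the column named column_names[i].
    for name, stats in zip(parsed.column_names, parsed.file_stats):
        summary[name] = (stats.number_of_values, stats.has_null)
    return summary
```
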

pylibcudf.io.orc.write_orc(OrcWriterOptions options, Stream stream=None) -> void

Write to ORC format.

The table to write, output paths, and options are encapsulated by the options object.

For details, see write_orc().

Parameters:
options: OrcWriterOptions

Settings for controlling writing behavior

stream: Stream | None

CUDA stream used for device memory operations and kernel launches

Returns:
None
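
For completeness, a write might be sketched as follows. OrcWriterOptions is not documented in this section, so `OrcWriterOptions.builder(sink, table)`, its `build()` method, and `SinkInfo` taking a list of output paths are all assumptions here. The function name `write_table_to_orc` is hypothetical.

```python
def write_table_to_orc(table, path):
    """Write a pylibcudf Table to a single ORC file at `path`.

    Hypothetical helper; calling it requires pylibcudf and a GPU.
    """
    import pylibcudf as plc  # imported lazily, only when a write is requested

    sink = plc.io.SinkInfo([path])  # assumption: a list of output paths
    # Assumption: the writer builder takes the sink and the table to write.
    options = plc.io.orc.OrcWriterOptions.builder(sink, table).build()
    plc.io.orc.write_orc(options)  # default CUDA stream
```
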