Package ai.rapids.cudf
Class ORCChunkedReader
java.lang.Object
ai.rapids.cudf.ORCChunkedReader
All Implemented Interfaces:
AutoCloseable
Provide an interface for reading an ORC file in an iterative manner.
-
Constructor Summary
Constructors
Constructor and Description
ORCChunkedReader(long chunkReadLimit, long passReadLimit, long outputRowSizingGranularity, ORCOptions opts, HostMemoryBuffer buffer, long offset, long len)
Construct a chunked ORC reader instance, similar to ORCChunkedReader(long, long, ORCOptions, HostMemoryBuffer, long, long), with an additional parameter to control the granularity of the output table.
ORCChunkedReader(long chunkReadLimit, long passReadLimit, ORCOptions opts, HostMemoryBuffer buffer, long offset, long len)
Construct the reader instance from read limits and a file already loaded in a memory buffer.
Method Summary
-
Constructor Details
-
ORCChunkedReader
public ORCChunkedReader(long chunkReadLimit, long passReadLimit, ORCOptions opts, HostMemoryBuffer buffer, long offset, long len)
Construct the reader instance from read limits and a file already loaded in a memory buffer. The output row granularity takes its default value of 10k rows.
Parameters:
chunkReadLimit - Limit on the total number of bytes to be returned per read, or 0 if there is no limit.
passReadLimit - Limit on the amount of memory used by the chunked reader, or 0 if there is no limit.
opts - The options for ORC reading.
buffer - Raw ORC file content.
offset - The starting offset into buffer.
len - The number of bytes to parse from the given buffer.
-
ORCChunkedReader
public ORCChunkedReader(long chunkReadLimit, long passReadLimit, long outputRowSizingGranularity, ORCOptions opts, HostMemoryBuffer buffer, long offset, long len)
Construct a chunked ORC reader instance, similar to ORCChunkedReader(long, long, ORCOptions, HostMemoryBuffer, long, long), with an additional parameter to control the granularity of the output table.
When reading a chunk table, with respect to the given size limits, a subset of stripes may be loaded, decompressed and decoded into a large intermediate table. The reader will then subdivide that table into smaller tables for final output, using outputRowSizingGranularity as the subdivision step. If the chunked reader is constructed without this parameter, the default value of 10k rows will be used.
Parameters:
outputRowSizingGranularity - The change step in number of rows in the output table.
See Also:
-
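The subdivision step described above can be pictured with a small arithmetic sketch. This is only one plausible reading of "subdivision step", not the reader's actual algorithm: the class GranularitySketch, its methods, and the rowBudget input (a hypothetical estimate of how many rows fit under chunkReadLimit) are assumptions made for illustration and are not part of ai.rapids.cudf.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not part of ai.rapids.cudf: shows how output row counts
// could move in steps of outputRowSizingGranularity.
final class GranularitySketch {

    // Round a row budget down to a multiple of the granularity,
    // but never below a single step.
    static long rowsPerOutput(long rowBudget, long granularity) {
        if (rowBudget <= 0 || granularity <= 0) {
            throw new IllegalArgumentException("rowBudget and granularity must be positive");
        }
        long steps = Math.max(1, rowBudget / granularity);
        return steps * granularity;
    }

    // Split an intermediate table of totalRows rows into output row counts.
    // Every output except possibly the last is a multiple of the granularity.
    static List<Long> split(long totalRows, long rowBudget, long granularity) {
        long step = rowsPerOutput(rowBudget, granularity);
        List<Long> sizes = new ArrayList<>();
        for (long remaining = totalRows; remaining > 0; remaining -= step) {
            sizes.add(Math.min(step, remaining));
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 45,000 decoded rows, an assumed budget of 25,000 rows, granularity 10,000.
        System.out.println(split(45_000L, 25_000L, 10_000L));
    }
}
```

Under this model a smaller granularity lets output tables track the byte limit more closely, while a larger one yields fewer distinct output sizes.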
-
Method Details
-
hasNext
public boolean hasNext()
Check if the given file has anything left to read.
Returns:
A boolean value indicating whether there is more data to read from the file.
-
readChunk
Read a chunk of rows from the given ORC file such that the total size of the returned data does not exceed the given read limit. If the given file has no data, or all data has already been read by previous calls to this function, a null Table will be returned.
Returns:
A table of new rows read from the given file.
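The hasNext and readChunk contract above suggests a simple read loop. The sketch below writes that loop against plain functional interfaces so it compiles without the cudf library. ChunkLoop and forEachChunk are hypothetical names, not part of ai.rapids.cudf. The only behavior taken from this page is that readChunk may return null once nothing is left.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper, not part of ai.rapids.cudf.
final class ChunkLoop {

    // Calls readChunk while hasNext reports more data and hands each chunk to sink.
    // Stops early on a null chunk, which is the documented "no data left" result.
    // Returns the number of chunks passed to sink.
    static <T> int forEachChunk(BooleanSupplier hasNext,
                                Supplier<T> readChunk,
                                Consumer<T> sink) {
        int chunks = 0;
        while (hasNext.getAsBoolean()) {
            T chunk = readChunk.get();
            if (chunk == null) {
                break;
            }
            sink.accept(chunk);
            chunks++;
        }
        return chunks;
    }

    // With a real reader the call would look roughly like this (assumed usage):
    //   try (ORCChunkedReader reader = new ORCChunkedReader(...)) {
    //       ChunkLoop.forEachChunk(reader::hasNext, reader::readChunk, table -> { ... });
    //   }
}
```

Because the reader is AutoCloseable, wrapping it in try-with-resources releases it even if the sink throws. Each returned Table should also be closed once it has been processed.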
-
close
public void close()
Specified by:
close in interface AutoCloseable
-