Class ORCChunkedReader

java.lang.Object
ai.rapids.cudf.ORCChunkedReader
All Implemented Interfaces:
AutoCloseable

public class ORCChunkedReader extends Object implements AutoCloseable
Provides an interface for reading an ORC file iteratively, one chunk at a time.
  • Constructor Details

    • ORCChunkedReader

      public ORCChunkedReader(long chunkReadLimit, long passReadLimit, ORCOptions opts, HostMemoryBuffer buffer, long offset, long len)
      Construct the reader instance from read limits and a file already loaded in a memory buffer.
      Parameters:
      chunkReadLimit - Limit on total number of bytes to be returned per read, or 0 if there is no limit.
      passReadLimit - Limit on the amount of memory used by the chunked reader, or 0 if there is no limit.
      opts - The options for ORC reading.
      buffer - Raw ORC file content.
      offset - The starting offset into buffer.
      len - The number of bytes to parse from the given buffer.
    • ORCChunkedReader

      public ORCChunkedReader(long chunkReadLimit, long passReadLimit, long outputRowSizingGranularity, ORCOptions opts, HostMemoryBuffer buffer, long offset, long len)
      Construct a chunked ORC reader instance, similar to ORCChunkedReader(long, long, ORCOptions, HostMemoryBuffer, long, long), with an additional parameter to control the granularity of the output table. When a chunk is read, a subset of stripes that fits the given size limits may be loaded, decompressed, and decoded into a large intermediate table. The reader then subdivides that table into smaller tables for final output, using outputRowSizingGranularity as the subdivision step. If the chunked reader is constructed without this parameter, the default value of 10k rows is used.
      Parameters:
      outputRowSizingGranularity - The step, in number of rows, by which the size of the output table changes.
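      The subdivision step described above can be pictured with a small model. This is a minimal sketch and not the library's algorithm: the class GranularitySketch, the method splitRows, and the fixed bytesPerRow are all hypothetical, introduced only to show how a row step and a byte limit could interact.

```java
import java.util.ArrayList;
import java.util.List;

public class GranularitySketch {
    // Hypothetical model, not cudf code: split totalRows rows into output tables
    // whose row counts are multiples of granularity, assuming every row takes
    // exactly bytesPerRow bytes. A chunkReadLimit of 0 means "no limit".
    public static List<Long> splitRows(long totalRows, long bytesPerRow,
                                       long chunkReadLimit, long granularity) {
        List<Long> out = new ArrayList<>();
        if (totalRows <= 0) {
            return out;
        }
        long rowsPerChunk;
        if (chunkReadLimit == 0) {
            rowsPerChunk = totalRows; // no limit: emit everything at once
        } else {
            long fit = chunkReadLimit / bytesPerRow; // rows that fit in the limit
            // Round down to a multiple of the step; in this model at least one
            // step is always emitted, even if that exceeds the limit.
            rowsPerChunk = Math.max(granularity, (fit / granularity) * granularity);
        }
        for (long start = 0; start < totalRows; start += rowsPerChunk) {
            out.add(Math.min(rowsPerChunk, totalRows - start)); // last chunk may be short
        }
        return out;
    }

    public static void main(String[] args) {
        // 45,000 rows of 100 bytes, 2.5 MB limit, 10,000-row step
        System.out.println(splitRows(45_000, 100, 2_500_000, 10_000));
        // prints [20000, 20000, 5000]
    }
}
```

      In this model 25,000 rows would fit within the limit, but the step rounds that down to 20,000, so a larger granularity gives coarser control over the output size.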
  • Method Details

    • hasNext

      public boolean hasNext()
      Check if the given file has anything left to read.
      Returns:
      A boolean value indicating whether there is more data to read from the file.
    • readChunk

      public Table readChunk()
      Read a chunk of rows from the given ORC file such that the total size of the returned data does not exceed the given read limit. If the given file has no data, or all data has already been read by previous calls to this function, null is returned.
      Returns:
      A table of new rows read from the given file.
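      The hasNext/readChunk contract suggests a simple read loop. The sketch below shows that loop against a stand-in so that it runs without a GPU or the cudf library: FakeChunkedReader and countRows are hypothetical, and the stand-in returns row counts where the real reader returns Table objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ChunkLoopSketch {
    // Hypothetical stand-in for ORCChunkedReader: same three method names,
    // but it hands out pre-built row counts instead of cudf Tables.
    public static class FakeChunkedReader implements AutoCloseable {
        private final Deque<Integer> pending = new ArrayDeque<>();
        public boolean closed = false;

        public FakeChunkedReader(int... chunkRowCounts) {
            for (int n : chunkRowCounts) {
                pending.add(n);
            }
        }

        public boolean hasNext() {
            return !pending.isEmpty();
        }

        // Returns null once everything has been read, mirroring readChunk().
        public Integer readChunk() {
            return pending.poll();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Drain the reader chunk by chunk and return the total number of rows seen.
    public static long countRows(FakeChunkedReader reader) {
        long total = 0;
        while (reader.hasNext()) {
            Integer chunk = reader.readChunk();
            if (chunk == null) {
                break; // defensive: nothing left to read
            }
            total += chunk;
        }
        return total;
    }

    public static void main(String[] args) {
        // try-with-resources closes the reader even if the loop throws
        try (FakeChunkedReader reader = new FakeChunkedReader(10_000, 10_000, 2_500)) {
            System.out.println(countRows(reader)); // prints 22500
        }
    }
}
```

      With the real class, each returned Table holds memory of its own, so it would presumably also need to be closed once the caller has finished with it.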
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable