Class ParquetChunkedReader

java.lang.Object
ai.rapids.cudf.ParquetChunkedReader
All Implemented Interfaces:
AutoCloseable

public class ParquetChunkedReader extends Object implements AutoCloseable
Provides an interface for reading a Parquet file iteratively, one chunk of rows at a time.
  • Constructor Details

    • ParquetChunkedReader

      public ParquetChunkedReader(long chunkSizeByteLimit, File filePath)
      Construct the reader instance from a read limit and a file path.
      Parameters:
      chunkSizeByteLimit - Limit on total number of bytes to be returned per read, or 0 if there is no limit.
      filePath - Full path of the input Parquet file to read.
    • ParquetChunkedReader

      public ParquetChunkedReader(long chunkSizeByteLimit, ParquetOptions opts, File filePath)
      Construct the reader instance from a read limit, a ParquetOptions object, and a file path.
      Parameters:
      chunkSizeByteLimit - Limit on total number of bytes to be returned per read, or 0 if there is no limit.
      opts - The options for Parquet reading.
      filePath - Full path of the input Parquet file to read.
    • ParquetChunkedReader

      public ParquetChunkedReader(long chunkSizeByteLimit, long passReadLimit, ParquetOptions opts, File filePath)
      Construct the reader instance from an output chunk size limit, a pass read limit, a ParquetOptions object, and a file path.
      Parameters:
      chunkSizeByteLimit - Limit on total number of bytes to be returned per read, or 0 if there is no limit.
      passReadLimit - Limit on the amount of memory used for reading and decompressing data, or 0 if there is no limit.
      opts - The options for Parquet reading.
      filePath - Full path of the input Parquet file to read.
    • ParquetChunkedReader

      public ParquetChunkedReader(long chunkSizeByteLimit, ParquetOptions opts, HostMemoryBuffer buffer, long offset, long len)
      Construct the reader instance from a read limit, a ParquetOptions object, and a file already read into a host memory buffer.
      Parameters:
      chunkSizeByteLimit - Limit on total number of bytes to be returned per read, or 0 if there is no limit.
      opts - The options for Parquet reading.
      buffer - Raw Parquet file content.
      offset - The starting offset into buffer.
      len - The number of bytes to parse from the given buffer, starting at offset.
    • ParquetChunkedReader

      public ParquetChunkedReader(long chunkSizeByteLimit, long passReadLimit, ParquetOptions opts, HostMemoryBuffer buffer, long offset, long len)
      Construct the reader instance from an output chunk size limit, a pass read limit, a ParquetOptions object, and a file already read into a host memory buffer.
      Parameters:
      chunkSizeByteLimit - Limit on total number of bytes to be returned per read, or 0 if there is no limit.
      passReadLimit - Limit on the amount of memory used for reading and decompressing data, or 0 if there is no limit.
      opts - The options for Parquet reading.
      buffer - Raw Parquet file content.
      offset - The starting offset into buffer.
      len - The number of bytes to parse from the given buffer, starting at offset.
    • ParquetChunkedReader

      public ParquetChunkedReader(long chunkSizeByteLimit, long passReadLimit, ParquetOptions opts, HostMemoryBuffer... buffers)
      Construct the reader instance from an output chunk size limit, a pass read limit, a ParquetOptions object, and data held in one or more host memory buffers.
      Parameters:
      chunkSizeByteLimit - Limit on total number of bytes to be returned per read, or 0 if there is no limit.
      passReadLimit - Limit on the amount of memory used for reading and decompressing data, or 0 if there is no limit.
      opts - The options for Parquet reading.
      buffers - Array of buffers containing the file data. The buffers are logically concatenated to construct the file being read.
    • ParquetChunkedReader

      public ParquetChunkedReader(long chunkSizeByteLimit, ParquetOptions opts, DataSource ds)
      Construct the reader instance from a read limit, a ParquetOptions object, and a DataSource.
      Parameters:
      chunkSizeByteLimit - Limit on total number of bytes to be returned per read, or 0 if there is no limit.
      opts - The options for Parquet reading.
      ds - The data source to read from.
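
The buffer-based constructors above describe two ways of pointing the reader at in-memory file content: a single buffer with an offset and len window, or several buffers that are logically concatenated. The sketch below only illustrates those two ideas with plain JDK types and does not call cudf. java.nio.ByteBuffer stands in for HostMemoryBuffer, and the names BufferViewSketch, window, and concatenate are hypothetical helpers invented for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustration only: ByteBuffer is a stand-in for HostMemoryBuffer (assumption).
final class BufferViewSketch {

    // Returns the bytes in [offset, offset + len), mirroring the
    // (buffer, offset, len) parameters of the single-buffer constructors.
    static byte[] window(ByteBuffer buffer, int offset, int len) {
        byte[] out = new byte[len];
        ByteBuffer view = buffer.duplicate(); // own position, shared content
        view.position(offset);
        view.get(out, 0, len);
        return out;
    }

    // Joins the buffers end to end in argument order, mirroring how the
    // varargs constructor is documented to treat its buffers as one file.
    static byte[] concatenate(ByteBuffer... buffers) {
        int total = 0;
        for (ByteBuffer b : buffers) {
            total += b.remaining();
        }
        ByteBuffer joined = ByteBuffer.allocate(total);
        for (ByteBuffer b : buffers) {
            joined.put(b.duplicate()); // duplicate so the caller's position is untouched
        }
        return joined.array();
    }

    public static void main(String[] args) {
        // Four bytes of unrelated data precede the seven bytes of interest.
        ByteBuffer whole = ByteBuffer.wrap("junkPARQUET".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(window(whole, 4, 7), StandardCharsets.US_ASCII));

        // The same seven bytes, split across two buffers.
        ByteBuffer first = ByteBuffer.wrap("PAR".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer second = ByteBuffer.wrap("QUET".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(concatenate(first, second), StandardCharsets.US_ASCII));
    }
}
```

Both calls in main produce the same seven bytes, which is the point of the comparison: the window form selects a region inside a larger buffer, while the varargs form assembles the file from pieces in the order they are passed.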
  • Method Details

    • hasNext

      public boolean hasNext()
      Check if the given file has anything left to read.
      Returns:
      true if there is more data to read from the file, false otherwise.
    • readChunk

      public Table readChunk()
      Read a chunk of rows from the given Parquet file such that the total size of the returned data does not exceed the given read limit. If the given file has no data, or all data has already been read by previous calls to this function, a null Table is returned.
      Returns:
      A table of new rows read from the given file.
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
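
Taken together, hasNext(), readChunk(), and close() suggest a simple read loop: open the reader in a try-with-resources block, call readChunk() while hasNext() is true, and guard against a null result. The sketch below shows that loop shape using only the JDK so it can run without cudf or a GPU. ChunkedSource and FakeSource are hypothetical stand-ins that mimic the three methods documented here; with the real class, the reader would be a ParquetChunkedReader and each chunk a Table, which in cudf Java is itself closeable and would typically be closed once processed.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkLoopSketch {

    // Hypothetical stand-in for the reader's surface: hasNext(), readChunk(), close().
    interface ChunkedSource<T> extends AutoCloseable {
        boolean hasNext();

        T readChunk(); // returns null when nothing is left to read

        @Override
        void close();
    }

    // Fake source that serves totalRows rows in chunks of at most rowsPerChunk rows.
    static final class FakeSource implements ChunkedSource<int[]> {
        private final int totalRows;
        private final int rowsPerChunk;
        private int nextRow = 0;
        boolean closed = false;

        FakeSource(int totalRows, int rowsPerChunk) {
            this.totalRows = totalRows;
            this.rowsPerChunk = rowsPerChunk;
        }

        @Override
        public boolean hasNext() {
            return nextRow < totalRows;
        }

        @Override
        public int[] readChunk() {
            if (!hasNext()) {
                return null;
            }
            int n = Math.min(rowsPerChunk, totalRows - nextRow);
            int[] rows = new int[n];
            for (int i = 0; i < n; i++) {
                rows[i] = nextRow + i;
            }
            nextRow += n;
            return rows;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Drains the source and returns the row count of each chunk, in order.
    static List<Integer> chunkSizes(ChunkedSource<int[]> source) {
        List<Integer> sizes = new ArrayList<>();
        try (ChunkedSource<int[]> reader = source) { // close() runs even if the loop throws
            while (reader.hasNext()) {
                int[] chunk = reader.readChunk();
                if (chunk == null) {
                    break; // defensive: readChunk() is documented to return null when exhausted
                }
                sizes.add(chunk.length);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(chunkSizes(new FakeSource(10, 4))); // prints [4, 4, 2]
    }
}
```

The fake source limits chunks by row count purely to keep the example small; the real reader is documented to limit chunks by total bytes via chunkSizeByteLimit, so actual chunk row counts depend on the data.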