Package ai.rapids.cudf
Class ParquetChunkedReader
java.lang.Object
ai.rapids.cudf.ParquetChunkedReader
All Implemented Interfaces:
AutoCloseable
Provides an interface for reading a Parquet file iteratively, one chunk at a time.
-
Constructor Summary
Constructors
ParquetChunkedReader(long chunkSizeByteLimit, long passReadLimit, ParquetOptions opts, HostMemoryBuffer... buffers)
    Construct the reader instance from a read limit and data in host memory buffers.
ParquetChunkedReader(long chunkSizeByteLimit, long passReadLimit, ParquetOptions opts, HostMemoryBuffer buffer, long offset, long len)
    Construct the reader instance from a read limit and a file already read in a memory buffer.
ParquetChunkedReader(long chunkSizeByteLimit, long passReadLimit, ParquetOptions opts, File filePath)
    Construct the reader instance from a read limit, a ParquetOptions object, and a file path.
ParquetChunkedReader(long chunkSizeByteLimit, ParquetOptions opts, DataSource ds)
    Construct a reader instance from a DataSource.
ParquetChunkedReader(long chunkSizeByteLimit, ParquetOptions opts, HostMemoryBuffer buffer, long offset, long len)
    Construct the reader instance from a read limit and a file already read in a memory buffer.
ParquetChunkedReader(long chunkSizeByteLimit, ParquetOptions opts, File filePath)
    Construct the reader instance from a read limit, a ParquetOptions object, and a file path.
ParquetChunkedReader(long chunkSizeByteLimit, File filePath)
    Construct the reader instance from a read limit and a file path.
-
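Method Summary
void close()
boolean hasNext()
    Check whether the given file has anything left to read.
Table readChunk()
    Read a chunk of rows from the given Parquet file such that the total size of the returned data does not exceed the given read limit.
-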
Constructor Details
-
ParquetChunkedReader
public ParquetChunkedReader(long chunkSizeByteLimit, File filePath)
Construct the reader instance from a read limit and a file path.
Parameters:
chunkSizeByteLimit - Limit on the total number of bytes to be returned per read, or 0 if there is no limit.
filePath - Full path of the input Parquet file to read.
-
ParquetChunkedReader
public ParquetChunkedReader(long chunkSizeByteLimit, ParquetOptions opts, File filePath)
Construct the reader instance from a read limit, a ParquetOptions object, and a file path.
Parameters:
chunkSizeByteLimit - Limit on the total number of bytes to be returned per read, or 0 if there is no limit.
opts - The options for Parquet reading.
filePath - Full path of the input Parquet file to read.
-
ParquetChunkedReader
public ParquetChunkedReader(long chunkSizeByteLimit, long passReadLimit, ParquetOptions opts, File filePath)
Construct the reader instance from a read limit, a ParquetOptions object, and a file path.
Parameters:
chunkSizeByteLimit - Limit on the total number of bytes to be returned per read, or 0 if there is no limit.
passReadLimit - Limit on the amount of memory used for reading and decompressing data, or 0 if there is no limit.
opts - The options for Parquet reading.
filePath - Full path of the input Parquet file to read.
-
ParquetChunkedReader
public ParquetChunkedReader(long chunkSizeByteLimit, ParquetOptions opts, HostMemoryBuffer buffer, long offset, long len)
Construct the reader instance from a read limit and a file already read into a memory buffer.
Parameters:
chunkSizeByteLimit - Limit on the total number of bytes to be returned per read, or 0 if there is no limit.
opts - The options for Parquet reading.
buffer - Raw Parquet file content.
offset - The starting offset into buffer.
len - The number of bytes to parse from the given buffer.
-
ParquetChunkedReader
public ParquetChunkedReader(long chunkSizeByteLimit, long passReadLimit, ParquetOptions opts, HostMemoryBuffer buffer, long offset, long len)
Construct the reader instance from a read limit and a file already read into a memory buffer.
Parameters:
chunkSizeByteLimit - Limit on the total number of bytes to be returned per read, or 0 if there is no limit.
passReadLimit - Limit on the amount of memory used for reading and decompressing data, or 0 if there is no limit.
opts - The options for Parquet reading.
buffer - Raw Parquet file content.
offset - The starting offset into buffer.
len - The number of bytes to parse from the given buffer.
-
ParquetChunkedReader
public ParquetChunkedReader(long chunkSizeByteLimit, long passReadLimit, ParquetOptions opts, HostMemoryBuffer... buffers)
Construct the reader instance from a read limit and data in host memory buffers.
Parameters:
chunkSizeByteLimit - Limit on the total number of bytes to be returned per read, or 0 if there is no limit.
passReadLimit - Limit on the amount of memory used for reading and decompressing data, or 0 if there is no limit.
opts - The options for Parquet reading.
buffers - Array of buffers containing the file data. The buffers are logically concatenated to construct the file being read.
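To illustrate what "logically concatenated" means for the buffers parameter, the following sketch maps an offset in the logical file onto a buffer index and an offset within that buffer. It is plain Java and does not use cudf: the class ConcatenatedBuffersSketch and its methods are hypothetical names, buffer sizes are modelled as a long[], and nothing here describes how the library performs the concatenation internally.

```java
public class ConcatenatedBuffersSketch {

    /** Total length of the logical file formed by the buffers, in order. */
    static long totalLength(long[] bufferLengths) {
        long total = 0;
        for (long len : bufferLengths) {
            total += len;
        }
        return total;
    }

    /**
     * Resolves an offset in the logical (concatenated) file.
     * Returns {bufferIndex, offsetWithinBuffer}.
     */
    static long[] locate(long[] bufferLengths, long logicalOffset) {
        if (logicalOffset < 0) {
            throw new IndexOutOfBoundsException("negative offset: " + logicalOffset);
        }
        long remaining = logicalOffset;
        for (int i = 0; i < bufferLengths.length; i++) {
            if (remaining < bufferLengths[i]) {
                return new long[] {i, remaining};
            }
            // Skip this buffer entirely and continue in the next one.
            remaining -= bufferLengths[i];
        }
        throw new IndexOutOfBoundsException("offset past end: " + logicalOffset);
    }

    public static void main(String[] args) {
        long[] lengths = {4, 6, 2};                 // three buffers, 12 bytes in total
        System.out.println(totalLength(lengths));   // prints 12
        long[] hit = locate(lengths, 9);
        System.out.println(hit[0] + ":" + hit[1]);  // prints 1:5
    }
}
```

In this model the order of the buffers matters: the same bytes passed in a different order would form a different logical file.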
-
ParquetChunkedReader
public ParquetChunkedReader(long chunkSizeByteLimit, ParquetOptions opts, DataSource ds)
Construct a reader instance from a DataSource.
Parameters:
chunkSizeByteLimit - Limit on the total number of bytes to be returned per read, or 0 if there is no limit.
opts - The options for Parquet reading.
ds - The data source to read from.
-
Method Details
-
hasNext
public boolean hasNext()
Check whether the given file has anything left to read.
Returns:
A boolean value indicating whether there is more data to read from the file.
-
readChunk
public Table readChunk()
Read a chunk of rows from the given Parquet file such that the total size of the returned data does not exceed the given read limit. If the given file has no data, or all data has already been read by previous calls to this function, a null Table is returned.
Returns:
A table of new rows read from the given file.
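The hasNext() and readChunk() contract suggests a simple read loop inside a try-with-resources block. The sketch below shows that loop shape without depending on cudf: ChunkSource and FakeSource are hypothetical stand-ins that only mirror the three methods documented here, rows are plain integers, and the limit counts rows rather than bytes. With the real reader, each returned Table is itself AutoCloseable and would presumably need to be closed after processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkedReadSketch {

    /** Hypothetical stand-in that mirrors hasNext(), readChunk() and close(). */
    interface ChunkSource<T> extends AutoCloseable {
        boolean hasNext();
        T readChunk(); // returns null once nothing is left to read
        @Override
        void close();  // narrowed so that it throws no checked exception
    }

    /** Hypothetical in-memory source; a chunkLimit of 0 means "no limit". */
    static final class FakeSource implements ChunkSource<List<Integer>> {
        private final List<Integer> rows;
        private final int chunkLimit;
        private int pos = 0;
        boolean closed = false;

        FakeSource(List<Integer> rows, int chunkLimit) {
            this.rows = rows;
            this.chunkLimit = chunkLimit;
        }

        @Override
        public boolean hasNext() {
            return pos < rows.size();
        }

        @Override
        public List<Integer> readChunk() {
            if (!hasNext()) {
                return null;
            }
            int left = rows.size() - pos;
            int size = (chunkLimit == 0) ? left : Math.min(chunkLimit, left);
            List<Integer> chunk = new ArrayList<>(rows.subList(pos, pos + size));
            pos += size;
            return chunk;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Drains the source chunk by chunk and returns the size of each chunk. */
    static List<Integer> chunkSizes(ChunkSource<List<Integer>> source) {
        List<Integer> sizes = new ArrayList<>();
        try (ChunkSource<List<Integer>> reader = source) {
            while (reader.hasNext()) {
                List<Integer> chunk = reader.readChunk();
                if (chunk == null) {
                    break; // defensive: the contract allows a null result
                }
                sizes.add(chunk.size());
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(chunkSizes(new FakeSource(rows, 3))); // prints [3, 3, 1]
        System.out.println(chunkSizes(new FakeSource(rows, 0))); // prints [7]
    }
}
```

The null check after readChunk() is kept even though hasNext() guards the loop, because the documented contract allows a null result when no data remains.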
-
close
public void close()
Specified by:
close in interface AutoCloseable
-