Class DataSource

java.lang.Object
ai.rapids.cudf.DataSource
All Implemented Interfaces:
AutoCloseable
Direct Known Subclasses:
MultiBufferDataSource

public abstract class DataSource extends Object implements AutoCloseable
Base class that can be used to provide data dynamically to CUDF. It follows cudf::io::datasource fairly closely, with two main differences.
First, it does not expose async device reads; it calls the non-async device read API instead. Async support might be added in the future, but there is currently no direct use case for it in Java that would justify the added complexity.
Second, there is no implementation of the device read API that returns a buffer instead of writing into one. CUDF does not use that API yet, and testing an unused implementation did not seem worthwhile. One may be added in the future if it is needed.
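The following minimal sketch shows the shape a subclass takes, using an in-memory byte-array source. So that it runs without the cudf native libraries, it uses a hypothetical stand-in class `ModelDataSource` in place of `DataSource` and `java.nio.ByteBuffer` in place of `HostMemoryBuffer`. The names `ModelDataSource` and `ByteArraySource` are illustrative and are not part of cudf. Only `size` and the two `hostRead` overloads are implemented; the device-read hooks keep their documented defaults.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in mirroring the shape of ai.rapids.cudf.DataSource.
// ByteBuffer replaces HostMemoryBuffer so the sketch runs without cudf.
abstract class ModelDataSource implements AutoCloseable {
    public abstract long size();

    // Mirrors hostRead(long, long): returns a buffer holding the data read.
    public abstract ByteBuffer hostRead(long offset, long amount);

    // Mirrors hostRead(long, HostMemoryBuffer): fills dest, returns bytes written.
    public abstract long hostRead(long offset, ByteBuffer dest);

    // Documented defaults: no device support and a cutoff of 0.
    public boolean supportsDeviceRead() { return false; }
    public long getDeviceReadCutoff() { return 0; }

    @Override
    public void close() {}
}

// An in-memory source backed by a byte array.
class ByteArraySource extends ModelDataSource {
    private final byte[] data;

    ByteArraySource(byte[] data) { this.data = data; }

    @Override
    public long size() { return data.length; }

    @Override
    public ByteBuffer hostRead(long offset, long amount) {
        // amount is a maximum: clamp so a read near the end returns only what exists.
        int start = (int) Math.min(offset, data.length);
        int len = (int) Math.min(amount, data.length - start);
        return ByteBuffer.wrap(data, start, len).slice();
    }

    @Override
    public long hostRead(long offset, ByteBuffer dest) {
        // The target amount is whatever room dest has left.
        int start = (int) Math.min(offset, data.length);
        int len = Math.min(dest.remaining(), data.length - start);
        dest.put(data, start, len);
        return len;
    }
}
```

With the real `DataSource`, the same methods would be written against `HostMemoryBuffer`. Because the default is no device support, `supportsDeviceRead`, `getDeviceReadCutoff`, and `deviceRead` need overriding only when the source can write directly into device memory.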
  • Constructor Details

    • DataSource

      public DataSource()
  • Method Details

    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
    • size

      public abstract long size()
      Get the size of the source in bytes.
    • hostRead

      public abstract HostMemoryBuffer hostRead(long offset, long amount) throws IOException
      Read data from the source at the given offset. Return a HostMemoryBuffer for the data that was read.
      Parameters:
      offset - where to start reading from.
      amount - the maximum number of bytes to read.
      Returns:
      a buffer that points to the data.
      Throws:
      IOException - on any error.
    • onHostBufferDone

      protected void onHostBufferDone(HostMemoryBuffer buffer)
      Called when the consumer is finished with a buffer returned from hostRead(long, long). The default implementation closes the buffer.
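The ownership handoff can be sketched as follows. The sketch assumes that the consumer of a buffer from `hostRead(long, long)` returns it through `onHostBufferDone` and does not close it directly. `ModelBuffer`, `ClosingSource`, and `PoolingSource` are hypothetical stand-ins, and no cudf types are used. The pooling subclass only illustrates why a source might override the default.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for HostMemoryBuffer that only records closure.
class ModelBuffer implements AutoCloseable {
    final long length;
    boolean closed = false;

    ModelBuffer(long length) { this.length = length; }

    @Override
    public void close() { closed = true; }
}

// Mirrors the documented default: onHostBufferDone closes the buffer.
class ClosingSource {
    ModelBuffer hostRead(long offset, long amount) {
        return new ModelBuffer(amount); // offset is ignored in this model
    }

    protected void onHostBufferDone(ModelBuffer buffer) {
        buffer.close();
    }
}

// Illustrative override: keep finished buffers open and reuse them.
class PoolingSource extends ClosingSource {
    private final Deque<ModelBuffer> pool = new ArrayDeque<>();

    @Override
    ModelBuffer hostRead(long offset, long amount) {
        ModelBuffer b = pool.pollFirst();
        if (b != null && b.length == amount) {
            return b; // same size: hand the pooled buffer out again
        }
        if (b != null) {
            b.close(); // wrong size: release it
        }
        return new ModelBuffer(amount);
    }

    @Override
    protected void onHostBufferDone(ModelBuffer buffer) {
        pool.addFirst(buffer); // not closed; kept for the next read
    }
}
```

A real pooling source would also need to close any buffers still in its pool when the source itself is closed.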
    • hostRead

      public abstract long hostRead(long offset, HostMemoryBuffer dest) throws IOException
      Read data from the source at the given offset into dest. Note that dest should not be closed, and no reference to it can outlive the call to hostRead. The target amount to read is dest's length.
      Parameters:
      offset - the offset to start reading from in the source.
      dest - where to write the data.
      Returns:
      the actual number of bytes written to dest.
      Throws:
      IOException
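The return value is the number of bytes actually written. That number may be less than `dest`'s length, for example near the end of the source, so a caller reading a whole source in chunks should advance by it. The minimal sketch below uses a hypothetical `HostReadable` interface that mirrors this overload, with `ByteBuffer` standing in for `HostMemoryBuffer`. The chunk buffer is reused across calls and the source never retains it, which matches the rule that no reference to `dest` may outlive the call.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical stand-in for the hostRead(long, HostMemoryBuffer) contract.
interface HostReadable {
    long size();

    // Fills dest starting at offset; returns the number of bytes actually written.
    long hostRead(long offset, ByteBuffer dest) throws IOException;
}

final class ChunkedReader {
    private ChunkedReader() {}

    // Reads the whole source through one reusable chunk buffer.
    static byte[] readAll(HostReadable src, int chunkSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer chunk = ByteBuffer.allocate(chunkSize);
        long offset = 0;
        while (offset < src.size()) {
            chunk.clear(); // target amount is the full chunk length
            long n = src.hostRead(offset, chunk);
            if (n <= 0) {
                break; // no progress: stop instead of looping forever
            }
            out.write(chunk.array(), 0, (int) n);
            offset += n; // advance by what was actually written
        }
        return out.toByteArray();
    }
}
```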
    • supportsDeviceRead

      public boolean supportsDeviceRead()
      Return true if this source supports reading directly into device memory, false otherwise. The default is no device support. The value must not change dynamically; it is typically read only once.
    • getDeviceReadCutoff

      public long getDeviceReadCutoff()
      Get the size cutoff between device reads and host reads when device reads are supported. Anything larger than the cutoff will be a device read, and anything smaller will be a host read. By default, the cutoff is 0, so all reads will be device reads if device reads are supported.
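The routing rule described for `supportsDeviceRead` and `getDeviceReadCutoff` can be modeled as a small decision function. `ReadRouter` and `ReadPath` are hypothetical names, not cudf classes, and the actual choice is presumably made on the cudf side, not in user code. The documentation does not say where a read exactly equal to the cutoff goes, so this sketch assumes it becomes a device read.

```java
// Hypothetical model of the documented host/device routing rule.
enum ReadPath { HOST, DEVICE }

final class ReadRouter {
    private ReadRouter() {}

    static ReadPath choose(boolean supportsDeviceRead, long cutoff, long size) {
        if (!supportsDeviceRead) {
            return ReadPath.HOST; // default: no device support, always host
        }
        // Assumption: a read exactly at the cutoff goes to the device;
        // the docs only cover sizes strictly above or below it.
        return size >= cutoff ? ReadPath.DEVICE : ReadPath.HOST;
    }
}
```

With the default cutoff of 0, every read takes the device path once `supportsDeviceRead` returns true, which matches the default behavior described above.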
    • deviceRead

      public long deviceRead(long offset, DeviceMemoryBuffer dest, Cuda.Stream stream) throws IOException
      Read data from the source at the given offset into dest. Note that dest should not be closed, and no reference to it can outlive the call to deviceRead. The target amount to read is dest's length.
      Parameters:
      offset - the offset to start reading from in the source.
      dest - where to write the data.
      stream - the stream to do the copy on.
      Returns:
      the actual number of bytes written to dest.
      Throws:
      IOException