Class JCudfSerialization
The goal is to transfer data from a local GPU to a remote GPU as quickly and efficiently as possible using built-in Java communication channels. Compatibility between different releases of CUDF is not guaranteed. This allows the format to adapt if internal memory layouts and formats change.
This version optimizes for reduced memory transfers, so it performs as few transfers as possible when putting the data back onto the GPU. To do so, it slices a single large memory buffer into the smaller buffers used by the resulting ColumnVectors. The downside is that, in general, none of the memory can be released until all of the ColumnVectors are closed. This is assumed not to be a problem because, for processing efficiency, the transferred data will likely be combined with similar batches from other processes into a single larger buffer.
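The buffer slicing described above, and why the memory stays alive until the last column is closed, can be modeled with plain java.nio. This is only an analogy, not the CUDF implementation: SharedBuffer, ColumnView, slice and release are hypothetical names invented for this sketch, and a heap ByteBuffer stands in for device memory.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative model only: none of these names come from CUDF.
final class SlicedBufferSketch {

    /** Hypothetical stand-in for a column that views part of a shared buffer. */
    static final class ColumnView implements AutoCloseable {
        final ByteBuffer data;
        private final SharedBuffer parent;
        private boolean closed;

        ColumnView(ByteBuffer data, SharedBuffer parent) {
            this.data = data;
            this.parent = parent;
        }

        @Override
        public void close() {
            if (!closed) {          // closing twice must not release twice
                closed = true;
                parent.release();
            }
        }
    }

    /** Owns the single large allocation and counts the outstanding views. */
    static final class SharedBuffer {
        private final ByteBuffer whole;
        private final AtomicInteger refCount = new AtomicInteger();
        private boolean released;

        SharedBuffer(int capacity) {
            this.whole = ByteBuffer.allocate(capacity);
        }

        /** Returns a view of [offset, offset + length) without copying any bytes. */
        ColumnView slice(int offset, int length) {
            ByteBuffer dup = whole.duplicate();
            dup.position(offset);
            dup.limit(offset + length);
            refCount.incrementAndGet();
            return new ColumnView(dup.slice(), this);
        }

        void release() {
            if (refCount.decrementAndGet() == 0) {
                released = true;    // a real allocator could free the memory here
            }
        }

        boolean isReleased() {
            return released;
        }
    }
}
```

In this model, closing one of two views leaves isReleased() false; the backing buffer is only marked released after the second view is closed, which mirrors the trade-off described above.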
-
Nested Class Summary
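Nested Classes
Modifier and Type, Class - Description
static final class JCudfSerialization.HostConcatResult - Class to hold the header and buffer pair result from host-side concatenation.
static final class JCudfSerialization.SerializedColumnHeader - Holds the metadata about a serialized column.
static final class JCudfSerialization.SerializedTableHeader - Holds the metadata about a serialized table.
static final class JCudfSerialization.TableAndRowCountPair - Holds the result of deserializing a table.
-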
Constructor Summary
Constructors
JCudfSerialization()
-
Method Summary
Modifier and Type, Method - Description
static ContiguousTable concatToContiguousTable(JCudfSerialization.SerializedTableHeader[] headers, HostMemoryBuffer[] dataBuffers) - Concatenate multiple tables in host memory into a contiguous table in device memory.
static JCudfSerialization.HostConcatResult concatToHostBuffer(JCudfSerialization.SerializedTableHeader[] headers, HostMemoryBuffer[] dataBuffers)
static JCudfSerialization.HostConcatResult concatToHostBuffer(JCudfSerialization.SerializedTableHeader[] headers, HostMemoryBuffer[] dataBuffers, HostMemoryAllocator hostMemoryAllocator) - Concatenate multiple tables in host memory into a single host table buffer.
static long getSerializedSizeInBytes(HostColumnVector[] columns, long rowOffset, long numRows) - Get the size in bytes needed to serialize the given data.
static Table readAndConcat(JCudfSerialization.SerializedTableHeader[] headers, HostMemoryBuffer[] dataBuffers)
static JCudfSerialization.TableAndRowCountPair readTableFrom(JCudfSerialization.SerializedTableHeader header, HostMemoryBuffer hostBuffer)
static JCudfSerialization.TableAndRowCountPair readTableFrom(InputStream in, HostMemoryAllocator hostMemoryAllocator) - Read a serialized table from the given InputStream.
static void readTableIntoBuffer(InputStream in, JCudfSerialization.SerializedTableHeader header, HostMemoryBuffer buffer) - After reading a header for a table, read the data portion into a host-side buffer.
static HostColumnVector[] unpackHostColumnVectors(JCudfSerialization.SerializedTableHeader header, HostMemoryBuffer hostBuffer) - Deserialize a serialized contiguous table into an array of host columns.
static void writeConcatedStream(JCudfSerialization.SerializedTableHeader[] headers, HostMemoryBuffer[] dataBuffers, OutputStream out) - Take the data from multiple batches stored in the parsed headers and the dataBuffers and write it out to out as if it were a single buffer.
static void writeRowsToStream(OutputStream out, long numRows) - Write a row-count-only header to the output stream for the case where a columnar batch with no columns but a nonzero row count is received.
static void writeToStream(ColumnVector[] columns, OutputStream out, long rowOffset, long numRows) - Write all or part of a set of columns out in an internal format.
static void writeToStream(HostColumnVector[] columns, OutputStream out, long rowOffset, long numRows) - Write all or part of a set of columns out in an internal format.
static void writeToStream(Table t, OutputStream out, long rowOffset, long numRows) - Write all or part of a table out in an internal format.
-
Constructor Details
-
JCudfSerialization
public JCudfSerialization()
-
-
Method Details
-
getSerializedSizeInBytes
public static long getSerializedSizeInBytes(HostColumnVector[] columns, long rowOffset, long numRows)
Get the size in bytes needed to serialize the given data. The columns should be in host memory before calling this.
Parameters:
columns - columns to be serialized.
rowOffset - the first row to serialize.
numRows - the number of rows to serialize.
Returns:
the size in bytes needed to serialize the data including the header.
-
writeToStream
public static void writeToStream(Table t, OutputStream out, long rowOffset, long numRows) throws IOException
Write all or part of a table out in an internal format.
Parameters:
t - the table to be written.
out - the stream to write the serialized table out to.
rowOffset - the first row to write out.
numRows - the number of rows to write out.
Throws:
IOException
-
writeToStream
public static void writeToStream(ColumnVector[] columns, OutputStream out, long rowOffset, long numRows) throws IOException
Write all or part of a set of columns out in an internal format.
Parameters:
columns - the columns to be written.
out - the stream to write the serialized table out to.
rowOffset - the first row to write out.
numRows - the number of rows to write out.
Throws:
IOException
-
writeToStream
public static void writeToStream(HostColumnVector[] columns, OutputStream out, long rowOffset, long numRows) throws IOException
Write all or part of a set of columns out in an internal format.
Parameters:
columns - the columns to be written.
out - the stream to write the serialized table out to.
rowOffset - the first row to write out.
numRows - the number of rows to write out.
Throws:
IOException
-
writeRowsToStream
public static void writeRowsToStream(OutputStream out, long numRows) throws IOException
Write a row-count-only header to the output stream for the case where a columnar batch with no columns but a nonzero row count is received.
Parameters:
out - the stream to write the serialized table out to.
numRows - the number of rows to write out.
Throws:
IOException
-
writeConcatedStream
public static void writeConcatedStream(JCudfSerialization.SerializedTableHeader[] headers, HostMemoryBuffer[] dataBuffers, OutputStream out) throws IOException
Take the data from multiple batches stored in the parsed headers and the dataBuffers and write it out to out as if it were a single buffer.
Parameters:
headers - the headers parsed from multiple streams.
dataBuffers - an array of buffers that hold the data, one per header.
out - what to write the data out to.
Throws:
IOException - on any error.
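The idea of writing several batches "as if it were a single buffer" can be illustrated with a toy format. The sketch below is an assumption-laden stand-in, not the CUDF wire format: it models a batch as int columns (columns[c][r]) and uses a made-up layout of column count, row count, then each column's values. ConcatWriteSketch, writeBatch and writeConcatenated are hypothetical names.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Toy format for illustration only; it is NOT the CUDF serialization format.
final class ConcatWriteSketch {

    /** Writes one batch as: column count, row count, then each column's values. */
    static void writeBatch(int[][] columns, OutputStream out) throws IOException {
        writeConcatenated(new int[][][] {columns}, out);
    }

    /**
     * Writes several batches so that the bytes match what writeBatch would
     * produce for the already-concatenated batch, without building it first.
     */
    static void writeConcatenated(int[][][] batches, OutputStream out) throws IOException {
        DataOutputStream dout = new DataOutputStream(out);
        int numCols = batches.length == 0 ? 0 : batches[0].length;
        int totalRows = 0;
        for (int[][] batch : batches) {
            if (batch.length != numCols) {
                throw new IllegalArgumentException("column count mismatch");
            }
            totalRows += numCols == 0 ? 0 : batch[0].length;
        }
        // One combined header instead of one header per batch.
        dout.writeInt(numCols);
        dout.writeInt(totalRows);
        // Column by column, append each batch's slice of that column.
        for (int c = 0; c < numCols; c++) {
            for (int[][] batch : batches) {
                for (int value : batch[c]) {
                    dout.writeInt(value);
                }
            }
        }
        dout.flush();
    }
}
```

The point of the design is that a reader cannot tell whether the stream came from one large batch or from several small ones, so no intermediate concatenated copy is needed on the writing side.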
-
readAndConcat
public static Table readAndConcat(JCudfSerialization.SerializedTableHeader[] headers, HostMemoryBuffer[] dataBuffers) throws IOException
Throws:
IOException
-
concatToContiguousTable
public static ContiguousTable concatToContiguousTable(JCudfSerialization.SerializedTableHeader[] headers, HostMemoryBuffer[] dataBuffers) throws IOException
Concatenate multiple tables in host memory into a contiguous table in device memory.
Parameters:
headers - table headers corresponding to the host table buffers
dataBuffers - host table buffer for each input table to be concatenated
Returns:
contiguous table in device memory
Throws:
IOException
-
concatToHostBuffer
public static JCudfSerialization.HostConcatResult concatToHostBuffer(JCudfSerialization.SerializedTableHeader[] headers, HostMemoryBuffer[] dataBuffers, HostMemoryAllocator hostMemoryAllocator) throws IOException
Concatenate multiple tables in host memory into a single host table buffer.
Parameters:
headers - table headers corresponding to the host table buffers
dataBuffers - host table buffer for each input table to be concatenated
hostMemoryAllocator - allocator for host memory buffers
Returns:
host table header and buffer
Throws:
IOException
-
concatToHostBuffer
public static JCudfSerialization.HostConcatResult concatToHostBuffer(JCudfSerialization.SerializedTableHeader[] headers, HostMemoryBuffer[] dataBuffers) throws IOException
Throws:
IOException
-
unpackHostColumnVectors
public static HostColumnVector[] unpackHostColumnVectors(JCudfSerialization.SerializedTableHeader header, HostMemoryBuffer hostBuffer)
Deserialize a serialized contiguous table into an array of host columns.
Parameters:
header - serialized table header
hostBuffer - buffer containing the data for all columns in the serialized table
Returns:
array of host columns representing the data from the serialized table
-
readTableIntoBuffer
public static void readTableIntoBuffer(InputStream in, JCudfSerialization.SerializedTableHeader header, HostMemoryBuffer buffer) throws IOException
After reading a header for a table, read the data portion into a host-side buffer.
Parameters:
in - the stream to read the data from.
header - the header that was just read from the stream.
buffer - the buffer to write the data into. If there is not enough room in buffer to store the data, the data will not be read and header will still have dataRead set to false.
Throws:
IOException
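The "header first, then data, but only if the buffer is large enough" contract described above can be sketched with standard Java I/O. This is an illustration under assumptions, not the CUDF code: Header, dataLen and readInto are hypothetical names, and a byte[] stands in for HostMemoryBuffer. Only the dataRead flag is taken from the description above.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative model only; Header and readInto are not CUDF APIs.
final class HeaderThenDataSketch {

    /** Hypothetical header holding just the fields this sketch needs. */
    static final class Header {
        final int dataLen;      // number of data bytes that follow the header
        boolean dataRead;       // stays false until the data has been consumed

        Header(int dataLen) {
            this.dataLen = dataLen;
        }
    }

    /**
     * Reads the data portion into buffer. If buffer is too small, nothing is
     * consumed from the stream and header.dataRead stays false, so the caller
     * can retry with a larger buffer.
     */
    static void readInto(InputStream in, Header header, byte[] buffer) throws IOException {
        if (buffer.length < header.dataLen) {
            return;             // not enough room: leave the stream untouched
        }
        // readFully throws EOFException if the stream ends before dataLen bytes.
        new DataInputStream(in).readFully(buffer, 0, header.dataLen);
        header.dataRead = true;
    }
}
```

A caller following this pattern would check dataRead after the call and, if it is still false, allocate a buffer of at least dataLen bytes before trying again.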
-
readTableFrom
public static JCudfSerialization.TableAndRowCountPair readTableFrom(JCudfSerialization.SerializedTableHeader header, HostMemoryBuffer hostBuffer)
-
readTableFrom
public static JCudfSerialization.TableAndRowCountPair readTableFrom(InputStream in, HostMemoryAllocator hostMemoryAllocator) throws IOException
Read a serialized table from the given InputStream.
Parameters:
in - the stream to read the table data from.
hostMemoryAllocator - a host memory allocator for an intermediate host memory buffer
Returns:
the deserialized table in device memory, or null if the stream has no table to read, that is, the end of the stream is reached at the very beginning.
Throws:
IOException - on any error.
EOFException - if the data stream ended unexpectedly in the middle of processing.
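The distinction drawn above, null for a clean end of stream versus EOFException for a truncated one, is a common stream-reading pattern. The following sketch shows it with a made-up record format (a 4-byte big-endian length followed by that many bytes); EofSketch and readRecord are hypothetical names and the format is not the CUDF one.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative model only; the record format here is invented for the sketch.
final class EofSketch {

    /**
     * Reads one record. Returns null if the stream is already at its end
     * before the first byte. Throws EOFException (via readFully) if the
     * stream ends after a record has started.
     */
    static byte[] readRecord(InputStream in) throws IOException {
        int first = in.read();
        if (first < 0) {
            return null;        // clean end of stream: nothing more to read
        }
        DataInputStream din = new DataInputStream(in);
        byte[] rest = new byte[3];
        din.readFully(rest);    // remaining three bytes of the length
        int length = (first << 24)
                | ((rest[0] & 0xFF) << 16)
                | ((rest[1] & 0xFF) << 8)
                | (rest[2] & 0xFF);
        if (length < 0) {
            throw new IOException("negative record length: " + length);
        }
        byte[] payload = new byte[length];
        din.readFully(payload); // truncated payload surfaces as EOFException
        return payload;
    }
}
```

With this contract a caller can loop until readRecord returns null and treat any EOFException as a genuine error, which is the same split between the return value and the exception described above.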
-
readTableFrom
public static JCudfSerialization.TableAndRowCountPair readTableFrom(InputStream in) throws IOException
Throws:
IOException
-