public class JCudfSerialization extends Object
The goal is to transfer data from a local GPU to a remote GPU as quickly and efficiently as possible using build in java communication channels. There is no guarantee of compatibility between different releases of CUDF. This is to allow us to adapt if internal memory layouts and formats change.
This version optimizes for reduced memory transfers, and as such will try to do the fewest number of transfers possible when putting the data back onto the GPU. This means that it will slice a single large memory buffer into smaller buffers used by the resulting ColumnVectors. The downside of this is that generally none of the memory can be released until all of the ColumnVectors are closed. It is assumed that this will not be a problem because for processing efficiency after the data is transferred it will likely be combined with other similar batches from other processes into a single larger buffer.
Modifier and Type | Class and Description |
---|---|
static class |
JCudfSerialization.HostConcatResult
Class to hold the header and buffer pair result from host-side concatenation
|
static class |
JCudfSerialization.SerializedColumnHeader
Holds the metadata about a serialized column.
|
static class |
JCudfSerialization.SerializedTableHeader
Holds the metadata about a serialized table.
|
static class |
JCudfSerialization.TableAndRowCountPair
Holds the result of deserializing a table.
|
Constructor and Description |
---|
JCudfSerialization() |
Modifier and Type | Method and Description |
---|---|
static ContiguousTable |
concatToContiguousTable(JCudfSerialization.SerializedTableHeader[] headers,
HostMemoryBuffer[] dataBuffers)
Concatenate multiple tables in host memory into a contiguous table in device memory.
|
static JCudfSerialization.HostConcatResult |
concatToHostBuffer(JCudfSerialization.SerializedTableHeader[] headers,
HostMemoryBuffer[] dataBuffers) |
static JCudfSerialization.HostConcatResult |
concatToHostBuffer(JCudfSerialization.SerializedTableHeader[] headers,
HostMemoryBuffer[] dataBuffers,
HostMemoryAllocator hostMemoryAllocator)
Concatenate multiple tables in host memory into a single host table buffer.
|
static long |
getSerializedSizeInBytes(HostColumnVector[] columns,
long rowOffset,
long numRows)
Get the size in bytes needed to serialize the given data.
|
static Table |
readAndConcat(JCudfSerialization.SerializedTableHeader[] headers,
HostMemoryBuffer[] dataBuffers) |
static JCudfSerialization.TableAndRowCountPair |
readTableFrom(InputStream in) |
static JCudfSerialization.TableAndRowCountPair |
readTableFrom(InputStream in,
HostMemoryAllocator hostMemoryAllocator)
Read a serialize table from the given InputStream.
|
static JCudfSerialization.TableAndRowCountPair |
readTableFrom(JCudfSerialization.SerializedTableHeader header,
HostMemoryBuffer hostBuffer) |
static void |
readTableIntoBuffer(InputStream in,
JCudfSerialization.SerializedTableHeader header,
HostMemoryBuffer buffer)
After reading a header for a table read the data portion into a host side buffer.
|
static HostColumnVector[] |
unpackHostColumnVectors(JCudfSerialization.SerializedTableHeader header,
HostMemoryBuffer hostBuffer)
Deserialize a serialized contiguous table into an array of host columns.
|
static void |
writeConcatedStream(JCudfSerialization.SerializedTableHeader[] headers,
HostMemoryBuffer[] dataBuffers,
OutputStream out)
Take the data from multiple batches stored in the parsed headers and the dataBuffer and write
it out to out as if it were a single buffer.
|
static void |
writeRowsToStream(OutputStream out,
long numRows)
Write a rowcount only header to the output stream in a case
where a columnar batch with no columns but a non zero row count is received
|
static void |
writeToStream(ColumnVector[] columns,
OutputStream out,
long rowOffset,
long numRows)
Write all or part of a set of columns out in an internal format.
|
static void |
writeToStream(HostColumnVector[] columns,
OutputStream out,
long rowOffset,
long numRows)
Write all or part of a set of columns out in an internal format.
|
static void |
writeToStream(Table t,
OutputStream out,
long rowOffset,
long numRows)
Write all or part of a table out in an internal format.
|
public static long getSerializedSizeInBytes(HostColumnVector[] columns, long rowOffset, long numRows)
columns
- columns to be serialized.rowOffset
- the first row to serialize.numRows
- the number of rows to serialize.public static void writeToStream(Table t, OutputStream out, long rowOffset, long numRows) throws IOException
t
- the table to be written.out
- the stream to write the serialized table out to.rowOffset
- the first row to write out.numRows
- the number of rows to write out.IOException
public static void writeToStream(ColumnVector[] columns, OutputStream out, long rowOffset, long numRows) throws IOException
columns
- the columns to be written.out
- the stream to write the serialized table out to.rowOffset
- the first row to write out.numRows
- the number of rows to write out.IOException
public static void writeToStream(HostColumnVector[] columns, OutputStream out, long rowOffset, long numRows) throws IOException
columns
- the columns to be written.out
- the stream to write the serialized table out to.rowOffset
- the first row to write out.numRows
- the number of rows to write out.IOException
public static void writeRowsToStream(OutputStream out, long numRows) throws IOException
out
- the stream to write the serialized table out to.numRows
- the number of rows to write out.IOException
public static void writeConcatedStream(JCudfSerialization.SerializedTableHeader[] headers, HostMemoryBuffer[] dataBuffers, OutputStream out) throws IOException
headers
- the headers parsed from multiple streams.dataBuffers
- an array of buffers that hold the data, one per header.out
- what to write the data out to.IOException
- on any error.public static Table readAndConcat(JCudfSerialization.SerializedTableHeader[] headers, HostMemoryBuffer[] dataBuffers) throws IOException
IOException
public static ContiguousTable concatToContiguousTable(JCudfSerialization.SerializedTableHeader[] headers, HostMemoryBuffer[] dataBuffers) throws IOException
headers
- table headers corresponding to the host table buffersdataBuffers
- host table buffer for each input table to be concatenatedIOException
public static JCudfSerialization.HostConcatResult concatToHostBuffer(JCudfSerialization.SerializedTableHeader[] headers, HostMemoryBuffer[] dataBuffers, HostMemoryAllocator hostMemoryAllocator) throws IOException
headers
- table headers corresponding to the host table buffersdataBuffers
- host table buffer for each input table to be concatenatedhostMemoryAllocator
- allocator for host memory buffersIOException
public static JCudfSerialization.HostConcatResult concatToHostBuffer(JCudfSerialization.SerializedTableHeader[] headers, HostMemoryBuffer[] dataBuffers) throws IOException
IOException
public static HostColumnVector[] unpackHostColumnVectors(JCudfSerialization.SerializedTableHeader header, HostMemoryBuffer hostBuffer)
header
- serialized table headerhostBuffer
- buffer containing the data for all columns in the serialized tablepublic static void readTableIntoBuffer(InputStream in, JCudfSerialization.SerializedTableHeader header, HostMemoryBuffer buffer) throws IOException
in
- the stream to read the data from.header
- the header that finished just moments ago.buffer
- the buffer to write the data into. If there is not enough room to store
the data in buffer it will not be read and header will still have dataRead
set to false.IOException
public static JCudfSerialization.TableAndRowCountPair readTableFrom(JCudfSerialization.SerializedTableHeader header, HostMemoryBuffer hostBuffer)
public static JCudfSerialization.TableAndRowCountPair readTableFrom(InputStream in, HostMemoryAllocator hostMemoryAllocator) throws IOException
in
- the stream to read the table data from.hostMemoryAllocator
- a host memory allocator for an intermediate host memory bufferIOException
- on any error.EOFException
- if the data stream ended unexpectedly in the middle of processing.public static JCudfSerialization.TableAndRowCountPair readTableFrom(InputStream in) throws IOException
IOException
Copyright © 2024. All rights reserved.