public final class Table extends Object implements AutoCloseable
Modifier and Type | Class and Description |
---|---|
static class |
Table.DuplicateKeepOption
Enum to specify which of duplicate rows/elements will be copied to the output.
|
static class |
Table.GroupByOperation
Class representing groupby operations
|
static class |
Table.TableOperation |
static class |
Table.TestBuilder
Create a table on the GPU with data from the CPU.
|
Constructor and Description |
---|
Table(ColumnVector... columns)
Table class makes a copy of the array of ColumnVectors passed to it. |
Table(long[] cudfColumns)
Create a Table from an array of existing on device cudf::column pointers.
|
Modifier and Type | Method and Description |
---|---|
void |
close() |
static Table |
concatenate(Table... tables)
Concatenate multiple tables together to form a single table.
|
GatherMap[] |
conditionalFullJoinGatherMaps(Table rightTable,
CompiledExpression condition)
Computes the gather maps that can be used to manifest the result of a full join between
two tables when a conditional expression is true.
|
GatherMap[] |
conditionalInnerJoinGatherMaps(Table rightTable,
CompiledExpression condition)
Computes the gather maps that can be used to manifest the result of an inner join between
two tables when a conditional expression is true.
|
GatherMap[] |
conditionalInnerJoinGatherMaps(Table rightTable,
CompiledExpression condition,
long outputRowCount)
Computes the gather maps that can be used to manifest the result of an inner join between
two tables when a conditional expression is true.
|
long |
conditionalInnerJoinRowCount(Table rightTable,
CompiledExpression condition)
Computes the number of rows from the result of an inner join between two tables when a
conditional expression is true.
|
GatherMap |
conditionalLeftAntiJoinGatherMap(Table rightTable,
CompiledExpression condition)
Computes the gather map that can be used to manifest the result of a left anti join between
two tables when a conditional expression is true.
|
GatherMap |
conditionalLeftAntiJoinGatherMap(Table rightTable,
CompiledExpression condition,
long outputRowCount)
Computes the gather map that can be used to manifest the result of a left anti join between
two tables when a conditional expression is true.
|
long |
conditionalLeftAntiJoinRowCount(Table rightTable,
CompiledExpression condition)
Computes the number of rows from the result of a left anti join between two tables when a
conditional expression is true.
|
GatherMap[] |
conditionalLeftJoinGatherMaps(Table rightTable,
CompiledExpression condition)
Computes the gather maps that can be used to manifest the result of a left join between
two tables when a conditional expression is true.
|
GatherMap[] |
conditionalLeftJoinGatherMaps(Table rightTable,
CompiledExpression condition,
long outputRowCount)
Computes the gather maps that can be used to manifest the result of a left join between
two tables when a conditional expression is true.
|
long |
conditionalLeftJoinRowCount(Table rightTable,
CompiledExpression condition)
Computes the number of rows from the result of a left join between two tables when a
conditional expression is true.
|
GatherMap |
conditionalLeftSemiJoinGatherMap(Table rightTable,
CompiledExpression condition)
Computes the gather map that can be used to manifest the result of a left semi join between
two tables when a conditional expression is true.
|
GatherMap |
conditionalLeftSemiJoinGatherMap(Table rightTable,
CompiledExpression condition,
long outputRowCount)
Computes the gather map that can be used to manifest the result of a left semi join between
two tables when a conditional expression is true.
|
long |
conditionalLeftSemiJoinRowCount(Table rightTable,
CompiledExpression condition)
Computes the number of rows from the result of a left semi join between two tables when a
conditional expression is true.
|
ContiguousTable[] |
contiguousSplit(int... indices)
Split a table at given boundaries, but the result of each split has memory that is laid out
in a contiguous range of memory.
|
Table |
crossJoin(Table right)
Joins two tables, pairing every row of the left with every row of the right.
|
int |
distinctCount()
Count how many rows in the table are distinct from one another.
|
int |
distinctCount(NullEquality nullsEqual)
Count how many rows in the table are distinct from one another.
|
Table |
dropDuplicates(int[] keyColumns,
Table.DuplicateKeepOption keep,
boolean nullsEqual)
Copy rows of the current table to an output table such that duplicate rows in the key columns
are ignored (i.e., only one row from the duplicate ones will be copied).
|
Table |
explode(int index)
Explodes a list column's elements.
|
Table |
explodeOuter(int index)
Explodes a list column's elements, retaining any null entries or empty lists.
|
Table |
explodeOuterPosition(int index)
Explodes a list column's elements retaining any null entries or empty lists and includes a
position column.
|
Table |
explodePosition(int index)
Explodes a list column's elements and includes a position column.
|
Table |
filter(ColumnView mask)
Filters this table using a column of boolean values as a mask, returning a new one.
|
static Table |
fromPackedTable(ByteBuffer metadata,
DeviceMemoryBuffer data)
Construct a table from a packed representation.
|
GatherMap[] |
fullJoinGatherMaps(HashJoin rightHash)
Computes the gather maps that can be used to manifest the result of a full equi-join between
two tables.
|
GatherMap[] |
fullJoinGatherMaps(HashJoin rightHash,
long outputRowCount)
Computes the gather maps that can be used to manifest the result of a full equi-join between
two tables.
|
GatherMap[] |
fullJoinGatherMaps(Table rightKeys,
boolean compareNullsEqual)
Computes the gather maps that can be used to manifest the result of a full equi-join between
two tables.
|
long |
fullJoinRowCount(HashJoin rightHash)
Computes the number of rows resulting from a full equi-join between two tables.
|
Table |
gather(ColumnView gatherMap)
Gathers the rows of this table according to `gatherMap` such that row "i"
in the resulting table's columns will contain row "gatherMap[i]" from this table.
|
Table |
gather(ColumnView gatherMap,
OutOfBoundsPolicy outOfBoundsPolicy)
Gathers the rows of this table according to `gatherMap` such that row "i"
in the resulting table's columns will contain row "gatherMap[i]" from this table.
|
ColumnVector |
getColumn(int index)
Return the ColumnVector at the specified index. |
static TableWriter |
getCSVBufferWriter(CSVWriterOptions options,
HostBufferConsumer bufferConsumer) |
static TableWriter |
getCSVBufferWriter(CSVWriterOptions options,
HostBufferConsumer bufferConsumer,
HostMemoryAllocator hostMemoryAllocator) |
long |
getDeviceMemorySize()
Returns the Device memory buffer size.
|
long |
getNativeView()
Return the native table view handle for this table
|
int |
getNumberOfColumns() |
long |
getRowCount() |
Table.GroupByOperation |
groupBy(GroupByOptions groupByOptions,
int... indices)
Returns aggregate operations grouped by columns provided in indices
|
Table.GroupByOperation |
groupBy(int... indices)
Returns aggregate operations grouped by columns provided in indices
with default options as below:
- null is considered as key while grouping.
|
GatherMap[] |
innerDistinctJoinGatherMaps(Table rightKeys,
boolean compareNullsEqual)
Computes the gather maps that can be used to manifest the result of an inner equi-join between
two tables where the right table is guaranteed to not contain any duplicated join keys.
|
GatherMap[] |
innerJoinGatherMaps(HashJoin rightHash)
Computes the gather maps that can be used to manifest the result of an inner equi-join between
two tables.
|
GatherMap[] |
innerJoinGatherMaps(HashJoin rightHash,
long outputRowCount)
Computes the gather maps that can be used to manifest the result of an inner equi-join between
two tables.
|
GatherMap[] |
innerJoinGatherMaps(Table rightKeys,
boolean compareNullsEqual)
Computes the gather maps that can be used to manifest the result of an inner equi-join between
two tables.
|
long |
innerJoinRowCount(HashJoin otherHash)
Computes the number of rows resulting from an inner equi-join between two tables.
|
ColumnVector |
interleaveColumns()
Interleave all columns into a single column.
|
GatherMap |
leftAntiJoinGatherMap(Table rightKeys,
boolean compareNullsEqual)
Computes the gather map that can be used to manifest the result of a left anti-join between
two tables.
|
GatherMap |
leftDistinctJoinGatherMap(Table rightKeys,
boolean compareNullsEqual)
Computes a gather map that can be used to manifest the result of a left equi-join between
two tables where the right table is guaranteed to not contain any duplicated join keys.
|
GatherMap[] |
leftJoinGatherMaps(HashJoin rightHash)
Computes the gather maps that can be used to manifest the result of a left equi-join between
two tables.
|
GatherMap[] |
leftJoinGatherMaps(HashJoin rightHash,
long outputRowCount)
Computes the gather maps that can be used to manifest the result of a left equi-join between
two tables.
|
GatherMap[] |
leftJoinGatherMaps(Table rightKeys,
boolean compareNullsEqual)
Computes the gather maps that can be used to manifest the result of a left equi-join between
two tables.
|
long |
leftJoinRowCount(HashJoin rightHash)
Computes the number of rows resulting from a left equi-join between two tables.
|
GatherMap |
leftSemiJoinGatherMap(Table rightKeys,
boolean compareNullsEqual)
Computes the gather map that can be used to manifest the result of a left semi-join between
two tables.
|
ColumnVector |
lowerBound(boolean[] areNullsSmallest,
Table valueTable,
boolean[] descFlags)
Find smallest indices in a sorted table where values should be inserted to maintain order.
|
ColumnVector |
lowerBound(Table valueTable,
OrderByArg... args)
Find smallest indices in a sorted table where values should be inserted to maintain order.
|
ChunkedPack |
makeChunkedPack(long bounceBufferSize)
Create an instance of `ChunkedPack` which can be used to pack this table
contiguously in memory utilizing a bounce buffer of size `bounceBufferSize`.
|
ChunkedPack |
makeChunkedPack(long bounceBufferSize,
RmmDeviceMemoryResource tempMemoryResource)
Create an instance of `ChunkedPack` which can be used to pack this table
contiguously in memory utilizing a bounce buffer of size `bounceBufferSize`.
|
static Table |
merge(List<Table> tables,
OrderByArg... args)
Merge multiple already sorted tables keeping the sort order the same.
|
static Table |
merge(Table[] tables,
OrderByArg... args)
Merge multiple already sorted tables keeping the sort order the same.
|
static GatherMap[] |
mixedFullJoinGatherMaps(Table leftKeys,
Table rightKeys,
Table leftConditional,
Table rightConditional,
CompiledExpression condition,
NullEquality nullEquality)
Computes the gather maps that can be used to manifest the result of a full join between
two tables using a mix of equality and inequality conditions.
|
static GatherMap[] |
mixedInnerJoinGatherMaps(Table leftKeys,
Table rightKeys,
Table leftConditional,
Table rightConditional,
CompiledExpression condition,
NullEquality nullEquality)
Computes the gather maps that can be used to manifest the result of an inner join between
two tables using a mix of equality and inequality conditions.
|
static GatherMap[] |
mixedInnerJoinGatherMaps(Table leftKeys,
Table rightKeys,
Table leftConditional,
Table rightConditional,
CompiledExpression condition,
NullEquality nullEquality,
MixedJoinSize joinSize)
Computes the gather maps that can be used to manifest the result of an inner join between
two tables using a mix of equality and inequality conditions.
|
static MixedJoinSize |
mixedInnerJoinSize(Table leftKeys,
Table rightKeys,
Table leftConditional,
Table rightConditional,
CompiledExpression condition,
NullEquality nullEquality)
Computes output size information for an inner join between two tables using a mix of equality
and inequality conditions.
|
static GatherMap |
mixedLeftAntiJoinGatherMap(Table leftKeys,
Table rightKeys,
Table leftConditional,
Table rightConditional,
CompiledExpression condition,
NullEquality nullEquality)
Computes the gather map that can be used to manifest the result of a left anti join between
two tables using a mix of equality and inequality conditions.
|
static GatherMap[] |
mixedLeftJoinGatherMaps(Table leftKeys,
Table rightKeys,
Table leftConditional,
Table rightConditional,
CompiledExpression condition,
NullEquality nullEquality)
Computes the gather maps that can be used to manifest the result of a left join between
two tables using a mix of equality and inequality conditions.
|
static GatherMap[] |
mixedLeftJoinGatherMaps(Table leftKeys,
Table rightKeys,
Table leftConditional,
Table rightConditional,
CompiledExpression condition,
NullEquality nullEquality,
MixedJoinSize joinSize)
Computes the gather maps that can be used to manifest the result of a left join between
two tables using a mix of equality and inequality conditions.
|
static MixedJoinSize |
mixedLeftJoinSize(Table leftKeys,
Table rightKeys,
Table leftConditional,
Table rightConditional,
CompiledExpression condition,
NullEquality nullEquality)
Computes output size information for a left join between two tables using a mix of equality
and inequality conditions.
|
static GatherMap |
mixedLeftSemiJoinGatherMap(Table leftKeys,
Table rightKeys,
Table leftConditional,
Table rightConditional,
CompiledExpression condition,
NullEquality nullEquality)
Computes the gather map that can be used to manifest the result of a left semi join between
two tables using a mix of equality and inequality conditions.
|
Table.TableOperation |
onColumns(int... indices) |
Table |
orderBy(OrderByArg... args)
Orders the table using the sort keys, returning a newly allocated table.
|
PartitionedTable |
partition(ColumnView partitionMap,
int numberOfPartitions)
Partition this table using the mapping in partitionMap.
|
static TableWithMeta |
readAndInferJSON(JSONOptions opts,
DataSource ds)
Read JSON formatted data and infer the column names and schema.
|
static StreamedTableReader |
readArrowIPCChunked(ArrowIPCOptions options,
File inputFile)
Get a reader that will return tables.
|
static StreamedTableReader |
readArrowIPCChunked(ArrowIPCOptions options,
HostBufferProvider provider) |
static StreamedTableReader |
readArrowIPCChunked(ArrowIPCOptions options,
HostBufferProvider provider,
HostMemoryAllocator hostMemoryAllocator)
Get a reader that will return tables.
|
static StreamedTableReader |
readArrowIPCChunked(File inputFile)
Get a reader that will return tables.
|
static StreamedTableReader |
readArrowIPCChunked(HostBufferProvider provider)
Get a reader that will return tables.
|
static Table |
readAvro(AvroOptions opts,
byte[] buffer)
Read Avro formatted data.
|
static Table |
readAvro(AvroOptions opts,
byte[] buffer,
long offset,
long len) |
static Table |
readAvro(AvroOptions opts,
byte[] buffer,
long offset,
long len,
HostMemoryAllocator hostMemoryAllocator)
Read Avro formatted data.
|
static Table |
readAvro(AvroOptions opts,
DataSource ds) |
static Table |
readAvro(AvroOptions opts,
File path)
Read an Avro file.
|
static Table |
readAvro(AvroOptions opts,
HostMemoryBuffer buffer,
long offset,
long len)
Read Avro formatted data.
|
static Table |
readAvro(byte[] buffer)
Read Avro formatted data.
|
static Table |
readAvro(File path)
Read an Avro file using the default AvroOptions.
|
static Table |
readCSV(Schema schema,
byte[] buffer)
Read CSV formatted data using the default CSVOptions.
|
static Table |
readCSV(Schema schema,
CSVOptions opts,
byte[] buffer)
Read CSV formatted data.
|
static Table |
readCSV(Schema schema,
CSVOptions opts,
byte[] buffer,
long offset,
long len) |
static Table |
readCSV(Schema schema,
CSVOptions opts,
byte[] buffer,
long offset,
long len,
HostMemoryAllocator hostMemoryAllocator)
Read CSV formatted data.
|
static Table |
readCSV(Schema schema,
CSVOptions opts,
DataSource ds) |
static Table |
readCSV(Schema schema,
CSVOptions opts,
File path)
Read a CSV file.
|
static Table |
readCSV(Schema schema,
CSVOptions opts,
HostMemoryBuffer buffer,
long offset,
long len)
Read CSV formatted data.
|
static Table |
readCSV(Schema schema,
File path)
Read a CSV file using the default CSVOptions.
|
static TableWithMeta |
readJSON(JSONOptions opts,
HostMemoryBuffer buffer,
long offset,
long len)
Read JSON formatted data and infer the column names and schema.
|
static Table |
readJSON(Schema schema,
byte[] buffer)
Read JSON formatted data using the default JSONOptions.
|
static Table |
readJSON(Schema schema,
File path)
Read a JSON file using the default JSONOptions.
|
static Table |
readJSON(Schema schema,
JSONOptions opts,
byte[] buffer)
Read JSON formatted data.
|
static Table |
readJSON(Schema schema,
JSONOptions opts,
byte[] buffer,
long offset,
long len) |
static Table |
readJSON(Schema schema,
JSONOptions opts,
byte[] buffer,
long offset,
long len,
HostMemoryAllocator hostMemoryAllocator)
Read JSON formatted data.
|
static Table |
readJSON(Schema schema,
JSONOptions opts,
byte[] buffer,
long offset,
long len,
HostMemoryAllocator hostMemoryAllocator,
int emptyRowCount)
Read JSON formatted data.
|
static Table |
readJSON(Schema schema,
JSONOptions opts,
byte[] buffer,
long offset,
long len,
int emptyRowCount) |
static Table |
readJSON(Schema schema,
JSONOptions opts,
DataSource ds)
Read JSON formatted data.
|
static Table |
readJSON(Schema schema,
JSONOptions opts,
DataSource ds,
int emptyRowCount)
Read JSON formatted data.
|
static Table |
readJSON(Schema schema,
JSONOptions opts,
File path)
Read a JSON file.
|
static Table |
readJSON(Schema schema,
JSONOptions opts,
HostMemoryBuffer buffer,
long offset,
long len)
Read JSON formatted data.
|
static Table |
readJSON(Schema schema,
JSONOptions opts,
HostMemoryBuffer buffer,
long offset,
long len,
int emptyRowCount)
Read JSON formatted data.
|
static Table |
readORC(byte[] buffer)
Read ORC formatted data.
|
static Table |
readORC(File path)
Read an ORC file using the default ORCOptions.
|
static Table |
readORC(ORCOptions opts,
byte[] buffer)
Read ORC formatted data.
|
static Table |
readORC(ORCOptions opts,
byte[] buffer,
long offset,
long len) |
static Table |
readORC(ORCOptions opts,
byte[] buffer,
long offset,
long len,
HostMemoryAllocator hostMemoryAllocator)
Read ORC formatted data.
|
static Table |
readORC(ORCOptions opts,
DataSource ds) |
static Table |
readORC(ORCOptions opts,
File path)
Read an ORC file.
|
static Table |
readORC(ORCOptions opts,
HostMemoryBuffer buffer,
long offset,
long len)
Read ORC formatted data.
|
static Table |
readParquet(byte[] buffer)
Read parquet formatted data.
|
static Table |
readParquet(File path)
Read a Parquet file using the default ParquetOptions.
|
static Table |
readParquet(ParquetOptions opts,
byte[] buffer)
Read parquet formatted data.
|
static Table |
readParquet(ParquetOptions opts,
byte[] buffer,
long offset,
long len) |
static Table |
readParquet(ParquetOptions opts,
byte[] buffer,
long offset,
long len,
HostMemoryAllocator hostMemoryAllocator)
Read parquet formatted data.
|
static Table |
readParquet(ParquetOptions opts,
DataSource ds) |
static Table |
readParquet(ParquetOptions opts,
File path)
Read a Parquet file.
|
static Table |
readParquet(ParquetOptions opts,
HostMemoryBuffer buffer,
long offset,
long len)
Read parquet formatted data.
|
Table |
repeat(ColumnView counts)
Create a new table by repeating each row of this table.
|
Table |
repeat(int count)
Repeat each row of this table count times.
|
PartitionedTable |
roundRobinPartition(int numberOfPartitions,
int startPartition)
Round-robin partition a table into the specified number of partitions.
|
ColumnVector |
rowBitCount()
Returns an approximate cumulative size in bits of all columns in the `table_view` for each row.
|
Table |
sample(long n,
boolean replacement,
long seed)
Gather `n` samples from table randomly
Note: does not preserve the ordering
Example:
input: {col1: {1, 2, 3, 4, 5}, col2: {6, 7, 8, 9, 10}}
n: 3
replacement: false
output: {col1: {3, 1, 4}, col2: {8, 6, 9}}
replacement: true
output: {col1: {3, 1, 1}, col2: {8, 6, 6}}
throws "logic_error" if `n` > table rows and `replacement` == FALSE.
|
Table |
scatter(ColumnView scatterMap,
Table target)
Scatters values from the source table into the target table out-of-place, returning a new
result table.
|
static Table |
scatter(Scalar[] source,
ColumnView scatterMap,
Table target)
Scatters values from the source rows into the target table out-of-place, returning a new result
table.
|
ColumnVector |
sortOrder(OrderByArg... args)
Get back a gather map that can be used to sort the data.
|
String |
toString() |
ColumnVector |
upperBound(boolean[] areNullsSmallest,
Table valueTable,
boolean[] descFlags)
Find largest indices in a sorted table where values should be inserted to maintain order.
|
ColumnVector |
upperBound(Table valueTable,
OrderByArg... args)
Find largest indices in a sorted table where values should be inserted to maintain order.
|
static TableWriter |
writeArrowIPCChunked(ArrowIPCWriterOptions options,
File outputFile)
Get a table writer to write arrow IPC data to a file.
|
static TableWriter |
writeArrowIPCChunked(ArrowIPCWriterOptions options,
HostBufferConsumer consumer) |
static TableWriter |
writeArrowIPCChunked(ArrowIPCWriterOptions options,
HostBufferConsumer consumer,
HostMemoryAllocator hostMemoryAllocator)
Get a table writer to write arrow IPC data and handle each chunk with a callback.
|
static void |
writeColumnViewsToParquet(ParquetWriterOptions options,
HostBufferConsumer consumer,
ColumnView... columnViews) |
static void |
writeColumnViewsToParquet(ParquetWriterOptions options,
HostBufferConsumer consumer,
HostMemoryAllocator hostMemoryAllocator,
ColumnView... columnViews)
This is an evolving API and will most likely be removed in future releases.
|
void |
writeCSVToFile(CSVWriterOptions options,
String outputPath) |
static TableWriter |
writeORCChunked(ORCWriterOptions options,
File outputFile)
Get a table writer to write ORC data to a file.
|
static TableWriter |
writeORCChunked(ORCWriterOptions options,
HostBufferConsumer consumer) |
static TableWriter |
writeORCChunked(ORCWriterOptions options,
HostBufferConsumer consumer,
HostMemoryAllocator hostMemoryAllocator)
Get a table writer to write ORC data and handle each chunk with a callback.
|
static TableWriter |
writeParquetChunked(ParquetWriterOptions options,
File outputFile)
Get a table writer to write parquet data to a file.
|
static TableWriter |
writeParquetChunked(ParquetWriterOptions options,
HostBufferConsumer consumer) |
static TableWriter |
writeParquetChunked(ParquetWriterOptions options,
HostBufferConsumer consumer,
HostMemoryAllocator hostMemoryAllocator)
Get a table writer to write parquet data and handle each chunk with a callback.
|
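As a rough illustration of how a pair of gather maps can manifest a join result, the sketch below works on the CPU with plain int arrays in place of device columns. `GatherMapSketch` and both of its methods are hypothetical names invented for this example and are not part of the cudf API; the nested-loop join is only a stand-in and says nothing about how the library computes its maps.

```java
import java.util.ArrayList;
import java.util.List;

/** CPU-only illustration; none of these names belong to the cudf API. */
final class GatherMapSketch {
    /** Row i of the result is row gatherMap[i] of the input column. */
    static int[] gather(int[] column, int[] gatherMap) {
        int[] out = new int[gatherMap.length];
        for (int i = 0; i < gatherMap.length; i++) {
            out[i] = column[gatherMap[i]];
        }
        return out;
    }

    /** Naive inner equi-join on key columns; returns {leftMap, rightMap}. */
    static int[][] innerJoinGatherMaps(int[] leftKeys, int[] rightKeys) {
        List<int[]> pairs = new ArrayList<>();
        for (int l = 0; l < leftKeys.length; l++) {
            for (int r = 0; r < rightKeys.length; r++) {
                if (leftKeys[l] == rightKeys[r]) {
                    pairs.add(new int[] {l, r}); // matching row indices
                }
            }
        }
        int[] leftMap = new int[pairs.size()];
        int[] rightMap = new int[pairs.size()];
        for (int i = 0; i < pairs.size(); i++) {
            leftMap[i] = pairs.get(i)[0];
            rightMap[i] = pairs.get(i)[1];
        }
        return new int[][] {leftMap, rightMap};
    }
}
```

Gathering each table's payload columns with its own map yields the joined rows in matching order, which is why the join methods can return only the maps and leave the gather to the caller.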
public Table(ColumnVector... columns)
Table class makes a copy of the array of ColumnVectors passed to it. The class will decrease the refcount on itself and all its contents when closed, and free resources if the refcount is zero.
columns - Array of ColumnVectors
public Table(long[] cudfColumns)
cudfColumns - Array of nativeHandles
public long getNativeView()
public ColumnVector getColumn(int index)
Return the ColumnVector at the specified index. If you want to keep a reference to the column around past the lifetime of the table, you will need to increment the reference count on the column yourself.
public final long getRowCount()
public final int getNumberOfColumns()
public void close()
Specified by: close in interface AutoCloseable
public long getDeviceMemorySize()
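The reference-counting notes on the constructor and on getColumn can be pictured with a small CPU-only model. `RefCountedColumn` and `RefCountedTable` are hypothetical classes written for this sketch, not cudf types, and the sketch assumes that the table takes its own reference on each column it is given.

```java
/** Hypothetical stand-in for a reference-counted device column; not cudf's ColumnVector. */
final class RefCountedColumn implements AutoCloseable {
    private int refCount = 1; // the creator holds the first reference

    /** Take an additional reference so the column can outlive other owners. */
    RefCountedColumn incRefCount() {
        if (refCount <= 0) {
            throw new IllegalStateException("column already freed");
        }
        refCount++;
        return this;
    }

    int getRefCount() {
        return refCount;
    }

    boolean isFreed() {
        return refCount == 0;
    }

    /** Drop one reference; resources would be released when the count reaches zero. */
    @Override
    public void close() {
        if (refCount <= 0) {
            throw new IllegalStateException("close called too many times");
        }
        refCount--;
    }
}

/** Hypothetical table; assumes the table takes its own reference on each column. */
final class RefCountedTable implements AutoCloseable {
    private final RefCountedColumn[] columns;

    RefCountedTable(RefCountedColumn... columns) {
        this.columns = columns.clone(); // copy the array rather than keep the caller's
        for (RefCountedColumn column : this.columns) {
            column.incRefCount();
        }
    }

    RefCountedColumn getColumn(int index) {
        return columns[index];
    }

    /** Closing the table drops the table's reference on every column. */
    @Override
    public void close() {
        for (RefCountedColumn column : columns) {
            column.close();
        }
    }
}
```

In this model a caller who wants a column to survive the table calls `table.getColumn(0).incRefCount()` before closing the table, and later closes that extra reference itself.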
public static Table readCSV(Schema schema, File path)
schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
path - the local file to read.
public static Table readCSV(Schema schema, CSVOptions opts, File path)
schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
opts - various CSV parsing options.
path - the local file to read.
public static Table readCSV(Schema schema, byte[] buffer)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
buffer - raw UTF8 formatted bytes.
public static Table readCSV(Schema schema, CSVOptions opts, byte[] buffer)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various CSV parsing options.
buffer - raw UTF8 formatted bytes.
public static Table readCSV(Schema schema, CSVOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various CSV parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
hostMemoryAllocator - allocator for host memory buffers
public static Table readCSV(Schema schema, CSVOptions opts, byte[] buffer, long offset, long len)
public static Table readCSV(Schema schema, CSVOptions opts, HostMemoryBuffer buffer, long offset, long len)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various CSV parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
public static Table readCSV(Schema schema, CSVOptions opts, DataSource ds)
public void writeCSVToFile(CSVWriterOptions options, String outputPath)
public static TableWriter getCSVBufferWriter(CSVWriterOptions options, HostBufferConsumer bufferConsumer, HostMemoryAllocator hostMemoryAllocator)
public static TableWriter getCSVBufferWriter(CSVWriterOptions options, HostBufferConsumer bufferConsumer)
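The offset and len parameters of the buffer-based readers select a byte range inside a larger buffer. The sketch below shows that idea on the CPU with a toy comma-separated parser; `CsvRangeSketch.parseRange` is a hypothetical helper for illustration only, it takes int rather than long offsets because it indexes a Java array directly, and it does not reflect how cudf parses CSV.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing offset/len semantics; not how cudf parses CSV. */
final class CsvRangeSketch {
    /** Parse only the bytes in [offset, offset + len) as comma-separated UTF-8 lines. */
    static List<String[]> parseRange(byte[] buffer, int offset, int len) {
        if (offset < 0 || len < 0 || (long) offset + len > buffer.length) {
            throw new IllegalArgumentException("range is outside the buffer");
        }
        String text = new String(buffer, offset, len, StandardCharsets.UTF_8);
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (!line.isEmpty()) {
                rows.add(line.split(",", -1)); // -1 keeps trailing empty fields
            }
        }
        return rows;
    }
}
```

Passing an offset just past a header line, for example, parses only the data rows without copying them into a separate array first.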
public static Table readJSON(Schema schema, File path)
schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
path - the local file to read.
public static Table readJSON(Schema schema, byte[] buffer)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
buffer - raw UTF8 formatted bytes.
public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
buffer - raw UTF8 formatted bytes.
public static Table readJSON(Schema schema, JSONOptions opts, File path)
schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
path - the local file to read.
public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
hostMemoryAllocator - allocator for host memory buffers
public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator, int emptyRowCount)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
hostMemoryAllocator - allocator for host memory buffers
emptyRowCount - the number of rows to return if no columns were read.
public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len, int emptyRowCount)
public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len)
public static TableWithMeta readJSON(JSONOptions opts, HostMemoryBuffer buffer, long offset, long len)
opts - various JSON parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
public static TableWithMeta readAndInferJSON(JSONOptions opts, DataSource ds)
opts - various JSON parsing options.
public static Table readJSON(Schema schema, JSONOptions opts, HostMemoryBuffer buffer, long offset, long len)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
public static Table readJSON(Schema schema, JSONOptions opts, HostMemoryBuffer buffer, long offset, long len, int emptyRowCount)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
emptyRowCount - the number of rows to use if no columns were found.
public static Table readJSON(Schema schema, JSONOptions opts, DataSource ds)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
ds - the DataSource to read from.
public static Table readJSON(Schema schema, JSONOptions opts, DataSource ds, int emptyRowCount)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
ds - the DataSource to read from.
emptyRowCount - the number of rows to return if no columns were read.
public static Table readParquet(File path)
path - the local file to read.
public static Table readParquet(ParquetOptions opts, File path)
opts - various parquet parsing options.
path - the local file to read.
public static Table readParquet(byte[] buffer)
buffer - raw parquet formatted bytes.
public static Table readParquet(ParquetOptions opts, byte[] buffer)
opts - various parquet parsing options.
buffer - raw parquet formatted bytes.
public static Table readParquet(ParquetOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator)
opts - various parquet parsing options.
buffer - raw parquet formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
hostMemoryAllocator - allocator for host memory buffers
public static Table readParquet(ParquetOptions opts, byte[] buffer, long offset, long len)
public static Table readParquet(ParquetOptions opts, HostMemoryBuffer buffer, long offset, long len)
opts - various parquet parsing options.
buffer - raw parquet formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
public static Table readParquet(ParquetOptions opts, DataSource ds)
public static Table readAvro(File path)
path - the local file to read.
public static Table readAvro(AvroOptions opts, File path)
opts - various Avro parsing options.
path - the local file to read.
public static Table readAvro(byte[] buffer)
buffer - raw Avro formatted bytes.
public static Table readAvro(AvroOptions opts, byte[] buffer)
opts - various Avro parsing options.
buffer - raw Avro formatted bytes.
public static Table readAvro(AvroOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator)
opts - various Avro parsing options.
buffer - raw Avro formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
hostMemoryAllocator - allocator for host memory buffers
public static Table readAvro(AvroOptions opts, byte[] buffer, long offset, long len)
public static Table readAvro(AvroOptions opts, HostMemoryBuffer buffer, long offset, long len)
opts - various Avro parsing options.
buffer - raw Avro formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
public static Table readAvro(AvroOptions opts, DataSource ds)
public static Table readORC(File path)
path - the local file to read.
public static Table readORC(ORCOptions opts, File path)
opts - ORC parsing options.
path - the local file to read.
public static Table readORC(byte[] buffer)
buffer - raw ORC formatted bytes.
public static Table readORC(ORCOptions opts, byte[] buffer)
opts - various ORC parsing options.
buffer - raw ORC formatted bytes.
public static Table readORC(ORCOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator)
opts - various ORC parsing options.
buffer - raw ORC formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
hostMemoryAllocator - allocator for host memory buffers
public static Table readORC(ORCOptions opts, byte[] buffer, long offset, long len)
public static Table readORC(ORCOptions opts, HostMemoryBuffer buffer, long offset, long len)
opts - various ORC parsing options.
buffer - raw ORC formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
public static Table readORC(ORCOptions opts, DataSource ds)
public static TableWriter writeParquetChunked(ParquetWriterOptions options, File outputFile)
options
- the parquet writer options.
outputFile
- where to write the file.
public static TableWriter writeParquetChunked(ParquetWriterOptions options, HostBufferConsumer consumer, HostMemoryAllocator hostMemoryAllocator)
options
- the parquet writer options.
consumer
- a class that will be called when host buffers are ready with parquet formatted data in them.
hostMemoryAllocator
- allocator for host memory buffers
public static TableWriter writeParquetChunked(ParquetWriterOptions options, HostBufferConsumer consumer)
public static void writeColumnViewsToParquet(ParquetWriterOptions options, HostBufferConsumer consumer, HostMemoryAllocator hostMemoryAllocator, ColumnView... columnViews)
options
- the Parquet writer options.
consumer
- a class that will be called when host buffers are ready with Parquet formatted data in them.
hostMemoryAllocator
- allocator for host memory buffers
columnViews
- ColumnViews to write to Parquet
public static void writeColumnViewsToParquet(ParquetWriterOptions options, HostBufferConsumer consumer, ColumnView... columnViews)
public static TableWriter writeORCChunked(ORCWriterOptions options, File outputFile)
options
- the ORC writer options.
outputFile
- where to write the file.
public static TableWriter writeORCChunked(ORCWriterOptions options, HostBufferConsumer consumer, HostMemoryAllocator hostMemoryAllocator)
options
- the ORC writer options.
consumer
- a class that will be called when host buffers are ready with ORC formatted data in them.
hostMemoryAllocator
- allocator for host memory buffers
public static TableWriter writeORCChunked(ORCWriterOptions options, HostBufferConsumer consumer)
public static TableWriter writeArrowIPCChunked(ArrowIPCWriterOptions options, File outputFile)
options
- the arrow IPC writer options.
outputFile
- where to write the file.
public static TableWriter writeArrowIPCChunked(ArrowIPCWriterOptions options, HostBufferConsumer consumer, HostMemoryAllocator hostMemoryAllocator)
options
- the arrow IPC writer options.
consumer
- a class that will be called when host buffers are ready with arrow IPC formatted data in them.
hostMemoryAllocator
- allocator for host memory buffers
public static TableWriter writeArrowIPCChunked(ArrowIPCWriterOptions options, HostBufferConsumer consumer)
public static StreamedTableReader readArrowIPCChunked(ArrowIPCOptions options, File inputFile)
options
- options for reading.
inputFile
- the file to read the Arrow IPC formatted data from
public static StreamedTableReader readArrowIPCChunked(File inputFile)
inputFile
- the file to read the Arrow IPC formatted data from
public static StreamedTableReader readArrowIPCChunked(ArrowIPCOptions options, HostBufferProvider provider, HostMemoryAllocator hostMemoryAllocator)
options
- options for reading.
provider
- what will provide the data being read.
public static StreamedTableReader readArrowIPCChunked(ArrowIPCOptions options, HostBufferProvider provider)
public static StreamedTableReader readArrowIPCChunked(HostBufferProvider provider)
provider
- what will provide the data being read.
public static Table concatenate(Table... tables)
public ColumnVector interleaveColumns()
public Table repeat(int count)
count
- the number of times to repeat each row.
public Table repeat(ColumnView counts)
counts
- the number of times to repeat each row. Cannot have nulls, must be an Integer type, and must have one entry for each row in the table.
CudfException
- on any error.
public PartitionedTable partition(ColumnView partitionMap, int numberOfPartitions)
partitionMap
- the partitions for each row.
numberOfPartitions
- number of partitions
PartitionedTable - Table that exposes a limited functionality of the Table class
public ColumnVector lowerBound(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags)
Example:
Single column:
idx            0   1   2   3   4
inputTable  = { 10, 20, 20, 30, 50 }
valuesTable = { 20 }
result      = { 1 }
Multi Column:
idx             0    1    2    3    4
inputTable  = {{ 10,  20,  20,  20,  20 },
               { 5.0,  .5,  .5,  .7,  .7 },
               { 90,  77,  78,  61,  61 }}
valuesTable = {{ 20 }, { .7 }, { 61 }}
result      = { 3 }
The input table and the values table need to be non-empty (row count > 0).
areNullsSmallest
- per column, true if nulls are assumed smallest
valueTable
- the table of values to find insertion locations for
descFlags
- per column indicates the ordering, true if descending.
public ColumnVector lowerBound(Table valueTable, OrderByArg... args)
valueTable
- the table of values to find insertion locations for
args
- the sort order used to sort this table.
public ColumnVector upperBound(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags)
Example:
Single column:
idx            0   1   2   3   4
inputTable  = { 10, 20, 20, 30, 50 }
valuesTable = { 20 }
result      = { 3 }
Multi Column:
idx             0    1    2    3    4
inputTable  = {{ 10,  20,  20,  20,  20 },
               { 5.0,  .5,  .5,  .7,  .7 },
               { 90,  77,  78,  61,  61 }}
valuesTable = {{ 20 }, { .7 }, { 61 }}
result      = { 5 }
The input table and the values table need to be non-empty (row count > 0).
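The insertion points in the lowerBound and upperBound examples can be reproduced with a plain-Java sketch. It only illustrates the semantics, assuming ascending order and no nulls; `BoundsSketch` and its methods are hypothetical names that are not part of the cudf API, and the linear scan stands in for whatever search the library actually performs.

```java
final class BoundsSketch {
    // Lexicographic comparison of row r of a column-major table against one value row.
    static int compareRow(double[][] columns, int r, double[] value) {
        for (int c = 0; c < columns.length; c++) {
            int cmp = Double.compare(columns[c][r], value[c]);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }

    // Index of the first row that is not less than the value row.
    static int lowerBound(double[][] columns, double[] value) {
        int rows = columns[0].length;
        int i = 0;
        while (i < rows && compareRow(columns, i, value) < 0) {
            i++;
        }
        return i;
    }

    // Index of the first row that is greater than the value row.
    static int upperBound(double[][] columns, double[] value) {
        int rows = columns[0].length;
        int i = 0;
        while (i < rows && compareRow(columns, i, value) <= 0) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        double[][] multi = {
            {10, 20, 20, 20, 20},
            {5.0, .5, .5, .7, .7},
            {90, 77, 78, 61, 61}};
        double[] value = {20, .7, 61};
        System.out.println(lowerBound(multi, value));
        System.out.println(upperBound(multi, value));
    }
}
```

For a value that is present, the two results bracket the run of equal rows: rows 3 and 4 match in the multi-column example, so the bounds are 3 and 5.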
areNullsSmallest
- per column, true if nulls are assumed smallest
valueTable
- the table of values to find insertion locations for
descFlags
- per column indicates the ordering, true if descending.
public ColumnVector upperBound(Table valueTable, OrderByArg... args)
valueTable
- the table of values to find insertion locations for
args
- the sort order used to sort this table.
public Table crossJoin(Table right)
right
- the right table
public ColumnVector sortOrder(OrderByArg... args)
args
- what order to sort the data by
public Table orderBy(OrderByArg... args)
ColumnVector returned as part of the output Table
Example usage: orderBy(OrderByArg.asc(0), OrderByArg.desc(3)...);
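The effect of an ordering such as asc on one column followed by desc on another can be sketched in plain Java. This is an illustration of the idea only, assuming that sortOrder yields row indices and that orderBy returns the rows in that order; `OrderSketch`, its `sortOrder` and its `gather` are hypothetical helpers, not the cudf implementation.

```java
import java.util.Arrays;
import java.util.Comparator;

final class OrderSketch {
    // Returns row indices ordered by keyColumns; descending[k] flips the k-th key.
    static Integer[] sortOrder(int[][] columns, int[] keyColumns, boolean[] descending) {
        Integer[] order = new Integer[columns[0].length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Comparator<Integer> cmp = (a, b) -> {
            for (int k = 0; k < keyColumns.length; k++) {
                int[] column = columns[keyColumns[k]];
                int c = Integer.compare(column[a], column[b]);
                if (c != 0) {
                    return descending[k] ? -c : c;
                }
            }
            return 0;
        };
        Arrays.sort(order, cmp);
        return order;
    }

    // Applies the row order to every column, producing the sorted table.
    static int[][] gather(int[][] columns, Integer[] order) {
        int[][] out = new int[columns.length][order.length];
        for (int c = 0; c < columns.length; c++) {
            for (int r = 0; r < order.length; r++) {
                out[c][r] = columns[c][order[r]];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] table = {{2, 1, 2, 1}, {10, 20, 30, 40}};
        // Column 0 ascending, then column 1 descending.
        Integer[] order = sortOrder(table, new int[] {0, 1}, new boolean[] {false, true});
        System.out.println(Arrays.toString(order));
        System.out.println(Arrays.deepToString(gather(table, order)));
    }
}
```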
args
- Suppliers to initialize sortKeys.
public static Table merge(Table[] tables, OrderByArg... args)
tables
- the tables that should be merged.
args
- the ordering of the tables. Should match how they were sorted initially.
public static Table merge(List<Table> tables, OrderByArg... args)
tables
- the tables that should be merged.
args
- the ordering of the tables. Should match how they were sorted initially.
public Table.GroupByOperation groupBy(GroupByOptions groupByOptions, int... indices)
groupByOptions
- Options provided in the builder
indices
- columns to be considered for groupBy
public Table.GroupByOperation groupBy(int... indices)
indices
- columns to be considered for groupBy
public PartitionedTable roundRobinPartition(int numberOfPartitions, int startPartition)
numberOfPartitions
- number of partitions to use
startPartition
- starting partition index (i.e., where the first row is placed).
PartitionedTable - Table that exposes a limited functionality of the Table class
public Table.TableOperation onColumns(int... indices)
public Table filter(ColumnView mask)
Given a mask column, each element `i` from the input columns is copied to the output columns if the corresponding element `i` in the mask is non-null and `true`. This operation is stable: the input order is preserved.
This table and the mask column must have the same number of rows.
The output table has a row count equal to the number of elements in the mask that are both non-null and `true`.
If the original table row count is zero, there is no error, and an empty table is returned.
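The masking rule described above can be sketched in plain Java. This is a sketch of the semantics only: `FilterSketch` is a hypothetical name, the table is modelled as column-major `int` arrays, and a `Boolean[]` stands in for a nullable BOOL8 column.

```java
import java.util.Arrays;

final class FilterSketch {
    // Copies row i of every column only when mask[i] is non-null and true.
    // Rows keep their original relative order.
    static int[][] filter(int[][] columns, Boolean[] mask) {
        int kept = 0;
        for (Boolean m : mask) {
            if (m != null && m) {
                kept++;
            }
        }
        int[][] out = new int[columns.length][kept];
        for (int c = 0; c < columns.length; c++) {
            if (columns[c].length != mask.length) {
                throw new IllegalArgumentException("column and mask row counts differ");
            }
            int next = 0;
            for (int r = 0; r < mask.length; r++) {
                if (mask[r] != null && mask[r]) {
                    out[c][next++] = columns[c][r];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] table = {{1, 2, 3, 4}, {10, 20, 30, 40}};
        Boolean[] mask = {true, null, false, true};
        System.out.println(Arrays.deepToString(filter(table, mask)));
    }
}
```

A null mask entry behaves like `false` here, and an empty table with an empty mask yields an empty result rather than an error.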
mask
- column of type DType.BOOL8 used as a mask to filter the input column
public Table dropDuplicates(int[] keyColumns, Table.DuplicateKeepOption keep, boolean nullsEqual)
keyColumns
- Array of indices representing key columns from the current table.
keep
- Option specifying to keep any, first, last, or none of the found duplicates.
nullsEqual
- Flag to denote whether nulls are treated as equal when comparing rows of the key columns to check for uniqueness.
public int distinctCount(NullEquality nullsEqual)
nullsEqual
- if nulls should be considered equal to each other or not.
public int distinctCount()
public ContiguousTable[] contiguousSplit(int... indices)
Example:
input: [{10, 12, 14, 16, 18, 20, 22, 24, 26, 28},
{50, 52, 54, 56, 58, 60, 62, 64, 66, 68}]
splits: {2, 5, 9}
output: [{{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}},
{{50, 52}, {54, 56, 58}, {60, 62, 64, 66}, {68}}]
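The row ranges in the example above follow directly from the split indices: n indices produce n + 1 pieces. The sketch below reproduces them in plain Java, assuming the indices are sorted and within range. `SplitSketch` is a hypothetical name, and it only models which rows land in which piece, not the contiguous device memory layout. Note that it groups the result by piece, whereas the example output groups it by column.

```java
import java.util.Arrays;

final class SplitSketch {
    // result[p][c] holds column c of piece p.
    static int[][][] split(int[][] columns, int... indices) {
        int rows = columns[0].length;
        int[][][] pieces = new int[indices.length + 1][columns.length][];
        int start = 0;
        for (int p = 0; p <= indices.length; p++) {
            // The last piece runs to the end of the table.
            int end = (p < indices.length) ? indices[p] : rows;
            for (int c = 0; c < columns.length; c++) {
                pieces[p][c] = Arrays.copyOfRange(columns[c], start, end);
            }
            start = end;
        }
        return pieces;
    }

    public static void main(String[] args) {
        int[][] input = {
            {10, 12, 14, 16, 18, 20, 22, 24, 26, 28},
            {50, 52, 54, 56, 58, 60, 62, 64, 66, 68}};
        for (int[][] piece : split(input, 2, 5, 9)) {
            System.out.println(Arrays.deepToString(piece));
        }
    }
}
```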
indices
- A vector of indices where to make the split
public ChunkedPack makeChunkedPack(long bounceBufferSize, RmmDeviceMemoryResource tempMemoryResource)
bounceBufferSize
- The size of bounce buffer that will be utilized to pack into
tempMemoryResource
- A memory resource that is used to satisfy allocations for temporary and thrust scratch space.
public ChunkedPack makeChunkedPack(long bounceBufferSize)
bounceBufferSize
- The size of bounce buffer that will be utilized to pack into
public Table explode(int index)
Example:
input: [[5,10,15], 100],
[[20,25], 200],
[[30], 300]
index: 0
output: [5, 100],
[10, 100],
[15, 100],
[20, 200],
[25, 200],
[30, 300]
Nulls propagate in different ways depending on what is null.
input: [[5,null,15], 100],
[null, 200]
index: 0
output: [5, 100],
[null, 100],
[15, 100]
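The null handling in the explode example can be modelled in plain Java. This is only a sketch of the row-expansion rule: `ExplodeSketch` is a hypothetical name, the list column is a `List<List<Integer>>`, and the second column is an `int[]`. An empty list also produces no rows in this sketch, which is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExplodeSketch {
    // Emits one {element, other} row per list element.
    static List<Integer[]> explode(List<List<Integer>> lists, int[] other) {
        List<Integer[]> out = new ArrayList<>();
        for (int r = 0; r < lists.size(); r++) {
            List<Integer> list = lists.get(r);
            if (list == null) {
                continue; // a null list contributes no rows
            }
            for (Integer element : list) {
                out.add(new Integer[] {element, other[r]}); // null elements are kept
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = new ArrayList<>();
        lists.add(Arrays.asList(5, null, 15));
        lists.add(null);
        for (Integer[] row : explode(lists, new int[] {100, 200})) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```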
Note that null lists are completely removed from the output and nulls inside lists are pulled out and remain.
index
- Column index to explode inside the table.
public Table explodePosition(int index)
input: [[5,10,15], 100],
[[20,25], 200],
[[30], 300]
index: 0
output: [0, 5, 100],
[1, 10, 100],
[2, 15, 100],
[0, 20, 200],
[1, 25, 200],
[0, 30, 300]
Nulls and empty lists propagate in different ways depending on what is null or empty.
input: [[5,null,15], 100],
[null, 200]
index: 0
output: [0, 5, 100],
[1, null, 100],
[2, 15, 100]
Note that null lists are not included in the resulting table, but nulls inside lists and empty lists will be represented with a null entry for that column in that row.
index
- Column index to explode inside the table.
public Table explodeOuter(int index)
Example:
input: [[5,10,15], 100],
[[20,25], 200],
[[30], 300],
index: 0
output: [5, 100],
[10, 100],
[15, 100],
[20, 200],
[25, 200],
[30, 300]
Nulls propagate in different ways depending on what is null.
input: [[5,null,15], 100],
[null, 200]
index: 0
output: [5, 100],
[null, 100],
[15, 100],
[null, 200]
Note that null lists are kept as a row with a null entry for that column, and nulls inside lists are pulled out and remain.
index
- Column index to explode inside the table.
public Table explodeOuterPosition(int index)
Example:
input: [[5,10,15], 100],
[[20,25], 200],
[[30], 300],
index: 0
output: [0, 5, 100],
[1, 10, 100],
[2, 15, 100],
[0, 20, 200],
[1, 25, 200],
[0, 30, 300]
Nulls and empty lists propagate as null entries in the result.
input: [[5,null,15], 100],
[null, 200],
[[], 300]
index: 0
output: [0, 5, 100],
[1, null, 100],
[2, 15, 100],
[0, null, 200],
[0, null, 300]
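The example above can be mirrored in plain Java to make the rule explicit. This sketch follows the documented output as written, including position 0 for null and empty lists; `ExplodeOuterPositionSketch` is a hypothetical name and nothing here is taken from the cudf implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExplodeOuterPositionSketch {
    // Each output row is {position, element, other}. A null or empty list still
    // yields one row, written with position 0 and a null element to match the
    // example above.
    static List<Integer[]> explodeOuterPosition(List<List<Integer>> lists, int[] other) {
        List<Integer[]> out = new ArrayList<>();
        for (int r = 0; r < lists.size(); r++) {
            List<Integer> list = lists.get(r);
            if (list == null || list.isEmpty()) {
                out.add(new Integer[] {0, null, other[r]});
                continue;
            }
            for (int pos = 0; pos < list.size(); pos++) {
                out.add(new Integer[] {pos, list.get(pos), other[r]});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = new ArrayList<>();
        lists.add(Arrays.asList(5, null, 15));
        lists.add(null);
        lists.add(new ArrayList<>());
        for (Integer[] row : explodeOuterPosition(lists, new int[] {100, 200, 300})) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```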
index
- Column index to explode inside the table.
public ColumnVector rowBitCount()
public Table gather(ColumnView gatherMap)
gatherMap
- the map of indexes. Must be non-nullable and integral type.
public Table gather(ColumnView gatherMap, OutOfBoundsPolicy outOfBoundsPolicy)
gatherMap
- the map of indexes. Must be non-nullable and integral type.
outOfBoundsPolicy
- policy to use when an out-of-range value is in `gatherMap`.
public Table scatter(ColumnView scatterMap, Table target)
scatterMap
- The map of indexes. Must be non-nullable and integral type.
target
- The table into which rows from the current table are to be scattered out-of-place.
public static Table scatter(Scalar[] source, ColumnView scatterMap, Table target)
source
- The input scalars containing values to be scattered into the target table.
scatterMap
- The map of indexes. Must be non-nullable and integral type.
target
- The table into which the values from source are to be scattered out-of-place.
public GatherMap[] leftJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)
GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join.
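How a pair of gather maps describes a left join can be illustrated with a small plain-Java model. It is a sketch under stated assumptions rather than the cudf algorithm: `LeftJoinSketch` is a hypothetical name, the maps are plain `int[]` row indices, and an unmatched right row is marked with a sentinel of -1 that gathers as null. How cudf encodes an unmatched row is not stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LeftJoinSketch {
    static final int NO_MATCH = -1; // hypothetical sentinel for "no right row"

    // Returns {leftMap, rightMap}: parallel arrays of row indices into each table.
    static int[][] leftJoinGatherMaps(Integer[] leftKeys, Integer[] rightKeys,
                                      boolean compareNullsEqual) {
        List<Integer> left = new ArrayList<>();
        List<Integer> right = new ArrayList<>();
        for (int l = 0; l < leftKeys.length; l++) {
            boolean matched = false;
            for (int r = 0; r < rightKeys.length; r++) {
                if (keysMatch(leftKeys[l], rightKeys[r], compareNullsEqual)) {
                    left.add(l);
                    right.add(r);
                    matched = true;
                }
            }
            if (!matched) { // every left row appears at least once
                left.add(l);
                right.add(NO_MATCH);
            }
        }
        return new int[][] {toArray(left), toArray(right)};
    }

    static boolean keysMatch(Integer a, Integer b, boolean compareNullsEqual) {
        if (a == null || b == null) {
            return compareNullsEqual && a == null && b == null;
        }
        return a.equals(b);
    }

    static int[] toArray(List<Integer> list) {
        return list.stream().mapToInt(Integer::intValue).toArray();
    }

    // Gathers values by index; an out-of-range index such as the sentinel becomes null.
    static String[] gather(String[] column, int[] map) {
        String[] out = new String[map.length];
        for (int i = 0; i < map.length; i++) {
            out[i] = (map[i] >= 0 && map[i] < column.length) ? column[map[i]] : null;
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] maps = leftJoinGatherMaps(
            new Integer[] {1, 2, null}, new Integer[] {2, 2, null}, false);
        System.out.println(Arrays.toString(maps[0]));
        System.out.println(Arrays.toString(maps[1]));
        System.out.println(Arrays.toString(gather(new String[] {"x", "y", "z"}, maps[1])));
    }
}
```

Gathering the left table with the first map and the right table with the second gives two row-aligned tables whose columns together form the join result.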
It is the responsibility of the caller to close the resulting gather map instances.
rightKeys
- join key columns from the right table
compareNullsEqual
- true if null key values should match otherwise false
public GatherMap leftDistinctJoinGatherMap(Table rightKeys, boolean compareNullsEqual)
GatherMap instance will be returned that can be used to gather the right table and that result combined with the left table to produce a left outer join result.
It is the responsibility of the caller to close the resulting gather map instance.
rightKeys
- join key columns from the right table
compareNullsEqual
- true if null key values should match otherwise false
public long leftJoinRowCount(HashJoin rightHash)
HashJoin argument has been constructed from the key columns from the right table.
rightHash
- hash table built from join key columns from the right table
public GatherMap[] leftJoinGatherMaps(HashJoin rightHash)
HashJoin argument has been constructed from the key columns from the right table.
Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join.
It is the responsibility of the caller to close the resulting gather map instances.
rightHash
- hash table built from join key columns from the right table
public GatherMap[] leftJoinGatherMaps(HashJoin rightHash, long outputRowCount)
HashJoin argument has been constructed from the key columns from the right table.
Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join.
It is the responsibility of the caller to close the resulting gather map instances.
This interface allows passing an output row count that was previously computed from leftJoinRowCount(HashJoin).
WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
rightHash
- hash table built from join key columns from the right table
outputRowCount
- number of output rows in the join result
public long conditionalLeftJoinRowCount(Table rightTable, CompiledExpression condition)
rightTable
- the right side table of the join
condition
- conditional expression to evaluate during the join
public GatherMap[] conditionalLeftJoinGatherMaps(Table rightTable, CompiledExpression condition)
GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join.
It is the responsibility of the caller to close the resulting gather map instances.
rightTable
- the right side table of the join
condition
- conditional expression to evaluate during the join
public GatherMap[] conditionalLeftJoinGatherMaps(Table rightTable, CompiledExpression condition, long outputRowCount)
GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join.
It is the responsibility of the caller to close the resulting gather map instances.
This interface allows passing an output row count that was previously computed from conditionalLeftJoinRowCount(Table, CompiledExpression).
WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
rightTable
- the right side table of the join
condition
- conditional expression to evaluate during the join
outputRowCount
- number of output rows in the join result
public static MixedJoinSize mixedLeftJoinSize(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
leftKeys
- the left table's key columns for the equality condition
rightKeys
- the right table's key columns for the equality condition
leftConditional
- the left table's columns needed to evaluate the inequality condition
rightConditional
- the right table's columns needed to evaluate the inequality condition
condition
- the inequality condition of the join
nullEquality
- whether nulls should compare as equal
public static GatherMap[] mixedLeftJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join.
It is the responsibility of the caller to close the resulting gather map instances.
leftKeys
- the left table's key columns for the equality condition
rightKeys
- the right table's key columns for the equality condition
leftConditional
- the left table's columns needed to evaluate the inequality condition
rightConditional
- the right table's columns needed to evaluate the inequality condition
condition
- the inequality condition of the join
nullEquality
- whether nulls should compare as equal
public static GatherMap[] mixedLeftJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality, MixedJoinSize joinSize)
GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join.
It is the responsibility of the caller to close the resulting gather map instances.
This interface allows passing the size result from mixedLeftJoinSize(Table, Table, Table, Table, CompiledExpression, NullEquality) when the output size was computed previously.
leftKeys
- the left table's key columns for the equality condition
rightKeys
- the right table's key columns for the equality condition
leftConditional
- the left table's columns needed to evaluate the inequality condition
rightConditional
- the right table's columns needed to evaluate the inequality condition
condition
- the inequality condition of the join
nullEquality
- whether nulls should compare as equal
joinSize
- mixed join size result
public GatherMap[] innerJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)
GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join.
It is the responsibility of the caller to close the resulting gather map instances.
rightKeys
- join key columns from the right table
compareNullsEqual
- true if null key values should match otherwise false
public GatherMap[] innerDistinctJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)
GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join.
It is the responsibility of the caller to close the resulting gather map instances.
rightKeys
- join key columns from the right table
compareNullsEqual
- true if null key values should match otherwise false
public long innerJoinRowCount(HashJoin otherHash)
otherHash
- hash table built from join key columns from the other table
public GatherMap[] innerJoinGatherMaps(HashJoin rightHash)
HashJoin argument has been constructed from the key columns from the right table.
Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join.
It is the responsibility of the caller to close the resulting gather map instances.
rightHash
- hash table built from join key columns from the right table
public GatherMap[] innerJoinGatherMaps(HashJoin rightHash, long outputRowCount)
HashJoin argument has been constructed from the key columns from the right table.
Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join.
It is the responsibility of the caller to close the resulting gather map instances.
This interface allows passing an output row count that was previously computed from innerJoinRowCount(HashJoin).
WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
rightHash
- hash table built from join key columns from the right table
outputRowCount
- number of output rows in the join result
public long conditionalInnerJoinRowCount(Table rightTable, CompiledExpression condition)
rightTable
- the right side table of the join
condition
- conditional expression to evaluate during the join
public GatherMap[] conditionalInnerJoinGatherMaps(Table rightTable, CompiledExpression condition)
GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join.
It is the responsibility of the caller to close the resulting gather map instances.
rightTable
- the right side table of the join
condition
- conditional expression to evaluate during the join
public GatherMap[] conditionalInnerJoinGatherMaps(Table rightTable, CompiledExpression condition, long outputRowCount)
GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join.
It is the responsibility of the caller to close the resulting gather map instances.
This interface allows passing an output row count that was previously computed from conditionalInnerJoinRowCount(Table, CompiledExpression).
WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
rightTable
- the right side table of the join
condition
- conditional expression to evaluate during the join
outputRowCount
- number of output rows in the join result
public static MixedJoinSize mixedInnerJoinSize(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
leftKeys
- the left table's key columns for the equality condition
rightKeys
- the right table's key columns for the equality condition
leftConditional
- the left table's columns needed to evaluate the inequality condition
rightConditional
- the right table's columns needed to evaluate the inequality condition
condition
- the inequality condition of the join
nullEquality
- whether nulls should compare as equal
public static GatherMap[] mixedInnerJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join.
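A mixed join pairs two rows only when the key columns are equal and the condition also holds on the conditional columns. The plain-Java sketch below shows that rule for an inner join with one key column and one conditional column per side. `MixedInnerJoinSketch` is a hypothetical name, and the `BiPredicate` stands in for a compiled expression; it is not how cudf evaluates the condition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

final class MixedInnerJoinSketch {
    // Returns {leftMap, rightMap}. A pair (l, r) is emitted only when the keys are
    // equal AND the condition holds on the conditional values of those two rows.
    static int[][] mixedInnerJoinGatherMaps(int[] leftKeys, int[] rightKeys,
                                            int[] leftConditional, int[] rightConditional,
                                            BiPredicate<Integer, Integer> condition) {
        List<int[]> pairs = new ArrayList<>();
        for (int l = 0; l < leftKeys.length; l++) {
            for (int r = 0; r < rightKeys.length; r++) {
                if (leftKeys[l] == rightKeys[r]
                        && condition.test(leftConditional[l], rightConditional[r])) {
                    pairs.add(new int[] {l, r});
                }
            }
        }
        int[] leftMap = new int[pairs.size()];
        int[] rightMap = new int[pairs.size()];
        for (int i = 0; i < pairs.size(); i++) {
            leftMap[i] = pairs.get(i)[0];
            rightMap[i] = pairs.get(i)[1];
        }
        return new int[][] {leftMap, rightMap};
    }

    public static void main(String[] args) {
        int[][] maps = mixedInnerJoinGatherMaps(
            new int[] {1, 1, 2}, new int[] {1, 2, 2},
            new int[] {5, 9, 3}, new int[] {7, 4, 1},
            (a, b) -> a < b);
        System.out.println(Arrays.toString(maps[0]));
        System.out.println(Arrays.toString(maps[1]));
    }
}
```

The length of either map is the output row count, which is the quantity a separate size computation would report ahead of time.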
It is the responsibility of the caller to close the resulting gather map instances.
leftKeys
- the left table's key columns for the equality condition
rightKeys
- the right table's key columns for the equality condition
leftConditional
- the left table's columns needed to evaluate the inequality condition
rightConditional
- the right table's columns needed to evaluate the inequality condition
condition
- the inequality condition of the join
nullEquality
- whether nulls should compare as equal
public static GatherMap[] mixedInnerJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality, MixedJoinSize joinSize)
GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join.
It is the responsibility of the caller to close the resulting gather map instances.
This interface allows passing the size result from mixedInnerJoinSize(Table, Table, Table, Table, CompiledExpression, NullEquality) when the output size was computed previously.
leftKeys
- the left table's key columns for the equality condition
rightKeys
- the right table's key columns for the equality condition
leftConditional
- the left table's columns needed to evaluate the inequality condition
rightConditional
- the right table's columns needed to evaluate the inequality condition
condition
- the inequality condition of the join
nullEquality
- whether nulls should compare as equal
joinSize
- mixed join size result
public GatherMap[] fullJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)
GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the full join.
It is the responsibility of the caller to close the resulting gather map instances.
rightKeys
- join key columns from the right table
compareNullsEqual
- true if null key values should match otherwise false
public long fullJoinRowCount(HashJoin rightHash)
HashJoin argument has been constructed from the key columns from the right table.
Note that unlike leftJoinRowCount(HashJoin) and innerJoinRowCount(HashJoin), this will perform some redundant calculations compared to fullJoinGatherMaps(HashJoin, long).
rightHash
- hash table built from join key columns from the right table
public GatherMap[] fullJoinGatherMaps(HashJoin rightHash)
HashJoin argument has been constructed from the key columns from the right table.
Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the full join.
It is the responsibility of the caller to close the resulting gather map instances.
rightHash
- hash table built from join key columns from the right table
public GatherMap[] fullJoinGatherMaps(HashJoin rightHash, long outputRowCount)
HashJoin argument has been constructed from the key columns from the right table.
Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the full join.
It is the responsibility of the caller to close the resulting gather map instances.
This interface allows passing an output row count that was previously computed from fullJoinRowCount(HashJoin).
WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
rightHash
- hash table built from join key columns from the right table
outputRowCount
- number of output rows in the join result
public GatherMap[] conditionalFullJoinGatherMaps(Table rightTable, CompiledExpression condition)
GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the full join.
It is the responsibility of the caller to close the resulting gather map instances.
rightTable
- the right side table of the join
condition
- conditional expression to evaluate during the join
public static GatherMap[] mixedFullJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the full join.
It is the responsibility of the caller to close the resulting gather map instances.
leftKeys
- the left table's key columns for the equality condition
rightKeys
- the right table's key columns for the equality condition
leftConditional
- the left table's columns needed to evaluate the inequality condition
rightConditional
- the right table's columns needed to evaluate the inequality condition
condition
- the inequality condition of the join
nullEquality
- whether nulls should compare as equal
public GatherMap leftSemiJoinGatherMap(Table rightKeys, boolean compareNullsEqual)
GatherMap instance returned can be used to gather the left table to produce the result of the left semi-join.
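Unlike the two-map joins, a semi join (and its complement, the anti join) needs only one gather map because the result contains left-table rows only. A minimal plain-Java sketch of both, assuming single non-null `int` keys; `SemiAntiJoinSketch` is a hypothetical name and the hash set is just one way to test membership.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.IntStream;

final class SemiAntiJoinSketch {
    // Left row indices whose key appears in rightKeys (semi, anti == false)
    // or does not appear in rightKeys (anti == true).
    static int[] leftSemiOrAntiGatherMap(int[] leftKeys, int[] rightKeys, boolean anti) {
        Set<Integer> rightSet = new HashSet<>();
        for (int key : rightKeys) {
            rightSet.add(key);
        }
        return IntStream.range(0, leftKeys.length)
            .filter(l -> rightSet.contains(leftKeys[l]) != anti)
            .toArray();
    }

    public static void main(String[] args) {
        int[] left = {1, 2, 3, 2};
        int[] right = {2, 2, 5};
        System.out.println(Arrays.toString(leftSemiOrAntiGatherMap(left, right, false)));
        System.out.println(Arrays.toString(leftSemiOrAntiGatherMap(left, right, true)));
    }
}
```

Each left row appears at most once in the map even when the right side holds duplicate keys, which is what separates a semi join from an inner join.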
It is the responsibility of the caller to close the resulting gather map instance.
rightKeys
- join key columns from the right table
compareNullsEqual
- true if null key values should match otherwise false
public long conditionalLeftSemiJoinRowCount(Table rightTable, CompiledExpression condition)
rightTable
- the right side table of the join
condition
- conditional expression to evaluate during the join
public GatherMap conditionalLeftSemiJoinGatherMap(Table rightTable, CompiledExpression condition)
GatherMap instance returned can be used to gather the left table to produce the result of the left semi join.
It is the responsibility of the caller to close the resulting gather map instance.
rightTable
- the right side table of the join
condition
- conditional expression to evaluate during the join
public GatherMap conditionalLeftSemiJoinGatherMap(Table rightTable, CompiledExpression condition, long outputRowCount)
GatherMap instance returned can be used to gather the left table to produce the result of the left semi join.
It is the responsibility of the caller to close the resulting gather map instance.
This interface allows passing an output row count that was previously computed from conditionalLeftSemiJoinRowCount(Table, CompiledExpression).
WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
rightTable
- the right side table of the join
condition
- conditional expression to evaluate during the join
outputRowCount
- number of output rows in the join result
public static GatherMap mixedLeftSemiJoinGatherMap(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
GatherMap instance will be returned that can be used to gather the left table to produce the result of the left semi join.
It is the responsibility of the caller to close the resulting gather map instance.
leftKeys
- the left table's key columns for the equality condition
rightKeys
- the right table's key columns for the equality condition
leftConditional
- the left table's columns needed to evaluate the inequality condition
rightConditional
- the right table's columns needed to evaluate the inequality condition
condition
- the inequality condition of the join
nullEquality
- whether nulls should compare as equal
public GatherMap leftAntiJoinGatherMap(Table rightKeys, boolean compareNullsEqual)
GatherMap instance returned can be used to gather the left table to produce the result of the left anti-join.
It is the responsibility of the caller to close the resulting gather map instance.
rightKeys
- join key columns from the right table
compareNullsEqual
- true if null key values should match otherwise false
public long conditionalLeftAntiJoinRowCount(Table rightTable, CompiledExpression condition)
rightTable
- the right side table of the join
condition
- conditional expression to evaluate during the join
public GatherMap conditionalLeftAntiJoinGatherMap(Table rightTable, CompiledExpression condition)
GatherMap instance returned can be used to gather the left table to produce the result of the left anti join.
It is the responsibility of the caller to close the resulting gather map instance.
rightTable
- the right side table of the join
condition
- conditional expression to evaluate during the join
public GatherMap conditionalLeftAntiJoinGatherMap(Table rightTable, CompiledExpression condition, long outputRowCount)
GatherMap instance returned can be used to gather the left table to produce the result of the left anti join.
It is the responsibility of the caller to close the resulting gather map instance.
This interface allows passing an output row count that was previously computed from conditionalLeftAntiJoinRowCount(Table, CompiledExpression).
WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
rightTable
- the right side table of the join
condition
- conditional expression to evaluate during the join
outputRowCount
- number of output rows in the join result
public static GatherMap mixedLeftAntiJoinGatherMap(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
GatherMap instance will be returned that can be used to gather the left table to produce the result of the left anti join.
It is the responsibility of the caller to close the resulting gather map instance.
leftKeys
- the left table's key columns for the equality condition
rightKeys
- the right table's key columns for the equality condition
leftConditional
- the left table's columns needed to evaluate the inequality condition
rightConditional
- the right table's columns needed to evaluate the inequality condition
condition
- the inequality condition of the join
nullEquality
- whether nulls should compare as equal
public static Table fromPackedTable(ByteBuffer metadata, DeviceMemoryBuffer data)
metadata
- host-based metadata for the table
data
- GPU data buffer for the table
public Table sample(long n, boolean replacement, long seed)
n
- non-negative number of samples expected from table
replacement
- Allow or disallow sampling of the same row more than once.
seed
- Seed value to initiate random number generator.
Copyright © 2024. All rights reserved.