Package ai.rapids.cudf
Class Table
java.lang.Object
ai.rapids.cudf.Table
- All Implemented Interfaces:
AutoCloseable
Class to represent a collection of ColumnVectors and operations that can be performed on them
collectively.
The refcount on the columns will be increased once they are passed in
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
static enum: Enum to specify which of duplicate rows/elements will be copied to the output.
static final class: Class representing groupby operations
static final class
static final class: Create a table on the GPU with data from the CPU.
-
Constructor Summary
Constructors
Constructor and Description
Table(long[] cudfColumns): Create a Table from an array of existing on device cudf::column pointers.
Table(ColumnVector... columns): Table class makes a copy of the array of ColumnVectors passed to it.
-
Method Summary
Modifier and TypeMethodDescriptionvoid
close()
static Table
concatenate
(Table... tables) Concatenate multiple tables together to form a single table.conditionalFullJoinGatherMaps
(Table rightTable, CompiledExpression condition) Computes the gather maps that can be used to manifest the result of a full join between two tables when a conditional expression is true.conditionalInnerJoinGatherMaps
(Table rightTable, CompiledExpression condition) Computes the gather maps that can be used to manifest the result of an inner join between two tables when a conditional expression is true.conditionalInnerJoinGatherMaps
(Table rightTable, CompiledExpression condition, long outputRowCount) Computes the gather maps that can be used to manifest the result of an inner join between two tables when a conditional expression is true.long
conditionalInnerJoinRowCount
(Table rightTable, CompiledExpression condition) Computes the number of rows from the result of an inner join between two tables when a conditional expression is true.conditionalLeftAntiJoinGatherMap
(Table rightTable, CompiledExpression condition) Computes the gather map that can be used to manifest the result of a left anti join between two tables when a conditional expression is true.conditionalLeftAntiJoinGatherMap
(Table rightTable, CompiledExpression condition, long outputRowCount) Computes the gather map that can be used to manifest the result of a left anti join between two tables when a conditional expression is true.long
conditionalLeftAntiJoinRowCount
(Table rightTable, CompiledExpression condition) Computes the number of rows from the result of a left anti join between two tables when a conditional expression is true.conditionalLeftJoinGatherMaps
(Table rightTable, CompiledExpression condition) Computes the gather maps that can be used to manifest the result of a left join between two tables when a conditional expression is true.conditionalLeftJoinGatherMaps
(Table rightTable, CompiledExpression condition, long outputRowCount) Computes the gather maps that can be used to manifest the result of a left join between two tables when a conditional expression is true.long
conditionalLeftJoinRowCount
(Table rightTable, CompiledExpression condition) Computes the number of rows from the result of a left join between two tables when a conditional expression is true.conditionalLeftSemiJoinGatherMap
(Table rightTable, CompiledExpression condition) Computes the gather map that can be used to manifest the result of a left semi join between two tables when a conditional expression is true.conditionalLeftSemiJoinGatherMap
(Table rightTable, CompiledExpression condition, long outputRowCount) Computes the gather map that can be used to manifest the result of a left semi join between two tables when a conditional expression is true.long
conditionalLeftSemiJoinRowCount
(Table rightTable, CompiledExpression condition) Computes the number of rows from the result of a left semi join between two tables when a conditional expression is true.contiguousSplit
(int... indices) Split a table at given boundaries, but the result of each split has memory that is laid out in a contiguous range of memory.Joins two tables all of the left against all of the right.int
Count how many rows in the table are distinct from one another.int
distinctCount
(NullEquality nullsEqual) Count how many rows in the table are distinct from one another.dropDuplicates
(int[] keyColumns, Table.DuplicateKeepOption keep, boolean nullsEqual) Copy rows of the current table to an output table such that duplicate rows in the key columns are ignored (i.e., only one row from the duplicate ones will be copied).explode
(int index) Explodes a list column's elements.explodeOuter
(int index) Explodes a list column's elements.explodeOuterPosition
(int index) Explodes a list column's elements retaining any null entries or empty lists and includes a position column.explodePosition
(int index) Explodes a list column's elements and includes a position column.filter
(ColumnView mask) Filters this table using a column of boolean values as a mask, returning a new one.static Table
fromPackedTable
(ByteBuffer metadata, DeviceMemoryBuffer data) Construct a table from a packed representation.fullJoinGatherMaps
(HashJoin rightHash) Computes the gather maps that can be used to manifest the result of a full equi-join between two tables.fullJoinGatherMaps
(HashJoin rightHash, long outputRowCount) Computes the gather maps that can be used to manifest the result of a full equi-join between two tables.fullJoinGatherMaps
(Table rightKeys, boolean compareNullsEqual) Computes the gather maps that can be used to manifest the result of a full equi-join between two tables.long
fullJoinRowCount
(HashJoin rightHash) Computes the number of rows resulting from a full equi-join between two tables.gather
(ColumnView gatherMap) Gathers the rows of this table according to `gatherMap` such that row "i" in the resulting table's columns will contain row "gatherMap[i]" from this table.gather
(ColumnView gatherMap, OutOfBoundsPolicy outOfBoundsPolicy) Gathers the rows of this table according to `gatherMap` such that row "i" in the resulting table's columns will contain row "gatherMap[i]" from this table.getColumn
(int index) Return the ColumnVector at the specified index.static TableWriter
getCSVBufferWriter
(CSVWriterOptions options, HostBufferConsumer bufferConsumer) static TableWriter
getCSVBufferWriter
(CSVWriterOptions options, HostBufferConsumer bufferConsumer, HostMemoryAllocator hostMemoryAllocator) long
Returns the Device memory buffer size.long
Return the native table view handle for this tablefinal int
final long
groupBy
(int... indices) Returns aggregate operations grouped by columns provided in indices with default options as below: - null is considered as key while grouping.groupBy
(GroupByOptions groupByOptions, int... indices) Returns aggregate operations grouped by columns provided in indicesinnerDistinctJoinGatherMaps
(Table rightKeys, boolean compareNullsEqual) Computes the gather maps that can be used to manifest the result of an inner equi-join between two tables where the right table is guaranteed to not contain any duplicated join keys.innerJoinGatherMaps
(HashJoin rightHash) Computes the gather maps that can be used to manifest the result of an inner equi-join between two tables.innerJoinGatherMaps
(HashJoin rightHash, long outputRowCount) Computes the gather maps that can be used to manifest the result of an inner equi-join between two tables.innerJoinGatherMaps
(Table rightKeys, boolean compareNullsEqual) Computes the gather maps that can be used to manifest the result of an inner equi-join between two tables.long
innerJoinRowCount
(HashJoin otherHash) Computes the number of rows resulting from an inner equi-join between two tables.Interleave all columns into a single column.leftAntiJoinGatherMap
(Table rightKeys, boolean compareNullsEqual) Computes the gather map that can be used to manifest the result of a left anti-join between two tables.leftDistinctJoinGatherMap
(Table rightKeys, boolean compareNullsEqual) Computes a gather map that can be used to manifest the result of a left equi-join between two tables where the right table is guaranteed to not contain any duplicated join keys.leftJoinGatherMaps
(HashJoin rightHash) Computes the gather maps that can be used to manifest the result of a left equi-join between two tables.leftJoinGatherMaps
(HashJoin rightHash, long outputRowCount) Computes the gather maps that can be used to manifest the result of a left equi-join between two tables.leftJoinGatherMaps
(Table rightKeys, boolean compareNullsEqual) Computes the gather maps that can be used to manifest the result of a left equi-join between two tables.long
leftJoinRowCount
(HashJoin rightHash) Computes the number of rows resulting from a left equi-join between two tables.leftSemiJoinGatherMap
(Table rightKeys, boolean compareNullsEqual) Computes the gather map that can be used to manifest the result of a left semi-join between two tables.lowerBound
(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags) Find smallest indices in a sorted table where values should be inserted to maintain order.lowerBound
(Table valueTable, OrderByArg... args) Find smallest indices in a sorted table where values should be inserted to maintain order.makeChunkedPack
(long bounceBufferSize) Create an instance of `ChunkedPack` which can be used to pack this table contiguously in memory utilizing a bounce buffer of size `bounceBufferSize`.makeChunkedPack
(long bounceBufferSize, RmmDeviceMemoryResource tempMemoryResource) Create an instance of `ChunkedPack` which can be used to pack this table contiguously in memory utilizing a bounce buffer of size `bounceBufferSize`.static Table
merge
(Table[] tables, OrderByArg... args) Merge multiple already sorted tables keeping the sort order the same.static Table
merge
(List<Table> tables, OrderByArg... args) Merge multiple already sorted tables keeping the sort order the same.static GatherMap[]
mixedFullJoinGatherMaps
(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality) Computes the gather maps that can be used to manifest the result of a full join between two tables using a mix of equality and inequality conditions.static GatherMap[]
mixedInnerJoinGatherMaps
(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality) Computes the gather maps that can be used to manifest the result of an inner join between two tables using a mix of equality and inequality conditions.static GatherMap[]
mixedInnerJoinGatherMaps
(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality, MixedJoinSize joinSize) Computes the gather maps that can be used to manifest the result of an inner join between two tables using a mix of equality and inequality conditions.static MixedJoinSize
mixedInnerJoinSize
(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality) Computes output size information for an inner join between two tables using a mix of equality and inequality conditions.static GatherMap
mixedLeftAntiJoinGatherMap
(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality) Computes the gather map that can be used to manifest the result of a left anti join between two tables using a mix of equality and inequality conditions.static GatherMap[]
mixedLeftJoinGatherMaps
(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality) Computes the gather maps that can be used to manifest the result of a left join between two tables using a mix of equality and inequality conditions.static GatherMap[]
mixedLeftJoinGatherMaps
(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality, MixedJoinSize joinSize) Computes the gather maps that can be used to manifest the result of a left join between two tables using a mix of equality and inequality conditions.static MixedJoinSize
mixedLeftJoinSize
(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality) Computes output size information for a left join between two tables using a mix of equality and inequality conditions.static GatherMap
mixedLeftSemiJoinGatherMap
(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality) Computes the gather map that can be used to manifest the result of a left semi join between two tables using a mix of equality and inequality conditions.onColumns
(int... indices) orderBy
(OrderByArg... args) Orders the table using the sort keys, returning a newly allocated table.partition
(ColumnView partitionMap, int numberOfPartitions) Partition this table using the mapping in partitionMap.static TableWithMeta
readAndInferJSON
(JSONOptions opts, DataSource ds) Read JSON formatted data and infer the column names and schema.static StreamedTableReader
readArrowIPCChunked
(ArrowIPCOptions options, HostBufferProvider provider) static StreamedTableReader
readArrowIPCChunked
(ArrowIPCOptions options, HostBufferProvider provider, HostMemoryAllocator hostMemoryAllocator) Get a reader that will return tables.static StreamedTableReader
readArrowIPCChunked
(ArrowIPCOptions options, File inputFile) Get a reader that will return tables.static StreamedTableReader
readArrowIPCChunked
(HostBufferProvider provider) Get a reader that will return tables.static StreamedTableReader
readArrowIPCChunked
(File inputFile) Get a reader that will return tables.static Table
readAvro
(byte[] buffer) Read Avro formatted data.static Table
readAvro
(AvroOptions opts, byte[] buffer) Read Avro formatted data.static Table
readAvro
(AvroOptions opts, byte[] buffer, long offset, long len) static Table
readAvro
(AvroOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator) Read Avro formatted data.static Table
readAvro
(AvroOptions opts, DataSource ds) static Table
readAvro
(AvroOptions opts, HostMemoryBuffer buffer, long offset, long len) Read Avro formatted data.static Table
readAvro
(AvroOptions opts, File path) Read an Avro file.static Table
Read an Avro file using the default AvroOptions.static Table
Read CSV formatted data using the default CSVOptions.static Table
readCSV
(Schema schema, CSVOptions opts, byte[] buffer) Read CSV formatted data.static Table
readCSV
(Schema schema, CSVOptions opts, byte[] buffer, long offset, long len) static Table
readCSV
(Schema schema, CSVOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator) Read CSV formatted data.static Table
readCSV
(Schema schema, CSVOptions opts, DataSource ds) static Table
readCSV
(Schema schema, CSVOptions opts, HostMemoryBuffer buffer, long offset, long len) Read CSV formatted data.static Table
readCSV
(Schema schema, CSVOptions opts, File path) Read a CSV file.static Table
Read a CSV file using the default CSVOptions.static TableWithMeta
readJSON
(JSONOptions opts, HostMemoryBuffer buffer, long offset, long len) Read JSON formatted data and infer the column names and schema.static Table
Read JSON formatted data using the default JSONOptions.static Table
readJSON
(Schema schema, JSONOptions opts, byte[] buffer) Read JSON formatted data.static Table
readJSON
(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len) static Table
readJSON
(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len, int emptyRowCount) static Table
readJSON
(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator) Read JSON formatted data.static Table
readJSON
(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator, int emptyRowCount) Deprecated.This method is deprecated since emptyRowCount is not used.static Table
readJSON
(Schema schema, JSONOptions opts, DataSource ds) Read JSON formatted data.static Table
readJSON
(Schema schema, JSONOptions opts, DataSource ds, int emptyRowCount) Deprecated.This method is deprecated since emptyRowCount is not used.static Table
readJSON
(Schema schema, JSONOptions opts, HostMemoryBuffer buffer, long offset, long len) Read JSON formatted data.static Table
readJSON
(Schema schema, JSONOptions opts, HostMemoryBuffer buffer, long offset, long len, int emptyRowCount) Deprecated.This method is deprecated since emptyRowCount is not used.static Table
readJSON
(Schema schema, JSONOptions opts, File path) Read a JSON file.static Table
Read a JSON file using the default JSONOptions.static Table
readORC
(byte[] buffer) Read ORC formatted data.static Table
readORC
(ORCOptions opts, byte[] buffer) Read ORC formatted data.static Table
readORC
(ORCOptions opts, byte[] buffer, long offset, long len) static Table
readORC
(ORCOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator) Read ORC formatted data.static Table
readORC
(ORCOptions opts, DataSource ds) static Table
readORC
(ORCOptions opts, HostMemoryBuffer buffer, long offset, long len) Read ORC formatted data.static Table
readORC
(ORCOptions opts, File path) Read an ORC file.static Table
Read an ORC file using the default ORCOptions.static Table
readParquet
(byte[] buffer) Read parquet formatted data.static Table
readParquet
(ParquetOptions opts, byte[] buffer) Read parquet formatted data.static Table
readParquet
(ParquetOptions opts, byte[] buffer, long offset, long len) Read parquet formatted data.static Table
readParquet
(ParquetOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator) Read parquet formatted data.static Table
readParquet
(ParquetOptions opts, DataSource ds) Read parquet formatted data.static Table
readParquet
(ParquetOptions opts, HostMemoryBuffer... buffers) Read parquet formatted data.static Table
readParquet
(ParquetOptions opts, HostMemoryBuffer buffer, long offset, long len) Read parquet formatted data.static Table
readParquet
(ParquetOptions opts, File path) Read a Parquet file.static Table
readParquet
(File path) Read a Parquet file using the default ParquetOptions.repeat
(int count) Repeat each row of this table count times.repeat
(ColumnView counts) Create a new table by repeating each row of this table.roundRobinPartition
(int numberOfPartitions, int startPartition) Round-robin partition a table into the specified number of partitions.Returns an approximate cumulative size in bits of all columns in the `table_view` for each row.sample
(long n, boolean replacement, long seed) Gather `n` samples from the table randomly. Note: does not preserve the ordering. Example: input: {col1: {1, 2, 3, 4, 5}, col2: {6, 7, 8, 9, 10}}, n: 3; with replacement: false, output: {col1: {3, 1, 4}, col2: {8, 6, 9}}; with replacement: true, output: {col1: {3, 1, 1}, col2: {8, 6, 6}}. Throws "logic_error" if `n` > table rows and `replacement` == FALSE.scatter
(ColumnView scatterMap, Table target) Scatters values from the source table into the target table out-of-place, returning a new result table.static Table
scatter
(Scalar[] source, ColumnView scatterMap, Table target) Scatters values from the source rows into the target table out-of-place, returning a new result table.sortOrder
(OrderByArg... args) Get back a gather map that can be used to sort the data.toString()
upperBound
(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags) Find largest indices in a sorted table where values should be inserted to maintain order.upperBound
(Table valueTable, OrderByArg... args) Find largest indices in a sorted table where values should be inserted to maintain order.static TableWriter
writeArrowIPCChunked
(ArrowIPCWriterOptions options, HostBufferConsumer consumer) static TableWriter
writeArrowIPCChunked
(ArrowIPCWriterOptions options, HostBufferConsumer consumer, HostMemoryAllocator hostMemoryAllocator) Get a table writer to write arrow IPC data and handle each chunk with a callback.static TableWriter
writeArrowIPCChunked
(ArrowIPCWriterOptions options, File outputFile) Get a table writer to write arrow IPC data to a file.static void
writeColumnViewsToParquet
(ParquetWriterOptions options, HostBufferConsumer consumer, ColumnView... columnViews) static void
writeColumnViewsToParquet
(ParquetWriterOptions options, HostBufferConsumer consumer, HostMemoryAllocator hostMemoryAllocator, ColumnView... columnViews) This is an evolving API and will most likely be removed in future releases.void
writeCSVToFile
(CSVWriterOptions options, String outputPath) static TableWriter
writeORCChunked
(ORCWriterOptions options, HostBufferConsumer consumer) static TableWriter
writeORCChunked
(ORCWriterOptions options, HostBufferConsumer consumer, HostMemoryAllocator hostMemoryAllocator) Get a table writer to write ORC data and handle each chunk with a callback.static TableWriter
writeORCChunked
(ORCWriterOptions options, File outputFile) Get a table writer to write ORC data to a file.static TableWriter
writeParquetChunked
(ParquetWriterOptions options, HostBufferConsumer consumer) static TableWriter
writeParquetChunked
(ParquetWriterOptions options, HostBufferConsumer consumer, HostMemoryAllocator hostMemoryAllocator) Get a table writer to write parquet data and handle each chunk with a callback.static TableWriter
writeParquetChunked
(ParquetWriterOptions options, File outputFile) Get a table writer to write parquet data to a file.
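The gather entries in the summary above state that row "i" of the result contains row "gatherMap[i]" of the source. The following is a minimal plain-Java sketch of that indexing rule on a single int column. It is not cudf code: the class `GatherSketch` and its method are hypothetical, the data lives in ordinary Java arrays rather than on the GPU, and out-of-range indices simply raise Java's own exception instead of following an OutOfBoundsPolicy.

```java
import java.util.Arrays;

/** Plain-Java model of the gather indexing rule (hypothetical, not part of cudf). */
final class GatherSketch {
    /** Returns a new array where out[i] == column[gatherMap[i]]. */
    public static int[] gather(int[] column, int[] gatherMap) {
        int[] out = new int[gatherMap.length];
        for (int i = 0; i < gatherMap.length; i++) {
            // An index outside the column throws ArrayIndexOutOfBoundsException here.
            out[i] = column[gatherMap[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] column = {10, 20, 30, 40};
        int[] gatherMap = {3, 0, 0, 2};
        // prints [40, 10, 10, 30]
        System.out.println(Arrays.toString(gather(column, gatherMap)));
    }
}
```

The result has as many rows as the gather map, and a source row may appear several times or not at all, which is why the join methods above can return gather maps to describe their output.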
-
Constructor Details
-
Table
Table class makes a copy of the array of ColumnVectors passed to it. The class will decrease the refcount on itself and all its contents when closed, and free resources if the refcount is zero.
Parameters:
columns - Array of ColumnVectors
-
Table
public Table(long[] cudfColumns)
Create a Table from an array of existing on device cudf::column pointers. Ownership of the columns is transferred to the ColumnVectors held by the new Table. In the case of an exception the columns will be deleted.
Parameters:
cudfColumns - Array of nativeHandles
-
-
Method Details
-
getNativeView
public long getNativeView()
Return the native table view handle for this table.
-
getColumn
Return the ColumnVector at the specified index. If you want to keep a reference to the column around past the lifetime of the table, you will need to increment the reference count on the column yourself.
-
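The reference-counting rule described for getColumn can be illustrated with a small plain-Java model. This is only a sketch under stated assumptions: `RefCountSketch`, `Column`, `Holder`, and their methods are hypothetical stand-ins, not the cudf classes, and "freeing" is modeled as a boolean flag rather than a release of device memory.

```java
/** Plain-Java model of reference counting (hypothetical names, not cudf classes). */
final class RefCountSketch {
    /** Stand-in for a column whose resources are freed when the count reaches zero. */
    public static final class Column implements AutoCloseable {
        private int refCount = 1;      // the creator holds the first reference
        private boolean freed = false;

        public Column incRefCount() {
            if (freed) throw new IllegalStateException("column already freed");
            refCount++;
            return this;
        }

        public boolean isFreed() { return freed; }

        @Override
        public void close() {
            if (freed) throw new IllegalStateException("column already freed");
            refCount--;
            if (refCount == 0) freed = true;
        }
    }

    /** Stand-in for a table: copies the array and takes its own reference on each column. */
    public static final class Holder implements AutoCloseable {
        private final Column[] columns;

        public Holder(Column... cols) {
            columns = cols.clone();
            for (Column c : columns) c.incRefCount();
        }

        public Column getColumn(int index) { return columns[index]; }

        @Override
        public void close() {
            for (Column c : columns) c.close();
        }
    }

    public static void main(String[] args) {
        Column c = new Column();                      // count 1
        Holder t = new Holder(c);                     // count 2
        Column kept = t.getColumn(0).incRefCount();   // count 3: caller keeps its own reference
        t.close();                                    // count 2
        c.close();                                    // count 1
        System.out.println(kept.isFreed());           // prints false
        kept.close();                                 // count 0
        System.out.println(kept.isFreed());           // prints true
    }
}
```

Without the extra `incRefCount()` call in this model, closing the holder and the creator's reference would drop the count to zero, and the column obtained from `getColumn` would no longer be usable.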
getRowCount
public final long getRowCount() -
getNumberOfColumns
public final int getNumberOfColumns() -
close
public void close()
Specified by:
close in interface AutoCloseable
-
toString
-
getDeviceMemorySize
public long getDeviceMemorySize()
Returns the device memory buffer size.
-
readCSV
Read a CSV file using the default CSVOptions.
Parameters:
schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
path - the local file to read.
Returns:
the file parsed as a table on the GPU.
-
readCSV
Read a CSV file.
Parameters:
schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
opts - various CSV parsing options.
path - the local file to read.
Returns:
the file parsed as a table on the GPU.
-
readCSV
Read CSV formatted data using the default CSVOptions.
Parameters:
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
buffer - raw UTF8 formatted bytes.
Returns:
the data parsed as a table on the GPU.
-
readCSV
Read CSV formatted data.
Parameters:
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various CSV parsing options.
buffer - raw UTF8 formatted bytes.
Returns:
the data parsed as a table on the GPU.
-
readCSV
public static Table readCSV(Schema schema, CSVOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator)
Read CSV formatted data.
Parameters:
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various CSV parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
hostMemoryAllocator - allocator for host memory buffers
Returns:
the data parsed as a table on the GPU.
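Many of the buffer-based readers take an `offset` and a `len` that select which bytes of `buffer` are parsed. The following plain-Java sketch shows one way such a window can be interpreted. The class `BufferWindowSketch` and its bounds check are assumptions made for illustration; the page does not state how the library validates these arguments, and the sketch only decodes the selected bytes as UTF-8 text instead of parsing them on the GPU.

```java
import java.nio.charset.StandardCharsets;

/** Plain-Java illustration of an (offset, len) window into a byte buffer (hypothetical helper). */
final class BufferWindowSketch {
    /** Decodes the bytes in [offset, offset + len) as UTF-8 after checking the window. */
    public static String window(byte[] buffer, long offset, long len) {
        // Assumed validation: the window must lie entirely inside the buffer.
        if (offset < 0 || len < 0 || len > buffer.length || offset > buffer.length - len) {
            throw new IllegalArgumentException("window [" + offset + ", " + (offset + len)
                    + ") is outside a buffer of " + buffer.length + " bytes");
        }
        return new String(buffer, (int) offset, (int) len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Five bytes of unrelated data, followed by eight bytes of CSV text.
        byte[] buffer = "junk\na,b\n1,2\n".getBytes(StandardCharsets.UTF_8);
        // Skips the first five bytes and prints the two CSV lines "a,b" and "1,2".
        System.out.print(window(buffer, 5, 8));
    }
}
```

A window like this lets one large host buffer hold several payloads, with each read call pointed at its own byte range.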
-
readCSV
-
readCSV
public static Table readCSV(Schema schema, CSVOptions opts, HostMemoryBuffer buffer, long offset, long len)
Read CSV formatted data.
Parameters:
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various CSV parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
Returns:
the data parsed as a table on the GPU.
-
readCSV
-
writeCSVToFile
-
getCSVBufferWriter
public static TableWriter getCSVBufferWriter(CSVWriterOptions options, HostBufferConsumer bufferConsumer, HostMemoryAllocator hostMemoryAllocator) -
getCSVBufferWriter
public static TableWriter getCSVBufferWriter(CSVWriterOptions options, HostBufferConsumer bufferConsumer) -
readJSON
Read a JSON file using the default JSONOptions.
Parameters:
schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
path - the local file to read.
Returns:
the file parsed as a table on the GPU.
-
readJSON
Read JSON formatted data using the default JSONOptions.
Parameters:
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
buffer - raw UTF8 formatted bytes.
Returns:
the data parsed as a table on the GPU.
-
readJSON
Read JSON formatted data.
Parameters:
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
buffer - raw UTF8 formatted bytes.
Returns:
the data parsed as a table on the GPU.
-
readJSON
Read a JSON file.
Parameters:
schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
path - the local file to read.
Returns:
the file parsed as a table on the GPU.
-
readJSON
public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator)
Read JSON formatted data.
Parameters:
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
hostMemoryAllocator - allocator for host memory buffers
Returns:
the data parsed as a table on the GPU.
-
readJSON
public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator, int emptyRowCount)
Deprecated. This method is deprecated since emptyRowCount is not used. Use the method without emptyRowCount instead.
Read JSON formatted data.
Parameters:
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
hostMemoryAllocator - allocator for host memory buffers
emptyRowCount - the number of rows to return if no columns were read.
Returns:
the data parsed as a table on the GPU.
-
readJSON
public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len, int emptyRowCount) -
readJSON
-
readJSON
public static TableWithMeta readJSON(JSONOptions opts, HostMemoryBuffer buffer, long offset, long len)
Read JSON formatted data and infer the column names and schema.
Parameters:
opts - various JSON parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
Returns:
the data parsed as a table on the GPU and the metadata for the table returned.
-
readAndInferJSON
Read JSON formatted data and infer the column names and schema.
Parameters:
opts - various JSON parsing options.
Returns:
the data parsed as a table on the GPU and the metadata for the table returned.
-
readJSON
public static Table readJSON(Schema schema, JSONOptions opts, HostMemoryBuffer buffer, long offset, long len)
Read JSON formatted data.
Parameters:
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
Returns:
the data parsed as a table on the GPU.
-
readJSON
public static Table readJSON(Schema schema, JSONOptions opts, HostMemoryBuffer buffer, long offset, long len, int emptyRowCount)
Deprecated. This method is deprecated since emptyRowCount is not used. Use the method without emptyRowCount instead.
Read JSON formatted data.
Parameters:
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
emptyRowCount - the number of rows to use if no columns were found.
Returns:
the data parsed as a table on the GPU.
-
readJSON
Read JSON formatted data.
Parameters:
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
ds - the DataSource to read from.
Returns:
the data parsed as a table on the GPU.
-
readJSON
Deprecated. This method is deprecated since emptyRowCount is not used. Use the method without emptyRowCount instead.
Read JSON formatted data.
Parameters:
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
ds - the DataSource to read from.
emptyRowCount - the number of rows to return if no columns were read.
Returns:
the data parsed as a table on the GPU.
-
readParquet
Read a Parquet file using the default ParquetOptions.
Parameters:
path - the local file to read.
Returns:
the file parsed as a table on the GPU.
-
readParquet
Read a Parquet file.
Parameters:
opts - various parquet parsing options.
path - the local file to read.
Returns:
the file parsed as a table on the GPU.
-
readParquet
Read parquet formatted data.
Parameters:
buffer - raw parquet formatted bytes.
Returns:
the data parsed as a table on the GPU.
-
readParquet
Read parquet formatted data.
Parameters:
opts - various parquet parsing options.
buffer - raw parquet formatted bytes.
Returns:
the data parsed as a table on the GPU.
-
readParquet
public static Table readParquet(ParquetOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator)
Read parquet formatted data.
Parameters:
opts - various parquet parsing options.
buffer - raw parquet formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
hostMemoryAllocator - allocator for host memory buffers
Returns:
the data parsed as a table on the GPU.
-
readParquet
Read parquet formatted data.
Parameters:
opts - various parquet parsing options.
buffer - raw parquet formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
Returns:
the data parsed as a table on the GPU.
-
readParquet
public static Table readParquet(ParquetOptions opts, HostMemoryBuffer buffer, long offset, long len)
Read parquet formatted data.
Parameters:
opts - various parquet parsing options.
buffer - raw parquet formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.
Returns:
the data parsed as a table on the GPU.
-
readParquet
Read parquet formatted data.
Parameters:
opts - various parquet parsing options.
buffers - Buffers containing the Parquet data. The buffers are logically concatenated in order to construct the file being read.
Returns:
the data parsed as a table on the GPU.
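The `buffers` parameter is described as a sequence that is logically concatenated in order to form the file. As a rough plain-Java illustration of what "concatenated in order" means for the resulting byte sequence, the sketch below joins several byte arrays. The class `ConcatSketch` is hypothetical, and it physically copies the bytes only to make the ordering visible; nothing on this page says the library itself performs such a copy.

```java
import java.nio.charset.StandardCharsets;

/** Plain-Java illustration of in-order concatenation of buffers (hypothetical helper). */
final class ConcatSketch {
    /** Returns one array holding the bytes of each buffer, in argument order. */
    public static byte[] concatInOrder(byte[]... buffers) {
        int total = 0;
        for (byte[] b : buffers) {
            total += b.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] b : buffers) {
            System.arraycopy(b, 0, out, pos, b.length);
            pos += b.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] head = "PAR1".getBytes(StandardCharsets.US_ASCII);
        byte[] body = "...".getBytes(StandardCharsets.US_ASCII);
        byte[] tail = "PAR1".getBytes(StandardCharsets.US_ASCII);
        byte[] file = concatInOrder(head, body, tail);
        // prints PAR1...PAR1
        System.out.println(new String(file, StandardCharsets.US_ASCII));
    }
}
```

The practical consequence is that the order of the arguments matters: passing the same buffers in a different order describes a different file.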
-
readParquet
Read parquet formatted data.
Parameters:
opts - various parquet parsing options.
ds - custom datasource to provide the Parquet file data
Returns:
the data parsed as a table on the GPU.
-
readAvro
Read an Avro file using the default AvroOptions.- Parameters:
path
- the local file to read.- Returns:
- the file parsed as a table on the GPU.
-
readAvro
Read an Avro file.- Parameters:
opts
- various Avro parsing options.path
- the local file to read.- Returns:
- the file parsed as a table on the GPU.
-
readAvro
Read Avro formatted data.- Parameters:
buffer
- raw Avro formatted bytes.- Returns:
- the data parsed as a table on the GPU.
-
readAvro
Read Avro formatted data.- Parameters:
opts
- various Avro parsing options.buffer
- raw Avro formatted bytes.- Returns:
- the data parsed as a table on the GPU.
-
readAvro
public static Table readAvro(AvroOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator) Read Avro formatted data.- Parameters:
opts
- various Avro parsing options.buffer
- raw Avro formatted bytes.offset
- the starting offset into buffer.len
- the number of bytes to parse.hostMemoryAllocator
- allocator for host memory buffers- Returns:
- the data parsed as a table on the GPU.
-
readAvro
-
readAvro
Read Avro formatted data.- Parameters:
opts
- various Avro parsing options.buffer
- raw Avro formatted bytes.offset
- the starting offset into buffer.len
- the number of bytes to parse.- Returns:
- the data parsed as a table on the GPU.
-
readAvro
-
readORC
Read an ORC file using the default ORCOptions.- Parameters:
path
- the local file to read.- Returns:
- the file parsed as a table on the GPU.
-
readORC
Read an ORC file.- Parameters:
opts
- ORC parsing options.path
- the local file to read.- Returns:
- the file parsed as a table on the GPU.
-
readORC
Read ORC formatted data.- Parameters:
buffer
- raw ORC formatted bytes.- Returns:
- the data parsed as a table on the GPU.
-
readORC
Read ORC formatted data.- Parameters:
opts
- various ORC parsing options.buffer
- raw ORC formatted bytes.- Returns:
- the data parsed as a table on the GPU.
-
readORC
public static Table readORC(ORCOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator) Read ORC formatted data.- Parameters:
opts
- various ORC parsing options.buffer
- raw ORC formatted bytes.offset
- the starting offset into buffer.len
- the number of bytes to parse.hostMemoryAllocator
- allocator for host memory buffers- Returns:
- the data parsed as a table on the GPU.
-
readORC
-
readORC
Read ORC formatted data.- Parameters:
opts
- various ORC parsing options.buffer
- raw ORC formatted bytes.offset
- the starting offset into buffer.len
- the number of bytes to parse.- Returns:
- the data parsed as a table on the GPU.
-
readORC
-
writeParquetChunked
Get a table writer to write parquet data to a file.- Parameters:
options
- the parquet writer options.outputFile
- where to write the file.- Returns:
- a table writer to use for writing out multiple tables.
-
writeParquetChunked
public static TableWriter writeParquetChunked(ParquetWriterOptions options, HostBufferConsumer consumer, HostMemoryAllocator hostMemoryAllocator) Get a table writer to write parquet data and handle each chunk with a callback.- Parameters:
options
- the parquet writer options.consumer
- a class that will be called when host buffers are ready with parquet formatted data in them.hostMemoryAllocator
- allocator for host memory buffers- Returns:
- a table writer to use for writing out multiple tables.
-
writeParquetChunked
public static TableWriter writeParquetChunked(ParquetWriterOptions options, HostBufferConsumer consumer) -
writeColumnViewsToParquet
public static void writeColumnViewsToParquet(ParquetWriterOptions options, HostBufferConsumer consumer, HostMemoryAllocator hostMemoryAllocator, ColumnView... columnViews) This is an evolving API and will most likely be removed in future releases. Please use it with the caveat that it will not exist in the near future.- Parameters:
options
- the Parquet writer options.consumer
- a class that will be called when host buffers are ready with Parquet formatted data in them.hostMemoryAllocator
- allocator for host memory bufferscolumnViews
- ColumnViews to write to Parquet
-
writeColumnViewsToParquet
public static void writeColumnViewsToParquet(ParquetWriterOptions options, HostBufferConsumer consumer, ColumnView... columnViews) -
writeORCChunked
Get a table writer to write ORC data to a file.- Parameters:
options
- the ORC writer options.outputFile
- where to write the file.- Returns:
- a table writer to use for writing out multiple tables.
-
writeORCChunked
public static TableWriter writeORCChunked(ORCWriterOptions options, HostBufferConsumer consumer, HostMemoryAllocator hostMemoryAllocator) Get a table writer to write ORC data and handle each chunk with a callback.- Parameters:
options
- the ORC writer options.consumer
- a class that will be called when host buffers are ready with ORC formatted data in them.hostMemoryAllocator
- allocator for host memory buffers- Returns:
- a table writer to use for writing out multiple tables.
-
writeORCChunked
-
writeArrowIPCChunked
Get a table writer to write arrow IPC data to a file.- Parameters:
options
- the arrow IPC writer options.outputFile
- where to write the file.- Returns:
- a table writer to use for writing out multiple tables.
-
writeArrowIPCChunked
public static TableWriter writeArrowIPCChunked(ArrowIPCWriterOptions options, HostBufferConsumer consumer, HostMemoryAllocator hostMemoryAllocator) Get a table writer to write arrow IPC data and handle each chunk with a callback.- Parameters:
options
- the arrow IPC writer options.consumer
- a class that will be called when host buffers are ready with arrow IPC formatted data in them.hostMemoryAllocator
- allocator for host memory buffers- Returns:
- a table writer to use for writing out multiple tables.
-
writeArrowIPCChunked
public static TableWriter writeArrowIPCChunked(ArrowIPCWriterOptions options, HostBufferConsumer consumer) -
readArrowIPCChunked
Get a reader that will return tables.- Parameters:
options
- options for reading.inputFile
- the file to read the Arrow IPC formatted data from- Returns:
- a reader.
-
readArrowIPCChunked
Get a reader that will return tables.- Parameters:
inputFile
- the file to read the Arrow IPC formatted data from- Returns:
- a reader.
-
readArrowIPCChunked
public static StreamedTableReader readArrowIPCChunked(ArrowIPCOptions options, HostBufferProvider provider, HostMemoryAllocator hostMemoryAllocator) Get a reader that will return tables.- Parameters:
options
- options for reading.provider
- what will provide the data being read.- Returns:
- a reader.
-
readArrowIPCChunked
public static StreamedTableReader readArrowIPCChunked(ArrowIPCOptions options, HostBufferProvider provider) -
readArrowIPCChunked
Get a reader that will return tables.- Parameters:
provider
- what will provide the data being read.- Returns:
- a reader.
-
concatenate
Concatenate multiple tables together to form a single table. The schema of each table (i.e.: number of columns and types of each column) must be equal across all tables and will determine the schema of the resulting table. -
interleaveColumns
Interleave all columns into a single column. Columns must all have the same data type and length. Example:
```
input = [[A1, A2, A3], [B1, B2, B3]]
return = [A1, B1, A2, B2, A3, B3]
```
- Returns:
- The interleaved columns as a single column
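The interleaving order in the example above can be modelled in plain Java. This is only a sketch of the semantics on host-side lists, not the cudf implementation; the class name `InterleaveSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class InterleaveSketch {
    // Emits row 0 of every column, then row 1 of every column, and so on.
    static <T> List<T> interleave(List<List<T>> columns) {
        List<T> out = new ArrayList<>();
        if (columns.isEmpty()) {
            return out;
        }
        int rows = columns.get(0).size();
        for (List<T> column : columns) {
            if (column.size() != rows) {
                throw new IllegalArgumentException("columns must have the same length");
            }
        }
        for (int row = 0; row < rows; row++) {
            for (List<T> column : columns) {
                out.add(column.get(row));
            }
        }
        return out;
    }
}
```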
-
repeat
Repeat each row of this table count times.- Parameters:
count
- the number of times to repeat each row.- Returns:
- the new Table.
-
repeat
Create a new table by repeating each row of this table. The number of repetitions of each row is defined by the corresponding value in counts.- Parameters:
counts
- the number of times to repeat each row. Cannot have nulls, must be an Integer type, and must have one entry for each row in the table.- Returns:
- the new Table.
- Throws:
CudfException
- on any error.
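A minimal plain-Java model of the per-row repeat described above, using a single int column. The class name `RepeatSketch` is hypothetical, and rejecting negative counts is a choice made for this sketch only; it does not describe how cudf treats them.

```java
class RepeatSketch {
    // Repeats rows[i] exactly counts[i] times, preserving row order.
    static int[] repeat(int[] rows, int[] counts) {
        if (rows.length != counts.length) {
            throw new IllegalArgumentException("one count per row is required");
        }
        int total = 0;
        for (int c : counts) {
            if (c < 0) {
                throw new IllegalArgumentException("negative count"); // sketch-only rule
            }
            total += c;
        }
        int[] out = new int[total];
        int k = 0;
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < counts[i]; j++) {
                out[k++] = rows[i];
            }
        }
        return out;
    }
}
```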
-
partition
Partition this table using the mapping in partitionMap. partitionMap must be an integer column. The number of rows in partitionMap must be the same as this table. Each row in the map will indicate which partition the rows in the table belong to.- Parameters:
partitionMap
- the partitions for each row.numberOfPartitions
- number of partitions- Returns:
- a PartitionedTable, which exposes limited functionality of the Table class
-
lowerBound
Find smallest indices in a sorted table where values should be inserted to maintain order.
Example:
Single column:
idx            0   1   2   3   4
inputTable  = { 10, 20, 20, 30, 50 }
valuesTable = { 20 }
result      = { 1 }
Multi Column:
idx            0    1    2    3    4
inputTable  = {{ 10, 20, 20, 20, 20 },
               { 5.0, .5, .5, .7, .7 },
               { 90, 77, 78, 61, 61 }}
valuesTable = {{ 20 }, { .7 }, { 61 }}
result      = { 3 }
The input table and the values table need to be non-empty (row count > 0)- Parameters:
areNullsSmallest
- per column, true if nulls are assumed smallestvalueTable
- the table of values to find insertion locations fordescFlags
- per column indicates the ordering, true if descending.- Returns:
- ColumnVector with lower bound indices for all rows in valueTable
-
lowerBound
Find smallest indices in a sorted table where values should be inserted to maintain order. This is a convenience method. It pulls out the columns indicated by the args and sets up the ordering properly to call `lowerBound`.- Parameters:
valueTable
- the table of values to find insertion locations forargs
- the sort order used to sort this table.- Returns:
- ColumnVector with lower bound indices for all rows in valueTable
-
upperBound
Find largest indices in a sorted table where values should be inserted to maintain order. Given a sorted table return the upper bound.
Example:
Single column:
idx            0   1   2   3   4
inputTable  = { 10, 20, 20, 30, 50 }
valuesTable = { 20 }
result      = { 3 }
Multi Column:
idx            0    1    2    3    4
inputTable  = {{ 10, 20, 20, 20, 20 },
               { 5.0, .5, .5, .7, .7 },
               { 90, 77, 78, 61, 61 }}
valuesTable = {{ 20 }, { .7 }, { 61 }}
result      = { 5 }
The input table and the values table need to be non-empty (row count > 0)- Parameters:
areNullsSmallest
- per column, true if nulls are assumed smallestvalueTable
- the table of values to find insertion locations fordescFlags
- per column indicates the ordering, true if descending.- Returns:
- ColumnVector with upper bound indices for all rows in valueTable
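The single-column examples for lowerBound and upperBound correspond to the two classic binary-search insertion points. The following plain-Java sketch illustrates that idea for one ascending int column without nulls; it is not the cudf implementation, and the class name `BoundsSketch` is hypothetical.

```java
class BoundsSketch {
    // Smallest index at which value can be inserted while keeping the order:
    // the first position whose element is >= value.
    static int lowerBound(int[] sorted, int value) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < value) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Largest index at which value can be inserted while keeping the order:
    // the first position whose element is > value.
    static int upperBound(int[] sorted, int value) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= value) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

With the input `{ 10, 20, 20, 30, 50 }` and the value 20, the two methods return 1 and 3, matching the single-column results documented for lowerBound and upperBound.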
-
upperBound
Find largest indices in a sorted table where values should be inserted to maintain order. This is a convenience method. It pulls out the columns indicated by the args and sets up the ordering properly to call `upperBound`.- Parameters:
valueTable
- the table of values to find insertion locations forargs
- the sort order used to sort this table.- Returns:
- ColumnVector with upper bound indices for all rows in valueTable
-
crossJoin
Joins two tables, pairing every row of the left table with every row of the right table. Be careful, as the result gets very big and you can easily use up all of the GPU's memory.- Parameters:
right
- the right table- Returns:
- the joined table. The order of the columns returned will be left columns, right columns.
-
sortOrder
Get back a gather map that can be used to sort the data. This allows you to sort by data that does not appear in the final result and not pay the cost of gathering the data that is only needed for sorting.- Parameters:
args
- what order to sort the data by- Returns:
- a gather map
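To illustrate why a gather map is useful here, the sketch below computes the sorted row order from a key column and then applies it to a different payload column, so the keys themselves never have to be gathered. It is plain Java on host arrays under simplified assumptions (one ascending int key, no nulls); the names `SortOrderSketch`, `sortOrder`, and `gather` are hypothetical stand-ins.

```java
import java.util.Arrays;

class SortOrderSketch {
    // Returns the row indices in the order that sorts keys ascending (stable).
    static int[] sortOrder(int[] keys) {
        Integer[] idx = new Integer[keys.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, (a, b) -> Integer.compare(keys[a], keys[b]));
        int[] map = new int[idx.length];
        for (int i = 0; i < map.length; i++) {
            map[i] = idx[i];
        }
        return map;
    }

    // Applies the gather map to a payload column that was not part of the sort.
    static String[] gather(String[] payload, int[] map) {
        String[] out = new String[map.length];
        for (int i = 0; i < map.length; i++) {
            out[i] = payload[map[i]];
        }
        return out;
    }
}
```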
-
orderBy
Orders the table using the sort keys, returning a newly allocated table. The caller is responsible for cleaning up the ColumnVector instances returned as part of the output Table.
Example usage: orderBy(OrderByArg.asc(0), OrderByArg.desc(3)...);
- Parameters:
args
- Suppliers to initialize sortKeys.- Returns:
- Sorted Table
-
merge
Merge multiple already sorted tables keeping the sort order the same. This is a more efficient version of concatenate followed by orderBy, but requires that the input already be sorted.- Parameters:
tables
- the tables that should be merged.args
- the ordering of the tables. Should match how they were sorted initially.- Returns:
- a combined sorted table.
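The efficiency claim above rests on the fact that already sorted inputs can be combined in a single linear pass instead of being re-sorted. A minimal two-way merge in plain Java shows the idea for one ascending int column; this is a sketch only, not how cudf implements merge, and `MergeSketch` is a hypothetical name.

```java
class MergeSketch {
    // Merges two ascending arrays into one ascending array in O(a.length + b.length).
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0;
        int j = 0;
        int k = 0;
        while (i < a.length && j < b.length) {
            // Taking from a on ties keeps the merge stable.
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) {
            out[k++] = a[i++];
        }
        while (j < b.length) {
            out[k++] = b[j++];
        }
        return out;
    }
}
```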
-
merge
Merge multiple already sorted tables keeping the sort order the same. This is a more efficient version of concatenate followed by orderBy, but requires that the input already be sorted.- Parameters:
tables
- the tables that should be merged.args
- the ordering of the tables. Should match how they were sorted initially.- Returns:
- a combined sorted table.
-
groupBy
Returns aggregate operations grouped by columns provided in indices- Parameters:
groupByOptions
- Options provided in the builderindices
- columns to be considered for groupBy
-
groupBy
Returns aggregate operations grouped by columns provided in indices with default options as below:
- null is considered as key while grouping.
- keys are not presorted.
- empty key order array.
- empty null order array.- Parameters:
indices
- columns to be considered for groupBy
-
roundRobinPartition
Round-robin partition a table into the specified number of partitions. The first row is placed in the specified starting partition, the next row is placed in the next partition, and so on. When the last partition is reached, the next partition is partition 0, and the algorithm continues until all rows have been placed in partitions, evenly distributing the rows among the partitions.- Parameters:
numberOfPartitions
- number of partitions to use.startPartition
- starting partition index (i.e.: where the first row is placed).- Returns:
- a PartitionedTable, which exposes limited functionality of the Table class
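The assignment rule described above amounts to `(startPartition + row) % numberOfPartitions`. The plain-Java sketch below lists which row indices land in each partition; it models only the placement rule, not the PartitionedTable layout, and `RoundRobinSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

class RoundRobinSketch {
    // Returns, for each partition, the row indices assigned to it.
    static List<List<Integer>> partition(int numRows, int numberOfPartitions, int startPartition) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int p = 0; p < numberOfPartitions; p++) {
            parts.add(new ArrayList<>());
        }
        for (int row = 0; row < numRows; row++) {
            // Wraps back to partition 0 after the last partition.
            parts.get((startPartition + row) % numberOfPartitions).add(row);
        }
        return parts;
    }
}
```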
-
onColumns
-
filter
Filters this table using a column of boolean values as a mask, returning a new one.
Given a mask column, each element `i` from the input columns is copied to the output columns if the corresponding element `i` in the mask is non-null and `true`. This operation is stable: the input order is preserved.
This table and the mask column must have the same number of rows.
The output table has size equal to the number of elements in the mask that are both non-null and `true`.
If the original table row count is zero, there is no error, and an empty table is returned.
- Parameters:
mask
- column of type DType.BOOL8 used as a mask to filter the input columns- Returns:
- table containing copy of all elements of this table passing the filter defined by the boolean mask
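The mask rule (keep a row only when its mask entry is non-null and true) can be sketched in plain Java with a single column. This is an illustration of the rule, not the cudf implementation; `FilterSketch` is a hypothetical name and `null` stands in for a null mask entry.

```java
import java.util.ArrayList;
import java.util.List;

class FilterSketch {
    // Copies column[i] to the output only if mask[i] is non-null and true; order is preserved.
    static List<Integer> filter(List<Integer> column, List<Boolean> mask) {
        if (column.size() != mask.size()) {
            throw new IllegalArgumentException("mask must have one entry per row");
        }
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < column.size(); i++) {
            Boolean m = mask.get(i);
            if (m != null && m) {
                out.add(column.get(i));
            }
        }
        return out;
    }
}
```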
-
dropDuplicates
Copy rows of the current table to an output table such that duplicate rows in the key columns are ignored (i.e., only one row from the duplicate ones will be copied). These keys columns are a subset of the current table columns and their indices are specified by an input array. The order of rows in the output table is not specified.- Parameters:
keyColumns
- Array of indices representing key columns from the current table.keep
- Option specifying to keep any, first, last, or none of the found duplicates.nullsEqual
- Flag to denote whether nulls are treated as equal when comparing rows of the key columns to check for uniqueness.- Returns:
- Table with unique keys.
-
distinctCount
Count how many rows in the table are distinct from one another.- Parameters:
nullsEqual
- if nulls should be considered equal to each other or not.
-
distinctCount
public int distinctCount() Count how many rows in the table are distinct from one another. Nulls are considered to be equal to one another. -
contiguousSplit
Split a table at given boundaries, but the result of each split has memory that is laid out in a contiguous range of memory. This allows us to optimize copying the data in a single operation.
Example:
input:  [{10, 12, 14, 16, 18, 20, 22, 24, 26, 28},
         {50, 52, 54, 56, 58, 60, 62, 64, 66, 68}]
splits: {2, 5, 9}
output: [{{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}},
         {{50, 52}, {54, 56, 58}, {60, 62, 64, 66}, {68}}]
- Parameters:
indices
- A vector of indices where to make the split- Returns:
- The tables split at those points. NOTE: It is the responsibility of the caller to close the result. Each table and column holds a reference to the original buffer. But both the buffer and the table must be closed for the memory to be released.
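How the split indices map to row ranges can be seen in a plain-Java sketch on one int column: N indices produce N+1 pieces, each running from the previous index up to (but not including) the next. The sketch copies the data, whereas the documented method returns pieces that reference the original buffer; `SplitSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitSketch {
    // Splits column at the given ascending row indices, yielding indices.length + 1 pieces.
    static List<int[]> split(int[] column, int... indices) {
        List<int[]> out = new ArrayList<>();
        int start = 0;
        for (int end : indices) {
            out.add(Arrays.copyOfRange(column, start, end));
            start = end;
        }
        out.add(Arrays.copyOfRange(column, start, column.length));
        return out;
    }
}
```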
-
makeChunkedPack
public ChunkedPack makeChunkedPack(long bounceBufferSize, RmmDeviceMemoryResource tempMemoryResource) Create an instance of `ChunkedPack` which can be used to pack this table contiguously in memory utilizing a bounce buffer of size `bounceBufferSize`. This version of `makeChunkedPack` takes a `RmmDeviceMemoryResource`, which can be used to pre-allocate all scratch and temporary space required for the state of `cudf::chunked_pack`. The caller is responsible for calling close on the returned `ChunkedPack` object.- Parameters:
bounceBufferSize
- The size of bounce buffer that will be utilized to pack intotempMemoryResource
- A memory resource that is used to satisfy allocations for temporary and thrust scratch space.- Returns:
- An instance of `ChunkedPack` that the caller must use to finish the operation.
-
makeChunkedPack
Create an instance of `ChunkedPack` which can be used to pack this table contiguously in memory utilizing a bounce buffer of size `bounceBufferSize`. This version of `makeChunkedPack` makes use of the default per-device memory resource, for scratch and temporary space required for the state of `cudf::chunked_pack`. The caller is responsible for calling close on the returned `ChunkedPack` object.- Parameters:
bounceBufferSize
- The size of bounce buffer that will be utilized to pack into- Returns:
- An instance of `ChunkedPack` that the caller must use to finish the operation.
-
explode
Explodes a list column's elements. Any list is exploded, which means the elements of the list in each row are expanded into new rows in the output. The corresponding rows for other columns in the input are duplicated.
Example:
input: [[5,10,15], 100], [[20,25], 200], [[30], 300]
index: 0
output: [5, 100], [10, 100], [15, 100], [20, 200], [25, 200], [30, 300]
Nulls propagate in different ways depending on what is null.
input: [[5,null,15], 100], [null, 200]
index: 0
output: [5, 100], [null, 100], [15, 100]
Note that null lists are completely removed from the output and nulls inside lists are pulled out and remain.- Parameters:
index
- Column index to explode inside the table.- Returns:
- A new table with explode_col exploded.
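The row expansion in the explode examples can be modelled in plain Java with a two-column table: a list column and one other int column. This sketch reproduces only the behaviour stated above (null lists dropped, null elements kept); it is not the cudf implementation, and `ExplodeSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExplodeSketch {
    // Each output row is [element, other]; rows whose list is null produce no output.
    static List<List<Integer>> explode(List<List<Integer>> lists, int[] other) {
        List<List<Integer>> out = new ArrayList<>();
        for (int i = 0; i < lists.size(); i++) {
            List<Integer> list = lists.get(i);
            if (list == null) {
                continue; // null lists are removed from the output
            }
            Integer o = other[i];
            for (Integer v : list) {
                out.add(Arrays.<Integer>asList(v, o)); // null elements are kept
            }
        }
        return out;
    }
}
```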
-
explodePosition
Explodes a list column's elements and includes a position column. Any list is exploded, which means the elements of the list in each row are expanded into new rows in the output. The corresponding rows for other columns in the input are duplicated. A position column is added that has the index inside the original list for each row.
Example:
input: [[5,10,15], 100], [[20,25], 200], [[30], 300]
index: 0
output: [0, 5, 100], [1, 10, 100], [2, 15, 100], [0, 20, 200], [1, 25, 200], [0, 30, 300]
Nulls and empty lists propagate in different ways depending on what is null or empty.
input: [[5,null,15], 100], [null, 200]
index: 0
output: [0, 5, 100], [1, null, 100], [2, 15, 100]
Note that null lists are not included in the resulting table, but nulls inside lists and empty lists will be represented with a null entry for that column in that row.- Parameters:
index
- Column index to explode inside the table.- Returns:
- A new table with exploded value and position. The column order of return table is [cols before explode_input, explode_position, explode_value, cols after explode_input].
-
explodeOuter
Explodes a list column's elements, retaining null lists as a null entry. Any list is exploded, which means the elements of the list in each row are expanded into new rows in the output. The corresponding rows for other columns in the input are duplicated.
Example:
input: [[5,10,15], 100], [[20,25], 200], [[30], 300]
index: 0
output: [5, 100], [10, 100], [15, 100], [20, 200], [25, 200], [30, 300]
Nulls propagate in different ways depending on what is null.
input: [[5,null,15], 100], [null, 200]
index: 0
output: [5, 100], [null, 100], [15, 100], [null, 200]
Note that null lists are retained in the output as a single null entry, and nulls inside lists are pulled out and remain.- Parameters:
index
- Column index to explode inside the table.- Returns:
- A new table with explode_col exploded.
-
explodeOuterPosition
Explodes a list column's elements retaining any null entries or empty lists and includes a position column. Any list is exploded, which means the elements of the list in each row are expanded into new rows in the output. The corresponding rows for other columns in the input are duplicated. A position column is added that has the index inside the original list for each row.
Example:
input: [[5,10,15], 100], [[20,25], 200], [[30], 300]
index: 0
output: [0, 5, 100], [1, 10, 100], [2, 15, 100], [0, 20, 200], [1, 25, 200], [0, 30, 300]
Nulls and empty lists propagate as null entries in the result.
input: [[5,null,15], 100], [null, 200], [[], 300]
index: 0
output: [0, 5, 100], [1, null, 100], [2, 15, 100], [0, null, 200], [0, null, 300]
- Parameters:
index
- Column index to explode inside the table.- Returns:
- A new table with exploded value and position. The column order of return table is [cols before explode_input, explode_position, explode_value, cols after explode_input].
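The outer variant with positions can be sketched the same way. The plain-Java model below follows the null and empty-list example given for explodeOuterPosition (a null or empty list yields one row with position 0 and a null value); it is not the cudf implementation, and `ExplodeOuterPositionSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExplodeOuterPositionSketch {
    // Each output row is [position, element, other].
    static List<List<Integer>> explodeOuterPosition(List<List<Integer>> lists, int[] other) {
        List<List<Integer>> out = new ArrayList<>();
        for (int i = 0; i < lists.size(); i++) {
            List<Integer> list = lists.get(i);
            Integer o = other[i];
            if (list == null || list.isEmpty()) {
                // Null and empty lists are retained as a single null entry.
                out.add(Arrays.<Integer>asList(0, null, o));
                continue;
            }
            for (int pos = 0; pos < list.size(); pos++) {
                out.add(Arrays.<Integer>asList(pos, list.get(pos), o));
            }
        }
        return out;
    }
}
```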
-
rowBitCount
Returns an approximate cumulative size in bits of all columns in the `table_view` for each row.
This function counts bits instead of bytes to account for the null mask which only has one bit per row. Each row in the returned column is the sum of the per-row bit size for each column in the table.
In some cases, this is an inexact approximation. Specifically, columns of lists and strings require N+1 offsets to represent N rows. It is up to the caller to calculate the small additional overhead of the terminating offset for any group of rows being considered.
This function returns the per-row bit sizes as the columns are currently formed. This can end up being larger than the number you would get by gathering the rows. Specifically, the push-down of struct column validity masks can nullify rows that contain data for string or list columns. In these cases, the size returned is conservative such that: row_bit_count(column(x)) >= row_bit_count(gather(column(x)))- Returns:
- INT32 column of bit size per row of the table
-
gather
Gathers the rows of this table according to `gatherMap` such that row "i" in the resulting table's columns will contain row "gatherMap[i]" from this table. The number of rows in the result table will be equal to the number of elements in `gatherMap`. A negative value `i` in the `gatherMap` is interpreted as `i+n`, where `n` is the number of rows in this table.- Parameters:
gatherMap
- the map of indexes. Must be non-nullable and integral type.- Returns:
- the resulting Table.
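The indexing rule above, including the wrap-around for negative values, can be illustrated in plain Java on one int column. This is a sketch of the rule only; throwing on an out-of-range index is a choice made for this sketch, and `GatherSketch` is a hypothetical name.

```java
class GatherSketch {
    // out[i] = column[gatherMap[i]], where a negative index i is read as i + n.
    static int[] gather(int[] column, int[] gatherMap) {
        int n = column.length;
        int[] out = new int[gatherMap.length];
        for (int i = 0; i < gatherMap.length; i++) {
            int idx = gatherMap[i];
            if (idx < 0) {
                idx += n;
            }
            if (idx < 0 || idx >= n) {
                throw new IndexOutOfBoundsException("gather index " + gatherMap[i]); // sketch-only policy
            }
            out[i] = column[idx];
        }
        return out;
    }
}
```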
-
gather
Gathers the rows of this table according to `gatherMap` such that row "i" in the resulting table's columns will contain row "gatherMap[i]" from this table. The number of rows in the result table will be equal to the number of elements in `gatherMap`. A negative value `i` in the `gatherMap` is interpreted as `i+n`, where `n` is the number of rows in this table.- Parameters:
gatherMap
- the map of indexes. Must be non-nullable and integral type.outOfBoundsPolicy
- policy to use when an out-of-range value is in `gatherMap`.- Returns:
- the resulting Table.
-
scatter
Scatters values from the source table into the target table out-of-place, returning a new result table. The scatter is performed according to a scatter map such that row `scatterMap[i]` of the destination table gets row `i` of the source table. All other rows of the destination table equal corresponding rows of the target table. The number of columns in source must match the number of columns in target and their corresponding data types must be the same. If the same index appears more than once in the scatter map, the result is undefined. A negative value `i` in the `scatterMap` is interpreted as `i + n`, where `n` is the number of rows in the `target` table.- Parameters:
scatterMap
- The map of indexes. Must be non-nullable and integral type.target
- The table into which rows from the current table are to be scattered out-of-place.- Returns:
- A new table which is the result of out-of-place scattering the source table into the target table.
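The out-of-place scatter rule can be modelled in plain Java with one int column per table: the target is copied, then row `scatterMap[i]` of the copy is overwritten with source row `i`. This is a sketch of the rule only, and `ScatterSketch` is a hypothetical name.

```java
class ScatterSketch {
    // Returns a new array; target itself is left unchanged (out-of-place).
    static int[] scatter(int[] source, int[] scatterMap, int[] target) {
        if (source.length != scatterMap.length) {
            throw new IllegalArgumentException("one map entry per source row is required");
        }
        int n = target.length;
        int[] out = target.clone();
        for (int i = 0; i < scatterMap.length; i++) {
            int idx = scatterMap[i];
            if (idx < 0) {
                idx += n; // negative i is read as i + n
            }
            out[idx] = source[i];
        }
        return out;
    }
}
```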
-
scatter
Scatters values from the source rows into the target table out-of-place, returning a new result table. The scatter is performed according to a scatter map such that row `scatterMap[i]` of the destination table is replaced by the source row `i`. All other rows of the destination table equal corresponding rows of the target table. The number of elements in source must match the number of columns in target and their corresponding data types must be the same. If the same index appears more than once in the scatter map, the result is undefined. A negative value `i` in the `scatterMap` is interpreted as `i + n`, where `n` is the number of rows in the `target` table.- Parameters:
source
- The input scalars containing values to be scattered into the target table.scatterMap
- The map of indexes. Must be non-nullable and integral type.target
- The table into which the values from source are to be scattered out-of-place.- Returns:
- A new table which is the result of out-of-place scattering the source values into the target table.
-
leftJoinGatherMaps
Computes the gather maps that can be used to manifest the result of a left equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the table argument represents the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join. It is the responsibility of the caller to close the resulting gather map instances.- Parameters:
rightKeys
- join key columns from the right tablecompareNullsEqual
- true if null key values should match otherwise false- Returns:
- left and right table gather maps
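To make the idea of a pair of gather maps concrete, the plain-Java sketch below computes left and right row indices for a left equi-join on a single int key. It is not the cudf implementation: the class name `LeftJoinSketch`, the `NO_MATCH` sentinel for unmatched left rows, and the left-to-right output order are all assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LeftJoinSketch {
    // Sketch-only marker for "no matching right row"; gathering with it should yield a null row.
    static final int NO_MATCH = Integer.MIN_VALUE;

    // Returns {leftMap, rightMap}: output row i pairs left row leftMap[i] with right row rightMap[i].
    static int[][] leftJoinGatherMaps(int[] leftKeys, int[] rightKeys) {
        Map<Integer, List<Integer>> rightRows = new HashMap<>();
        for (int r = 0; r < rightKeys.length; r++) {
            rightRows.computeIfAbsent(rightKeys[r], k -> new ArrayList<>()).add(r);
        }
        List<Integer> left = new ArrayList<>();
        List<Integer> right = new ArrayList<>();
        for (int l = 0; l < leftKeys.length; l++) {
            List<Integer> matches = rightRows.get(leftKeys[l]);
            if (matches == null) {
                left.add(l);
                right.add(NO_MATCH); // every left row appears at least once
            } else {
                for (int r : matches) {
                    left.add(l);
                    right.add(r);
                }
            }
        }
        return new int[][] {toArray(left), toArray(right)};
    }

    private static int[] toArray(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

Because only key columns are needed to build the maps, the remaining columns of each table can be gathered afterwards, which is the point of returning maps instead of a joined table.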
-
leftDistinctJoinGatherMap
Computes a gather map that can be used to manifest the result of a left equi-join between two tables where the right table is guaranteed to not contain any duplicated join keys. The left table can be used as-is to produce the left table columns resulting from the join, i.e.: left table ordering is preserved in the join result, so no gather map is required for the left table. The resulting gather map can be applied to the right table to produce the right table columns resulting from the join. It is assumed this table instance holds the key columns from the left table, and the table argument represents the key columns from the right table. A GatherMap instance will be returned that can be used to gather the right table and that result combined with the left table to produce a left outer join result. It is the responsibility of the caller to close the resulting gather map instance.- Parameters:
rightKeys
- join key columns from the right tablecompareNullsEqual
- true if null key values should match otherwise false- Returns:
- right table gather map
-
leftJoinRowCount
Computes the number of rows resulting from a left equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the HashJoin argument has been constructed from the key columns from the right table.- Parameters:
rightHash
- hash table built from join key columns from the right table- Returns:
- row count of the join result
-
leftJoinGatherMaps
Computes the gather maps that can be used to manifest the result of a left equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the HashJoin argument has been constructed from the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join. It is the responsibility of the caller to close the resulting gather map instances.- Parameters:
rightHash
- hash table built from join key columns from the right table- Returns:
- left and right table gather maps
-
leftJoinGatherMaps
Computes the gather maps that can be used to manifest the result of a left equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the HashJoin argument has been constructed from the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join. It is the responsibility of the caller to close the resulting gather map instances. This interface allows passing an output row count that was previously computed from leftJoinRowCount(HashJoin). WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.- Parameters:
rightHash
- hash table built from join key columns from the right tableoutputRowCount
- number of output rows in the join result- Returns:
- left and right table gather maps
-
conditionalLeftJoinRowCount
Computes the number of rows from the result of a left join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table.- Parameters:
rightTable
- the right side table of the join.condition
- conditional expression to evaluate during the join- Returns:
- row count for the join result
-
conditionalLeftJoinGatherMaps
Computes the gather maps that can be used to manifest the result of a left join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join. It is the responsibility of the caller to close the resulting gather map instances.- Parameters:
rightTable
- the right side table of the join.condition
- conditional expression to evaluate during the join- Returns:
- left and right table gather maps
-
conditionalLeftJoinGatherMaps
public GatherMap[] conditionalLeftJoinGatherMaps(Table rightTable, CompiledExpression condition, long outputRowCount) Computes the gather maps that can be used to manifest the result of a left join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join. It is the responsibility of the caller to close the resulting gather map instances. This interface allows passing an output row count that was previously computed from conditionalLeftJoinRowCount(Table, CompiledExpression). WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.- Parameters:
rightTable
- the right side table of the join.condition
- conditional expression to evaluate during the joinoutputRowCount
- number of output rows in the join result- Returns:
- left and right table gather maps
-
mixedLeftJoinSize
public static MixedJoinSize mixedLeftJoinSize(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
Computes output size information for a left join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. NOTE: It is the responsibility of the caller to close the resulting size information object or native resources can be leaked!
- Parameters:
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
- Returns:
- size information for the join
-
mixedLeftJoinGatherMaps
public static GatherMap[] mixedLeftJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
Computes the gather maps that can be used to manifest the result of a left join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join. It is the responsibility of the caller to close the resulting gather map instances.
- Parameters:
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
- Returns:
- left and right table gather maps
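To make the "logical AND of the equality condition and inequality condition" concrete, here is a minimal plain-Java sketch. It does not use cudf. The class name, the `NO_MATCH` marker, and the fixed inequality `leftCond < rightCond` are assumptions chosen for illustration; the real method takes the inequality as a CompiledExpression.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a mixed left join; this is not cudf code.
class MixedLeftJoinSketch {
    // Hypothetical marker for "no matching right row"; cudf's own marker may differ.
    static final int NO_MATCH = -1;

    static int[][] gatherMaps(int[] leftKeys, int[] rightKeys, int[] leftCond, int[] rightCond) {
        // Equality part: index the right rows by key.
        Map<Integer, List<Integer>> rightRowsByKey = new HashMap<>();
        for (int r = 0; r < rightKeys.length; r++) {
            rightRowsByKey.computeIfAbsent(rightKeys[r], k -> new ArrayList<>()).add(r);
        }
        List<Integer> leftMap = new ArrayList<>();
        List<Integer> rightMap = new ArrayList<>();
        for (int l = 0; l < leftKeys.length; l++) {
            boolean matched = false;
            List<Integer> candidates =
                rightRowsByKey.getOrDefault(leftKeys[l], Collections.<Integer>emptyList());
            for (int r : candidates) {
                // Inequality part (assumed for this sketch): both conditions must hold.
                if (leftCond[l] < rightCond[r]) {
                    leftMap.add(l);
                    rightMap.add(r);
                    matched = true;
                }
            }
            if (!matched) {
                // Left join: keep the left row even when nothing satisfies both conditions.
                leftMap.add(l);
                rightMap.add(NO_MATCH);
            }
        }
        return new int[][] {toArray(leftMap), toArray(rightMap)};
    }

    static int[] toArray(List<Integer> rows) {
        return rows.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[][] maps = gatherMaps(
            new int[] {1, 2, 2, 3}, new int[] {2, 2, 4},
            new int[] {10, 5, 50, 1}, new int[] {20, 40, 0});
        System.out.println(Arrays.toString(maps[0])); // prints [0, 1, 1, 2, 3]
        System.out.println(Arrays.toString(maps[1])); // prints [-1, 0, 1, -1, -1]
    }
}
```

Left row 2 has a key that exists on the right but fails the inequality, so it is emitted as unmatched, the same as a row whose key is absent.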
-
mixedLeftJoinGatherMaps
public static GatherMap[] mixedLeftJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality, MixedJoinSize joinSize)
Computes the gather maps that can be used to manifest the result of a left join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join. It is the responsibility of the caller to close the resulting gather map instances. This interface allows passing the size result from mixedLeftJoinSize(Table, Table, Table, Table, CompiledExpression, NullEquality) when the output size was computed previously.
- Parameters:
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
joinSize - mixed join size result
- Returns:
- left and right table gather maps
-
innerJoinGatherMaps
Computes the gather maps that can be used to manifest the result of an inner equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the table argument represents the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join. It is the responsibility of the caller to close the resulting gather map instances.
- Parameters:
rightKeys - join key columns from the right table
compareNullsEqual - true if null key values should match otherwise false
- Returns:
- left and right table gather maps
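The effect of compareNullsEqual can be illustrated with a small plain-Java model. This is a sketch under stated assumptions, not cudf code: the class and method names are hypothetical, and null keys are represented with boxed Integer values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of null handling in an inner equi-join; this is not cudf code.
class NullEqualitySketch {
    // Two keys match when both are non-null and equal, or when both are null
    // and the caller asked for nulls to compare as equal.
    static boolean keysMatch(Integer left, Integer right, boolean compareNullsEqual) {
        if (left == null || right == null) {
            return compareNullsEqual && left == null && right == null;
        }
        return left.equals(right);
    }

    // Nested-loop inner join that returns {leftMap, rightMap}.
    static int[][] innerJoin(Integer[] leftKeys, Integer[] rightKeys, boolean compareNullsEqual) {
        List<Integer> leftMap = new ArrayList<>();
        List<Integer> rightMap = new ArrayList<>();
        for (int l = 0; l < leftKeys.length; l++) {
            for (int r = 0; r < rightKeys.length; r++) {
                if (keysMatch(leftKeys[l], rightKeys[r], compareNullsEqual)) {
                    leftMap.add(l);
                    rightMap.add(r);
                }
            }
        }
        return new int[][] {toArray(leftMap), toArray(rightMap)};
    }

    static int[] toArray(List<Integer> rows) {
        return rows.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        Integer[] left = {1, null};
        Integer[] right = {null, 1};
        System.out.println(Arrays.toString(innerJoin(left, right, true)[0]));  // prints [0, 1]
        System.out.println(Arrays.toString(innerJoin(left, right, false)[0])); // prints [0]
    }
}
```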
-
innerDistinctJoinGatherMaps
Computes the gather maps that can be used to manifest the result of an inner equi-join between two tables where the right table is guaranteed to not contain any duplicated join keys. It is assumed this table instance holds the key columns from the left table, and the table argument represents the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join. It is the responsibility of the caller to close the resulting gather map instances.
- Parameters:
rightKeys - join key columns from the right table
compareNullsEqual - true if null key values should match otherwise false
- Returns:
- left and right table gather maps
-
innerJoinRowCount
Computes the number of rows resulting from an inner equi-join between two tables.
- Parameters:
otherHash - hash table built from join key columns from the other table
- Returns:
- row count of the join result
-
innerJoinGatherMaps
Computes the gather maps that can be used to manifest the result of an inner equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the HashJoin argument has been constructed from the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join. It is the responsibility of the caller to close the resulting gather map instances.
- Parameters:
rightHash - hash table built from join key columns from the right table
- Returns:
- left and right table gather maps
-
innerJoinGatherMaps
Computes the gather maps that can be used to manifest the result of an inner equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the HashJoin argument has been constructed from the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join. It is the responsibility of the caller to close the resulting gather map instances. This interface allows passing an output row count that was previously computed from innerJoinRowCount(HashJoin). WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
- Parameters:
rightHash - hash table built from join key columns from the right table
outputRowCount - number of output rows in the join result
- Returns:
- left and right table gather maps
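The idea behind the HashJoin overloads, building a hash table from the right keys once and probing it for both the row count and the gather maps, can be sketched in plain Java. This is a model only and does not use cudf's HashJoin class; the class name `HashJoinSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a reusable hash table over right-side keys; this is not cudf code.
final class HashJoinSketch {
    private final Map<Integer, List<Integer>> rightRowsByKey = new HashMap<>();

    // Build once from the right keys.
    HashJoinSketch(int[] rightKeys) {
        for (int r = 0; r < rightKeys.length; r++) {
            rightRowsByKey.computeIfAbsent(rightKeys[r], k -> new ArrayList<>()).add(r);
        }
    }

    // Probe with the left keys to get the inner join row count.
    long innerJoinRowCount(int[] leftKeys) {
        long count = 0;
        for (int key : leftKeys) {
            List<Integer> rows = rightRowsByKey.get(key);
            if (rows != null) {
                count += rows.size();
            }
        }
        return count;
    }

    // Probe again, writing into maps sized from the previously computed count.
    int[][] innerJoinGatherMaps(int[] leftKeys, long outputRowCount) {
        int[] leftMap = new int[(int) outputRowCount];
        int[] rightMap = new int[(int) outputRowCount];
        int out = 0;
        for (int l = 0; l < leftKeys.length; l++) {
            List<Integer> rows = rightRowsByKey.get(leftKeys[l]);
            if (rows == null) {
                continue; // inner join drops unmatched left rows
            }
            for (int r : rows) {
                leftMap[out] = l;
                rightMap[out] = r;
                out++;
            }
        }
        return new int[][] {leftMap, rightMap};
    }

    public static void main(String[] args) {
        HashJoinSketch rightHash = new HashJoinSketch(new int[] {7, 8, 7});
        // The same hash table serves several left-side batches.
        for (int[] batch : new int[][] {{7, 9}, {8, 7, 8}}) {
            long rows = rightHash.innerJoinRowCount(batch);
            int[][] maps = rightHash.innerJoinGatherMaps(batch, rows);
            System.out.println(rows + " " + Arrays.toString(maps[0]) + " " + Arrays.toString(maps[1]));
        }
    }
}
```

The design point is that the build cost is paid once, while the count and gather steps each only probe the existing table.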
-
conditionalInnerJoinRowCount
Computes the number of rows from the result of an inner join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table.
- Parameters:
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
- Returns:
- row count for the join result
-
conditionalInnerJoinGatherMaps
Computes the gather maps that can be used to manifest the result of an inner join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join. It is the responsibility of the caller to close the resulting gather map instances.
- Parameters:
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
- Returns:
- left and right table gather maps
-
conditionalInnerJoinGatherMaps
public GatherMap[] conditionalInnerJoinGatherMaps(Table rightTable, CompiledExpression condition, long outputRowCount)
Computes the gather maps that can be used to manifest the result of an inner join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join. It is the responsibility of the caller to close the resulting gather map instances. This interface allows passing an output row count that was previously computed from conditionalInnerJoinRowCount(Table, CompiledExpression). WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
- Parameters:
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
outputRowCount - number of output rows in the join result
- Returns:
- left and right table gather maps
-
mixedInnerJoinSize
public static MixedJoinSize mixedInnerJoinSize(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
Computes output size information for an inner join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. NOTE: It is the responsibility of the caller to close the resulting size information object or native resources can be leaked!
- Parameters:
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
- Returns:
- size information for the join
-
mixedInnerJoinGatherMaps
public static GatherMap[] mixedInnerJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
Computes the gather maps that can be used to manifest the result of an inner join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join. It is the responsibility of the caller to close the resulting gather map instances.
- Parameters:
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
- Returns:
- left and right table gather maps
-
mixedInnerJoinGatherMaps
public static GatherMap[] mixedInnerJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality, MixedJoinSize joinSize)
Computes the gather maps that can be used to manifest the result of an inner join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join. It is the responsibility of the caller to close the resulting gather map instances. This interface allows passing the size result from mixedInnerJoinSize(Table, Table, Table, Table, CompiledExpression, NullEquality) when the output size was computed previously.
- Parameters:
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
joinSize - mixed join size result
- Returns:
- left and right table gather maps
-
fullJoinGatherMaps
Computes the gather maps that can be used to manifest the result of a full equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the table argument represents the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the full join. It is the responsibility of the caller to close the resulting gather map instances.
- Parameters:
rightKeys - join key columns from the right table
compareNullsEqual - true if null key values should match otherwise false
- Returns:
- left and right table gather maps
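A full join differs from a left join in that unmatched rows from both sides appear in the output. The plain-Java sketch below models that with two index arrays. It does not use cudf; the class name and the `NO_MATCH` marker are assumptions for illustration, and the row order it produces is not something the text above promises.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of a full equi-join; this is not cudf code.
class FullJoinSketch {
    // Hypothetical marker for "no row on this side"; cudf's own marker may differ.
    static final int NO_MATCH = -1;

    static int[][] gatherMaps(int[] leftKeys, int[] rightKeys) {
        List<Integer> leftMap = new ArrayList<>();
        List<Integer> rightMap = new ArrayList<>();
        boolean[] rightMatched = new boolean[rightKeys.length];
        for (int l = 0; l < leftKeys.length; l++) {
            boolean matched = false;
            for (int r = 0; r < rightKeys.length; r++) {
                if (leftKeys[l] == rightKeys[r]) {
                    leftMap.add(l);
                    rightMap.add(r);
                    rightMatched[r] = true;
                    matched = true;
                }
            }
            if (!matched) {
                // Unmatched left row: paired with "nothing" on the right.
                leftMap.add(l);
                rightMap.add(NO_MATCH);
            }
        }
        for (int r = 0; r < rightKeys.length; r++) {
            if (!rightMatched[r]) {
                // Unmatched right row: paired with "nothing" on the left.
                leftMap.add(NO_MATCH);
                rightMap.add(r);
            }
        }
        return new int[][] {toArray(leftMap), toArray(rightMap)};
    }

    static int[] toArray(List<Integer> rows) {
        return rows.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[][] maps = gatherMaps(new int[] {1, 2}, new int[] {2, 3});
        System.out.println(Arrays.toString(maps[0])); // prints [0, 1, -1]
        System.out.println(Arrays.toString(maps[1])); // prints [-1, 0, 1]
    }
}
```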
-
fullJoinRowCount
Computes the number of rows resulting from a full equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the HashJoin argument has been constructed from the key columns from the right table. Note that unlike leftJoinRowCount(HashJoin) and innerJoinRowCount(HashJoin), this will perform some redundant calculations compared to fullJoinGatherMaps(HashJoin, long).
- Parameters:
rightHash - hash table built from join key columns from the right table
- Returns:
- row count of the join result
-
fullJoinGatherMaps
Computes the gather maps that can be used to manifest the result of a full equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the HashJoin argument has been constructed from the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the full join. It is the responsibility of the caller to close the resulting gather map instances.
- Parameters:
rightHash - hash table built from join key columns from the right table
- Returns:
- left and right table gather maps
-
fullJoinGatherMaps
Computes the gather maps that can be used to manifest the result of a full equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the HashJoin argument has been constructed from the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the full join. It is the responsibility of the caller to close the resulting gather map instances. This interface allows passing an output row count that was previously computed from fullJoinRowCount(HashJoin). WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
- Parameters:
rightHash - hash table built from join key columns from the right table
outputRowCount - number of output rows in the join result
- Returns:
- left and right table gather maps
-
conditionalFullJoinGatherMaps
Computes the gather maps that can be used to manifest the result of a full join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the full join. It is the responsibility of the caller to close the resulting gather map instances.
- Parameters:
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
- Returns:
- left and right table gather maps
-
mixedFullJoinGatherMaps
public static GatherMap[] mixedFullJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
Computes the gather maps that can be used to manifest the result of a full join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the full join. It is the responsibility of the caller to close the resulting gather map instances.
- Parameters:
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
- Returns:
- left and right table gather maps
-
leftSemiJoinGatherMap
Computes the gather map that can be used to manifest the result of a left semi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the table argument represents the key columns from the right table. The GatherMap instance returned can be used to gather the left table to produce the result of the left semi-join. It is the responsibility of the caller to close the resulting gather map instance.
- Parameters:
rightKeys - join key columns from the right table
compareNullsEqual - true if null key values should match otherwise false
- Returns:
- left table gather map
-
conditionalLeftSemiJoinRowCount
Computes the number of rows from the result of a left semi join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table.
- Parameters:
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
- Returns:
- row count for the join result
-
conditionalLeftSemiJoinGatherMap
Computes the gather map that can be used to manifest the result of a left semi join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. The GatherMap instance returned can be used to gather the left table to produce the result of the left semi join. It is the responsibility of the caller to close the resulting gather map instance.
- Parameters:
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
- Returns:
- left table gather map
-
conditionalLeftSemiJoinGatherMap
public GatherMap conditionalLeftSemiJoinGatherMap(Table rightTable, CompiledExpression condition, long outputRowCount)
Computes the gather map that can be used to manifest the result of a left semi join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. The GatherMap instance returned can be used to gather the left table to produce the result of the left semi join. It is the responsibility of the caller to close the resulting gather map instance. This interface allows passing an output row count that was previously computed from conditionalLeftSemiJoinRowCount(Table, CompiledExpression). WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
- Parameters:
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
outputRowCount - number of output rows in the join result
- Returns:
- left table gather map
-
mixedLeftSemiJoinGatherMap
public static GatherMap mixedLeftSemiJoinGatherMap(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
Computes the gather map that can be used to manifest the result of a left semi join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. A GatherMap instance will be returned that can be used to gather the left table to produce the result of the left semi join. It is the responsibility of the caller to close the resulting gather map instance.
- Parameters:
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
- Returns:
- left table gather map
-
leftAntiJoinGatherMap
Computes the gather map that can be used to manifest the result of a left anti-join between two tables. It is assumed this table instance holds the key columns from the left table, and the table argument represents the key columns from the right table. The GatherMap instance returned can be used to gather the left table to produce the result of the left anti-join. It is the responsibility of the caller to close the resulting gather map instance.
- Parameters:
rightKeys - join key columns from the right table
compareNullsEqual - true if null key values should match otherwise false
- Returns:
- left table gather map
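Left semi and left anti joins return a single gather map because only left rows appear in the result. A minimal plain-Java model follows; it does not use cudf, and the class and method names are hypothetical. The sketch returns indices in ascending order, which is a property of this model only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.IntStream;

// Plain-Java model of left semi and left anti joins; this is not cudf code.
class SemiAntiJoinSketch {
    // Left rows whose key appears at least once on the right.
    static int[] leftSemi(int[] leftKeys, int[] rightKeys) {
        return select(leftKeys, rightKeys, true);
    }

    // Left rows whose key never appears on the right.
    static int[] leftAnti(int[] leftKeys, int[] rightKeys) {
        return select(leftKeys, rightKeys, false);
    }

    private static int[] select(int[] leftKeys, int[] rightKeys, boolean keepMatches) {
        Set<Integer> rightKeySet = new HashSet<>();
        for (int key : rightKeys) {
            rightKeySet.add(key);
        }
        // Each left row is emitted at most once, however many right rows share its key.
        return IntStream.range(0, leftKeys.length)
            .filter(l -> rightKeySet.contains(leftKeys[l]) == keepMatches)
            .toArray();
    }

    public static void main(String[] args) {
        int[] left = {1, 2, 3, 2};
        int[] right = {2, 2, 5};
        System.out.println(Arrays.toString(leftSemi(left, right))); // prints [1, 3]
        System.out.println(Arrays.toString(leftAnti(left, right))); // prints [0, 2]
    }
}
```

Together the two maps partition the left row indices: every left row lands in exactly one of them.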
-
conditionalLeftAntiJoinRowCount
Computes the number of rows from the result of a left anti join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table.
- Parameters:
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
- Returns:
- row count for the join result
-
conditionalLeftAntiJoinGatherMap
Computes the gather map that can be used to manifest the result of a left anti join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. The GatherMap instance returned can be used to gather the left table to produce the result of the left anti join. It is the responsibility of the caller to close the resulting gather map instance.
- Parameters:
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
- Returns:
- left table gather map
-
conditionalLeftAntiJoinGatherMap
public GatherMap conditionalLeftAntiJoinGatherMap(Table rightTable, CompiledExpression condition, long outputRowCount)
Computes the gather map that can be used to manifest the result of a left anti join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. The GatherMap instance returned can be used to gather the left table to produce the result of the left anti join. It is the responsibility of the caller to close the resulting gather map instance. This interface allows passing an output row count that was previously computed from conditionalLeftAntiJoinRowCount(Table, CompiledExpression). WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
- Parameters:
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
outputRowCount - number of output rows in the join result
- Returns:
- left table gather map
-
mixedLeftAntiJoinGatherMap
public static GatherMap mixedLeftAntiJoinGatherMap(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
Computes the gather map that can be used to manifest the result of a left anti join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. A GatherMap instance will be returned that can be used to gather the left table to produce the result of the left anti join. It is the responsibility of the caller to close the resulting gather map instance.
- Parameters:
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
- Returns:
- left table gather map
-
fromPackedTable
Construct a table from a packed representation.
- Parameters:
metadata - host-based metadata for the table
data - GPU data buffer for the table
- Returns:
- table which is zero-copy reconstructed from the packed-form
-
sample
Gather `n` samples from the table randomly. Note: does not preserve the ordering.
Example:
input: {col1: {1, 2, 3, 4, 5}, col2: {6, 7, 8, 9, 10}}
n: 3
replacement: false
output: {col1: {3, 1, 4}, col2: {8, 6, 9}}
replacement: true
output: {col1: {3, 1, 1}, col2: {8, 6, 6}}
Throws "logic_error" if `n` > table rows and `replacement` == FALSE.
Throws "logic_error" if `n` < 0.
- Parameters:
n - non-negative number of samples expected from table
replacement - Allow or disallow sampling of the same row more than once.
seed - Seed value to initiate random number generator.
- Returns:
- Table containing samples
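The sampling rules above (with or without replacement, seeded, and rejecting invalid `n`) can be modelled in plain Java by picking row indices and then applying the same indices to every column. This sketch does not use cudf: the class name is hypothetical, java.util.Random stands in for the library's generator, and IllegalArgumentException stands in for the "logic_error" mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Plain-Java model of row sampling; this is not cudf code.
class SampleSketch {
    static int[] sampleRowIndices(int rowCount, int n, boolean replacement, long seed) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        if (!replacement && n > rowCount) {
            throw new IllegalArgumentException("n exceeds row count without replacement");
        }
        Random rng = new Random(seed);
        int[] picked = new int[n];
        if (replacement) {
            // Each draw is independent, so the same row may be picked repeatedly.
            for (int i = 0; i < n; i++) {
                picked[i] = rng.nextInt(rowCount);
            }
        } else {
            // Shuffle all row indices and keep the first n, so no row repeats.
            List<Integer> allRows = new ArrayList<>();
            for (int i = 0; i < rowCount; i++) {
                allRows.add(i);
            }
            Collections.shuffle(allRows, rng);
            for (int i = 0; i < n; i++) {
                picked[i] = allRows.get(i);
            }
        }
        return picked;
    }

    // Apply the same row indices to one column so rows stay aligned across columns.
    static int[] gather(int[] column, int[] rowIndices) {
        int[] out = new int[rowIndices.length];
        for (int i = 0; i < rowIndices.length; i++) {
            out[i] = column[rowIndices[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] col1 = {1, 2, 3, 4, 5};
        int[] col2 = {6, 7, 8, 9, 10};
        int[] rows = sampleRowIndices(col1.length, 3, false, 42L);
        System.out.println(Arrays.toString(gather(col1, rows)));
        System.out.println(Arrays.toString(gather(col2, rows)));
    }
}
```

Because both columns are gathered with one index array, each sampled col2 value stays paired with its col1 value, as in the example output above.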
-