Class Table

java.lang.Object
ai.rapids.cudf.Table
All Implemented Interfaces:
AutoCloseable

public final class Table extends Object implements AutoCloseable
Class to represent a collection of ColumnVectors and the operations that can be performed on them collectively. The refcount on each column is increased when it is passed in.
  • Constructor Details

    • Table

      public Table(ColumnVector... columns)
      The Table makes a copy of the array of ColumnVectors passed to it. When closed, the Table decreases the refcount on itself and all of its contents, and frees resources once the refcount reaches zero.
      Parameters:
      columns - array of ColumnVectors
    • Table

      public Table(long[] cudfColumns)
      Create a Table from an array of existing on-device cudf::column pointers. Ownership of the columns is transferred to the ColumnVectors held by the new Table. If an exception is thrown, the columns are deleted.
      Parameters:
      cudfColumns - array of nativeHandles
  • Method Details

    • getNativeView

      public long getNativeView()
      Return the native table view handle for this table.
    • getColumn

      public ColumnVector getColumn(int index)
      Return the ColumnVector at the specified index. If you want to keep a reference to the column past the lifetime of the table, you must increment the reference count on the column yourself.
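      The ownership rule described for the constructor and for getColumn can be illustrated with a small plain-Java model. This is only a sketch: RefCounted, incRefCount, isFreed, and keepPastOwner are hypothetical names used for illustration, and the code does not call the cudf library.

```java
// Plain-Java model of the reference counting rule; RefCounted is a hypothetical
// stand-in for a column and does not use the cudf library.
final class RefCounted implements AutoCloseable {
    private int refCount = 1;   // the creator holds the first reference
    private boolean freed = false;

    // Take an extra reference so this object can outlive another owner.
    RefCounted incRefCount() {
        if (freed) throw new IllegalStateException("already freed");
        refCount++;
        return this;
    }

    // Drop one reference; the resource is released when the count reaches zero.
    @Override
    public void close() {
        if (freed) throw new IllegalStateException("already freed");
        if (--refCount == 0) freed = true;
    }

    int getRefCount() { return refCount; }
    boolean isFreed() { return freed; }

    // Models keeping a column after its owning table is closed.
    static RefCounted keepPastOwner(RefCounted column) {
        RefCounted kept = column.incRefCount(); // our own reference: count is 2
        column.close();                         // the owner releases its reference: count is 1
        return kept;
    }

    public static void main(String[] args) {
        RefCounted kept = keepPastOwner(new RefCounted());
        System.out.println(kept.isFreed());  // false
        kept.close();
        System.out.println(kept.isFreed());  // true
    }
}
```

      In this model the kept reference stays valid after the owner closes, and the resource is released only by the final close.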
    • getRowCount

      public final long getRowCount()
    • getNumberOfColumns

      public final int getNumberOfColumns()
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • getDeviceMemorySize

      public long getDeviceMemorySize()
      Returns the device memory buffer size.
    • readCSV

      public static Table readCSV(Schema schema, File path)
      Read a CSV file using the default CSVOptions.
      Parameters:
      schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
      path - the local file to read.
      Returns:
      the file parsed as a table on the GPU.
    • readCSV

      public static Table readCSV(Schema schema, CSVOptions opts, File path)
      Read a CSV file.
      Parameters:
      schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
      opts - various CSV parsing options.
      path - the local file to read.
      Returns:
      the file parsed as a table on the GPU.
    • readCSV

      public static Table readCSV(Schema schema, byte[] buffer)
      Read CSV formatted data using the default CSVOptions.
      Parameters:
      schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
      buffer - raw UTF8 formatted bytes.
      Returns:
      the data parsed as a table on the GPU.
    • readCSV

      public static Table readCSV(Schema schema, CSVOptions opts, byte[] buffer)
      Read CSV formatted data.
      Parameters:
      schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
      opts - various CSV parsing options.
      buffer - raw UTF8 formatted bytes.
      Returns:
      the data parsed as a table on the GPU.
    • readCSV

      public static Table readCSV(Schema schema, CSVOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator)
      Read CSV formatted data.
      Parameters:
      schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
      opts - various CSV parsing options.
      buffer - raw UTF8 formatted bytes.
      offset - the starting offset into buffer.
      len - the number of bytes to parse.
      hostMemoryAllocator - allocator for host memory buffers
      Returns:
      the data parsed as a table on the GPU.
    • readCSV

      public static Table readCSV(Schema schema, CSVOptions opts, byte[] buffer, long offset, long len)
    • readCSV

      public static Table readCSV(Schema schema, CSVOptions opts, HostMemoryBuffer buffer, long offset, long len)
      Read CSV formatted data.
      Parameters:
      schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
      opts - various CSV parsing options.
      buffer - raw UTF8 formatted bytes.
      offset - the starting offset into buffer.
      len - the number of bytes to parse.
      Returns:
      the data parsed as a table on the GPU.
    • readCSV

      public static Table readCSV(Schema schema, CSVOptions opts, DataSource ds)
    • writeCSVToFile

      public void writeCSVToFile(CSVWriterOptions options, String outputPath)
    • getCSVBufferWriter

      public static TableWriter getCSVBufferWriter(CSVWriterOptions options, HostBufferConsumer bufferConsumer, HostMemoryAllocator hostMemoryAllocator)
    • getCSVBufferWriter

      public static TableWriter getCSVBufferWriter(CSVWriterOptions options, HostBufferConsumer bufferConsumer)
    • readJSON

      public static Table readJSON(Schema schema, File path)
      Read a JSON file using the default JSONOptions.
      Parameters:
      schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
      path - the local file to read.
      Returns:
      the file parsed as a table on the GPU.
    • readJSON

      public static Table readJSON(Schema schema, byte[] buffer)
      Read JSON formatted data using the default JSONOptions.
      Parameters:
      schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
      buffer - raw UTF8 formatted bytes.
      Returns:
      the data parsed as a table on the GPU.
    • readJSON

      public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer)
      Read JSON formatted data.
      Parameters:
      schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
      opts - various JSON parsing options.
      buffer - raw UTF8 formatted bytes.
      Returns:
      the data parsed as a table on the GPU.
    • readJSON

      public static Table readJSON(Schema schema, JSONOptions opts, File path)
      Read a JSON file.
      Parameters:
      schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
      opts - various JSON parsing options.
      path - the local file to read.
      Returns:
      the file parsed as a table on the GPU.
    • readJSON

      public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator)
      Read JSON formatted data.
      Parameters:
      schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
      opts - various JSON parsing options.
      buffer - raw UTF8 formatted bytes.
      offset - the starting offset into buffer.
      len - the number of bytes to parse.
      hostMemoryAllocator - allocator for host memory buffers
      Returns:
      the data parsed as a table on the GPU.
    • readJSON

      public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator, int emptyRowCount)
      Deprecated.
      This method is deprecated since emptyRowCount is not used. Use the method without emptyRowCount instead.
      Read JSON formatted data.
      Parameters:
      schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
      opts - various JSON parsing options.
      buffer - raw UTF8 formatted bytes.
      offset - the starting offset into buffer.
      len - the number of bytes to parse.
      hostMemoryAllocator - allocator for host memory buffers
      emptyRowCount - the number of rows to return if no columns were read.
      Returns:
      the data parsed as a table on the GPU.
    • readJSON

      public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len, int emptyRowCount)
    • readJSON

      public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len)
    • readJSON

      public static TableWithMeta readJSON(JSONOptions opts, HostMemoryBuffer buffer, long offset, long len)
      Read JSON formatted data and infer the column names and schema.
      Parameters:
      opts - various JSON parsing options.
      buffer - raw UTF8 formatted bytes.
      offset - the starting offset into buffer.
      len - the number of bytes to parse.
      Returns:
      the data parsed as a table on the GPU and the metadata for the table returned.
    • readAndInferJSON

      public static TableWithMeta readAndInferJSON(JSONOptions opts, DataSource ds)
      Read JSON formatted data and infer the column names and schema.
      Parameters:
      opts - various JSON parsing options.
      ds - the DataSource to read from.
      Returns:
      the data parsed as a table on the GPU and the metadata for the table returned.
    • readJSON

      public static Table readJSON(Schema schema, JSONOptions opts, HostMemoryBuffer buffer, long offset, long len)
      Read JSON formatted data.
      Parameters:
      schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
      opts - various JSON parsing options.
      buffer - raw UTF8 formatted bytes.
      offset - the starting offset into buffer.
      len - the number of bytes to parse.
      Returns:
      the data parsed as a table on the GPU.
    • readJSON

      public static Table readJSON(Schema schema, JSONOptions opts, HostMemoryBuffer buffer, long offset, long len, int emptyRowCount)
      Deprecated.
      This method is deprecated since emptyRowCount is not used. Use the method without emptyRowCount instead.
      Read JSON formatted data.
      Parameters:
      schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
      opts - various JSON parsing options.
      buffer - raw UTF8 formatted bytes.
      offset - the starting offset into buffer.
      len - the number of bytes to parse.
      emptyRowCount - the number of rows to use if no columns were found.
      Returns:
      the data parsed as a table on the GPU.
    • readJSON

      public static Table readJSON(Schema schema, JSONOptions opts, DataSource ds)
      Read JSON formatted data.
      Parameters:
      schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
      opts - various JSON parsing options.
      ds - the DataSource to read from.
      Returns:
      the data parsed as a table on the GPU.
    • readJSON

      public static Table readJSON(Schema schema, JSONOptions opts, DataSource ds, int emptyRowCount)
      Deprecated.
      This method is deprecated since emptyRowCount is not used. Use the method without emptyRowCount instead.
      Read JSON formatted data.
      Parameters:
      schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
      opts - various JSON parsing options.
      ds - the DataSource to read from.
      emptyRowCount - the number of rows to return if no columns were read.
      Returns:
      the data parsed as a table on the GPU.
    • readParquet

      public static Table readParquet(File path)
      Read a Parquet file using the default ParquetOptions.
      Parameters:
      path - the local file to read.
      Returns:
      the file parsed as a table on the GPU.
    • readParquet

      public static Table readParquet(ParquetOptions opts, File path)
      Read a Parquet file.
      Parameters:
      opts - various parquet parsing options.
      path - the local file to read.
      Returns:
      the file parsed as a table on the GPU.
    • readParquet

      public static Table readParquet(byte[] buffer)
      Read parquet formatted data.
      Parameters:
      buffer - raw parquet formatted bytes.
      Returns:
      the data parsed as a table on the GPU.
    • readParquet

      public static Table readParquet(ParquetOptions opts, byte[] buffer)
      Read parquet formatted data.
      Parameters:
      opts - various parquet parsing options.
      buffer - raw parquet formatted bytes.
      Returns:
      the data parsed as a table on the GPU.
    • readParquet

      public static Table readParquet(ParquetOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator)
      Read parquet formatted data.
      Parameters:
      opts - various parquet parsing options.
      buffer - raw parquet formatted bytes.
      offset - the starting offset into buffer.
      len - the number of bytes to parse.
      hostMemoryAllocator - allocator for host memory buffers
      Returns:
      the data parsed as a table on the GPU.
    • readParquet

      public static Table readParquet(ParquetOptions opts, byte[] buffer, long offset, long len)
      Read parquet formatted data.
      Parameters:
      opts - various parquet parsing options.
      buffer - raw parquet formatted bytes.
      offset - the starting offset into buffer.
      len - the number of bytes to parse.
      Returns:
      the data parsed as a table on the GPU.
    • readParquet

      public static Table readParquet(ParquetOptions opts, HostMemoryBuffer buffer, long offset, long len)
      Read parquet formatted data.
      Parameters:
      opts - various parquet parsing options.
      buffer - raw parquet formatted bytes.
      offset - the starting offset into buffer.
      len - the number of bytes to parse.
      Returns:
      the data parsed as a table on the GPU.
    • readParquet

      public static Table readParquet(ParquetOptions opts, HostMemoryBuffer... buffers)
      Read parquet formatted data.
      Parameters:
      opts - various parquet parsing options.
      buffers - Buffers containing the Parquet data. The buffers are logically concatenated in order to construct the file being read.
      Returns:
      the data parsed as a table on the GPU.
    • readParquet

      public static Table readParquet(ParquetOptions opts, DataSource ds)
      Read parquet formatted data.
      Parameters:
      opts - various parquet parsing options.
      ds - custom datasource to provide the Parquet file data
      Returns:
      the data parsed as a table on the GPU.
    • readAvro

      public static Table readAvro(File path)
      Read an Avro file using the default AvroOptions.
      Parameters:
      path - the local file to read.
      Returns:
      the file parsed as a table on the GPU.
    • readAvro

      public static Table readAvro(AvroOptions opts, File path)
      Read an Avro file.
      Parameters:
      opts - various Avro parsing options.
      path - the local file to read.
      Returns:
      the file parsed as a table on the GPU.
    • readAvro

      public static Table readAvro(byte[] buffer)
      Read Avro formatted data.
      Parameters:
      buffer - raw Avro formatted bytes.
      Returns:
      the data parsed as a table on the GPU.
    • readAvro

      public static Table readAvro(AvroOptions opts, byte[] buffer)
      Read Avro formatted data.
      Parameters:
      opts - various Avro parsing options.
      buffer - raw Avro formatted bytes.
      Returns:
      the data parsed as a table on the GPU.
    • readAvro

      public static Table readAvro(AvroOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator)
      Read Avro formatted data.
      Parameters:
      opts - various Avro parsing options.
      buffer - raw Avro formatted bytes.
      offset - the starting offset into buffer.
      len - the number of bytes to parse.
      hostMemoryAllocator - allocator for host memory buffers
      Returns:
      the data parsed as a table on the GPU.
    • readAvro

      public static Table readAvro(AvroOptions opts, byte[] buffer, long offset, long len)
    • readAvro

      public static Table readAvro(AvroOptions opts, HostMemoryBuffer buffer, long offset, long len)
      Read Avro formatted data.
      Parameters:
      opts - various Avro parsing options.
      buffer - raw Avro formatted bytes.
      offset - the starting offset into buffer.
      len - the number of bytes to parse.
      Returns:
      the data parsed as a table on the GPU.
    • readAvro

      public static Table readAvro(AvroOptions opts, DataSource ds)
    • readORC

      public static Table readORC(File path)
      Read an ORC file using the default ORCOptions.
      Parameters:
      path - the local file to read.
      Returns:
      the file parsed as a table on the GPU.
    • readORC

      public static Table readORC(ORCOptions opts, File path)
      Read an ORC file.
      Parameters:
      opts - ORC parsing options.
      path - the local file to read.
      Returns:
      the file parsed as a table on the GPU.
    • readORC

      public static Table readORC(byte[] buffer)
      Read ORC formatted data.
      Parameters:
      buffer - raw ORC formatted bytes.
      Returns:
      the data parsed as a table on the GPU.
    • readORC

      public static Table readORC(ORCOptions opts, byte[] buffer)
      Read ORC formatted data.
      Parameters:
      opts - various ORC parsing options.
      buffer - raw ORC formatted bytes.
      Returns:
      the data parsed as a table on the GPU.
    • readORC

      public static Table readORC(ORCOptions opts, byte[] buffer, long offset, long len, HostMemoryAllocator hostMemoryAllocator)
      Read ORC formatted data.
      Parameters:
      opts - various ORC parsing options.
      buffer - raw ORC formatted bytes.
      offset - the starting offset into buffer.
      len - the number of bytes to parse.
      hostMemoryAllocator - allocator for host memory buffers
      Returns:
      the data parsed as a table on the GPU.
    • readORC

      public static Table readORC(ORCOptions opts, byte[] buffer, long offset, long len)
    • readORC

      public static Table readORC(ORCOptions opts, HostMemoryBuffer buffer, long offset, long len)
      Read ORC formatted data.
      Parameters:
      opts - various ORC parsing options.
      buffer - raw ORC formatted bytes.
      offset - the starting offset into buffer.
      len - the number of bytes to parse.
      Returns:
      the data parsed as a table on the GPU.
    • readORC

      public static Table readORC(ORCOptions opts, DataSource ds)
    • writeParquetChunked

      public static TableWriter writeParquetChunked(ParquetWriterOptions options, File outputFile)
      Get a table writer to write parquet data to a file.
      Parameters:
      options - the parquet writer options.
      outputFile - where to write the file.
      Returns:
      a table writer to use for writing out multiple tables.
    • writeParquetChunked

      public static TableWriter writeParquetChunked(ParquetWriterOptions options, HostBufferConsumer consumer, HostMemoryAllocator hostMemoryAllocator)
      Get a table writer to write parquet data and handle each chunk with a callback.
      Parameters:
      options - the parquet writer options.
      consumer - a class that will be called when host buffers are ready with parquet formatted data in them.
      hostMemoryAllocator - allocator for host memory buffers
      Returns:
      a table writer to use for writing out multiple tables.
    • writeParquetChunked

      public static TableWriter writeParquetChunked(ParquetWriterOptions options, HostBufferConsumer consumer)
    • writeColumnViewsToParquet

      public static void writeColumnViewsToParquet(ParquetWriterOptions options, HostBufferConsumer consumer, HostMemoryAllocator hostMemoryAllocator, ColumnView... columnViews)
      This is an evolving API and will most likely be removed in a future release. Use it with the caveat that it may not exist in the near future.
      Parameters:
      options - the Parquet writer options.
      consumer - a class that will be called when host buffers are ready with Parquet formatted data in them.
      hostMemoryAllocator - allocator for host memory buffers
      columnViews - ColumnViews to write to Parquet
    • writeColumnViewsToParquet

      public static void writeColumnViewsToParquet(ParquetWriterOptions options, HostBufferConsumer consumer, ColumnView... columnViews)
    • writeORCChunked

      public static TableWriter writeORCChunked(ORCWriterOptions options, File outputFile)
      Get a table writer to write ORC data to a file.
      Parameters:
      options - the ORC writer options.
      outputFile - where to write the file.
      Returns:
      a table writer to use for writing out multiple tables.
    • writeORCChunked

      public static TableWriter writeORCChunked(ORCWriterOptions options, HostBufferConsumer consumer, HostMemoryAllocator hostMemoryAllocator)
      Get a table writer to write ORC data and handle each chunk with a callback.
      Parameters:
      options - the ORC writer options.
      consumer - a class that will be called when host buffers are ready with ORC formatted data in them.
      hostMemoryAllocator - allocator for host memory buffers
      Returns:
      a table writer to use for writing out multiple tables.
    • writeORCChunked

      public static TableWriter writeORCChunked(ORCWriterOptions options, HostBufferConsumer consumer)
    • writeArrowIPCChunked

      public static TableWriter writeArrowIPCChunked(ArrowIPCWriterOptions options, File outputFile)
      Get a table writer to write arrow IPC data to a file.
      Parameters:
      options - the arrow IPC writer options.
      outputFile - where to write the file.
      Returns:
      a table writer to use for writing out multiple tables.
    • writeArrowIPCChunked

      public static TableWriter writeArrowIPCChunked(ArrowIPCWriterOptions options, HostBufferConsumer consumer, HostMemoryAllocator hostMemoryAllocator)
      Get a table writer to write arrow IPC data and handle each chunk with a callback.
      Parameters:
      options - the arrow IPC writer options.
      consumer - a class that will be called when host buffers are ready with arrow IPC formatted data in them.
      hostMemoryAllocator - allocator for host memory buffers
      Returns:
      a table writer to use for writing out multiple tables.
    • writeArrowIPCChunked

      public static TableWriter writeArrowIPCChunked(ArrowIPCWriterOptions options, HostBufferConsumer consumer)
    • readArrowIPCChunked

      public static StreamedTableReader readArrowIPCChunked(ArrowIPCOptions options, File inputFile)
      Get a reader that will return tables.
      Parameters:
      options - options for reading.
      inputFile - the file to read the Arrow IPC formatted data from
      Returns:
      a reader.
    • readArrowIPCChunked

      public static StreamedTableReader readArrowIPCChunked(File inputFile)
      Get a reader that will return tables.
      Parameters:
      inputFile - the file to read the Arrow IPC formatted data from
      Returns:
      a reader.
    • readArrowIPCChunked

      public static StreamedTableReader readArrowIPCChunked(ArrowIPCOptions options, HostBufferProvider provider, HostMemoryAllocator hostMemoryAllocator)
      Get a reader that will return tables.
      Parameters:
      options - options for reading.
      provider - what will provide the data being read.
      hostMemoryAllocator - allocator for host memory buffers
      Returns:
      a reader.
    • readArrowIPCChunked

      public static StreamedTableReader readArrowIPCChunked(ArrowIPCOptions options, HostBufferProvider provider)
    • readArrowIPCChunked

      public static StreamedTableReader readArrowIPCChunked(HostBufferProvider provider)
      Get a reader that will return tables.
      Parameters:
      provider - what will provide the data being read.
      Returns:
      a reader.
    • concatenate

      public static Table concatenate(Table... tables)
      Concatenate multiple tables together to form a single table. The schema of each table (i.e., the number of columns and the type of each column) must be the same across all tables and determines the schema of the resulting table.
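      The row-wise stacking described above can be modeled in plain Java. This is a sketch only: it represents each table as a column-major int[][] (a hypothetical representation, so only the column count can be checked, not the column types) and does not use the cudf library.

```java
import java.util.Arrays;

// Plain-Java model of table concatenation; tables are column-major int[][]
// (table[c] is column c). This is an illustration, not the cudf implementation.
final class ConcatenateSketch {
    static int[][] concatenate(int[][]... tables) {
        if (tables.length == 0) throw new IllegalArgumentException("need at least one table");
        int numCols = tables[0].length;
        int totalRows = 0;
        for (int[][] t : tables) {
            if (t.length != numCols) throw new IllegalArgumentException("column count mismatch");
            totalRows += numCols == 0 ? 0 : t[0].length;
        }
        int[][] out = new int[numCols][totalRows];
        int rowOffset = 0;
        for (int[][] t : tables) {
            int rows = numCols == 0 ? 0 : t[0].length;
            // Append this table's rows below the rows copied so far.
            for (int c = 0; c < numCols; c++) System.arraycopy(t[c], 0, out[c], rowOffset, rows);
            rowOffset += rows;
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] a = {{1, 2}, {10, 20}};
        int[][] b = {{3}, {30}};
        System.out.println(Arrays.deepToString(concatenate(a, b))); // [[1, 2, 3], [10, 20, 30]]
    }
}
```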
    • interleaveColumns

      public ColumnVector interleaveColumns()
      Interleave all columns into a single column. Columns must all have the same data type and length.
       Example:
      
         input  = [[A1, A2, A3], [B1, B2, B3]]
         return = [A1, B1, A2, B2, A3, B3]
      Returns:
      The interleaved columns as a single column
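      The interleaving order can be reproduced with a short plain-Java model. This sketch uses String[] arrays as hypothetical stand-ins for columns and does not call the cudf library.

```java
import java.util.Arrays;

// Plain-Java model of interleaving equal-length columns row by row.
final class InterleaveSketch {
    static String[] interleave(String[][] columns) {
        if (columns.length == 0) return new String[0];
        int rows = columns[0].length;
        for (String[] col : columns) {
            if (col.length != rows) throw new IllegalArgumentException("columns must have the same length");
        }
        String[] out = new String[rows * columns.length];
        int k = 0;
        // For each row, emit that row's value from every column in column order.
        for (int r = 0; r < rows; r++) {
            for (String[] col : columns) out[k++] = col[r];
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] input = {{"A1", "A2", "A3"}, {"B1", "B2", "B3"}};
        System.out.println(Arrays.toString(interleave(input))); // [A1, B1, A2, B2, A3, B3]
    }
}
```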
    • repeat

      public Table repeat(int count)
      Repeat each row of this table count times.
      Parameters:
      count - the number of times to repeat each row.
      Returns:
      the new Table.
    • repeat

      public Table repeat(ColumnView counts)
      Create a new table by repeating each row of this table. The number of repetitions of each row is defined by the corresponding value in counts.
      Parameters:
      counts - the number of times to repeat each row. Cannot have nulls, must be an Integer type, and must have one entry for each row in the table.
      Returns:
      the new Table.
      Throws:
      CudfException - on any error.
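      Both repeat variants can be modeled in plain Java. In this sketch a table is reduced to a single int[] of row values (a hypothetical simplification), and rejecting negative counts is a choice made by the sketch, not documented cudf behavior.

```java
import java.util.Arrays;

// Plain-Java model of repeating rows; not the cudf implementation.
final class RepeatSketch {
    // Repeat rows[i] exactly counts[i] times, preserving row order.
    static int[] repeat(int[] rows, int[] counts) {
        if (rows.length != counts.length) throw new IllegalArgumentException("one count per row is required");
        int total = 0;
        for (int c : counts) {
            if (c < 0) throw new IllegalArgumentException("negative count"); // sketch-only check
            total += c;
        }
        int[] out = new int[total];
        int k = 0;
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < counts[i]; j++) out[k++] = rows[i];
        }
        return out;
    }

    // Repeat every row the same number of times.
    static int[] repeat(int[] rows, int count) {
        int[] counts = new int[rows.length];
        Arrays.fill(counts, count);
        return repeat(rows, counts);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(repeat(new int[]{1, 2, 3}, 2)));                  // [1, 1, 2, 2, 3, 3]
        System.out.println(Arrays.toString(repeat(new int[]{1, 2, 3}, new int[]{0, 1, 2}))); // [2, 3, 3]
    }
}
```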
    • partition

      public PartitionedTable partition(ColumnView partitionMap, int numberOfPartitions)
      Partition this table using the mapping in partitionMap. partitionMap must be an integer column with the same number of rows as this table. Each row in the map indicates which partition the corresponding row of the table belongs to.
      Parameters:
      partitionMap - the partitions for each row.
      numberOfPartitions - number of partitions
      Returns:
      a PartitionedTable, which exposes limited functionality of the Table class
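      How a partition map groups rows can be sketched in plain Java. The code below is an assumption-laden model: it returns the start offset of each partition and fills a caller-supplied order array with row indices grouped by partition. Keeping the original row order inside each partition is a choice made by the sketch; this page does not state what cudf guarantees.

```java
import java.util.Arrays;

// Plain-Java model of partitioning rows by a partition map (counting sort).
final class PartitionSketch {
    // Returns the start offset of each partition; `order` receives the row
    // indices grouped by partition. order.length must equal partitionMap.length.
    static int[] partition(int[] partitionMap, int numberOfPartitions, int[] order) {
        int[] counts = new int[numberOfPartitions];
        for (int p : partitionMap) counts[p]++;
        int[] offsets = new int[numberOfPartitions];
        for (int p = 1; p < numberOfPartitions; p++) offsets[p] = offsets[p - 1] + counts[p - 1];
        int[] next = offsets.clone(); // next free slot in each partition
        for (int row = 0; row < partitionMap.length; row++) {
            order[next[partitionMap[row]]++] = row;
        }
        return offsets;
    }

    public static void main(String[] args) {
        int[] map = {1, 0, 1, 2, 0};
        int[] order = new int[map.length];
        int[] offsets = partition(map, 3, order);
        System.out.println(Arrays.toString(offsets)); // [0, 2, 4]
        System.out.println(Arrays.toString(order));   // [1, 4, 0, 2, 3]
    }
}
```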
    • lowerBound

      public ColumnVector lowerBound(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags)
      Find smallest indices in a sorted table where values should be inserted to maintain order.
       Example:
      
        Single column:
            idx            0   1   2   3   4
         inputTable  =   { 10, 20, 20, 30, 50 }
         valuesTable =   { 20 }
         result      =   { 1 }
      
        Multi Column:
            idx                0    1    2    3    4
         inputTable      = {{  10,  20,  20,  20,  20 },
                            { 5.0,  .5,  .5,  .7,  .7 },
                            {  90,  77,  78,  61,  61 }}
         valuesTable     = {{ 20 },
                            { .7 },
                            { 61 }}
         result          = {  3 }
       
      The input table and the values table need to be non-empty (row count > 0)
      Parameters:
      areNullsSmallest - per column, true if nulls are assumed smallest
      valueTable - the table of values to find insertion locations for
      descFlags - per column indicates the ordering, true if descending.
      Returns:
      ColumnVector with lower bound indices for all rows in valueTable
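      The lower bound search in the example can be reproduced with a plain-Java binary search. This sketch assumes ascending order on every column and no nulls, stores the table column-major as double[][] (a hypothetical representation), and does not call the cudf library.

```java
// Plain-Java model of a multi-column lower bound (ascending, no nulls).
final class LowerBoundSketch {
    // Lexicographically compare row `row` of the column-major table with `value`.
    static int compareRow(double[][] columns, int row, double[] value) {
        for (int c = 0; c < columns.length; c++) {
            int cmp = Double.compare(columns[c][row], value[c]);
            if (cmp != 0) return cmp;
        }
        return 0;
    }

    // Smallest index at which `value` can be inserted while keeping the table sorted.
    static int lowerBound(double[][] columns, double[] value) {
        int lo = 0, hi = columns[0].length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareRow(columns, mid, value) < 0) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        double[][] single = {{10, 20, 20, 30, 50}};
        System.out.println(lowerBound(single, new double[]{20})); // 1
        double[][] multi = {{10, 20, 20, 20, 20}, {5.0, .5, .5, .7, .7}, {90, 77, 78, 61, 61}};
        System.out.println(lowerBound(multi, new double[]{20, .7, 61})); // 3
    }
}
```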
    • lowerBound

      public ColumnVector lowerBound(Table valueTable, OrderByArg... args)
      Find smallest indices in a sorted table where values should be inserted to maintain order. This is a convenience method. It pulls out the columns indicated by the args and sets up the ordering properly to call `lowerBound`.
      Parameters:
      valueTable - the table of values to find insertion locations for
      args - the sort order used to sort this table.
      Returns:
      ColumnVector with lower bound indices for all rows in valueTable
    • upperBound

      public ColumnVector upperBound(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags)
      Find largest indices in a sorted table where values should be inserted to maintain order. Given a sorted table return the upper bound.
       Example:
      
        Single column:
            idx            0   1   2   3   4
         inputTable  =   { 10, 20, 20, 30, 50 }
         valuesTable =   { 20 }
         result      =   { 3 }
      
        Multi Column:
            idx                0    1    2    3    4
         inputTable      = {{  10,  20,  20,  20,  20 },
                            { 5.0,  .5,  .5,  .7,  .7 },
                            {  90,  77,  78,  61,  61 }}
         valuesTable     = {{ 20 },
                            { .7 },
                            { 61 }}
         result          = {  5 }
       
      The input table and the values table need to be non-empty (row count > 0)
      Parameters:
      areNullsSmallest - per column, true if nulls are assumed smallest
      valueTable - the table of values to find insertion locations for
      descFlags - per column indicates the ordering, true if descending.
      Returns:
      ColumnVector with upper bound indices for all rows in valueTable
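      The upper bound search in the example can be reproduced with a plain-Java binary search. This sketch assumes ascending order on every column and no nulls, stores the table column-major as double[][] (a hypothetical representation), and does not call the cudf library.

```java
// Plain-Java model of a multi-column upper bound (ascending, no nulls).
final class UpperBoundSketch {
    // Lexicographically compare row `row` of the column-major table with `value`.
    static int compareRow(double[][] columns, int row, double[] value) {
        for (int c = 0; c < columns.length; c++) {
            int cmp = Double.compare(columns[c][row], value[c]);
            if (cmp != 0) return cmp;
        }
        return 0;
    }

    // Largest index at which `value` can be inserted while keeping the table sorted.
    static int upperBound(double[][] columns, double[] value) {
        int lo = 0, hi = columns[0].length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareRow(columns, mid, value) <= 0) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        double[][] single = {{10, 20, 20, 30, 50}};
        System.out.println(upperBound(single, new double[]{20})); // 3
        double[][] multi = {{10, 20, 20, 20, 20}, {5.0, .5, .5, .7, .7}, {90, 77, 78, 61, 61}};
        System.out.println(upperBound(multi, new double[]{20, .7, 61})); // 5
    }
}
```

      The only difference from a lower bound search is the comparison: rows equal to the value are skipped (<= 0) rather than kept (< 0).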
    • upperBound

      public ColumnVector upperBound(Table valueTable, OrderByArg... args)
      Find largest indices in a sorted table where values should be inserted to maintain order. This is a convenience method. It pulls out the columns indicated by the args and sets up the ordering properly to call `upperBound`.
      Parameters:
      valueTable - the table of values to find insertion locations for
      args - the sort order used to sort this table.
      Returns:
      ColumnVector with upper bound indices for all rows in valueTable
    • crossJoin

      public Table crossJoin(Table right)
      Joins two tables, pairing every row of the left table with every row of the right table. Be careful: the result has (left row count × right row count) rows, so it gets very big and can easily use up all of the GPU's memory.
      Parameters:
      right - the right table
      Returns:
      the joined table. The order of the columns returned will be left columns, right columns.
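      The size and column layout of a cross join can be modeled in plain Java. In this sketch tables are row-major String[][] arrays (a hypothetical representation), and the order of the output rows is a choice made by the sketch; this page only specifies the column order.

```java
import java.util.Arrays;

// Plain-Java model of a cross join; tables are row-major (table[r] is row r).
final class CrossJoinSketch {
    static String[][] crossJoin(String[][] left, String[][] right) {
        String[][] out = new String[left.length * right.length][];
        int k = 0;
        for (String[] l : left) {
            for (String[] r : right) {
                // Output columns: left columns first, then right columns.
                String[] row = new String[l.length + r.length];
                System.arraycopy(l, 0, row, 0, l.length);
                System.arraycopy(r, 0, row, l.length, r.length);
                out[k++] = row;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] left = {{"a"}, {"b"}};
        String[][] right = {{"1"}, {"2"}, {"3"}};
        // 2 x 3 = 6 output rows
        System.out.println(Arrays.deepToString(crossJoin(left, right)));
    }
}
```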
    • sortOrder

      public ColumnVector sortOrder(OrderByArg... args)
      Get back a gather map that can be used to sort the data. This allows you to sort by data that does not appear in the final result and not pay the cost of gathering the data that is only needed for sorting.
      Parameters:
      args - what order to sort the data by
      Returns:
      a gather map
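      The idea of a gather map can be sketched in plain Java: sort the row indices by a key column, then use the resulting index array to reorder any other column. The names sortOrder and gather below belong to the sketch, which uses arrays as hypothetical stand-ins for columns and does not call the cudf library.

```java
import java.util.Arrays;
import java.util.Comparator;

// Plain-Java model of sorting via a gather map (an argsort).
final class SortOrderSketch {
    // Returns row indices ordered so that keys[result[i]] is ascending.
    static int[] sortOrder(int[] keys) {
        Integer[] idx = new Integer[keys.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingInt(i -> keys[i])); // stable for object arrays
        int[] out = new int[idx.length];
        for (int i = 0; i < out.length; i++) out[i] = idx[i];
        return out;
    }

    // Reorder a column using the gather map: out[i] = column[gatherMap[i]].
    static String[] gather(String[] column, int[] gatherMap) {
        String[] out = new String[gatherMap.length];
        for (int i = 0; i < gatherMap.length; i++) out[i] = column[gatherMap[i]];
        return out;
    }

    public static void main(String[] args) {
        int[] map = sortOrder(new int[]{30, 10, 20});
        System.out.println(Arrays.toString(map));                                      // [1, 2, 0]
        System.out.println(Arrays.toString(gather(new String[]{"c", "a", "b"}, map))); // [a, b, c]
    }
}
```

      Note that the key column itself is never gathered, which is the saving the description refers to.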
    • orderBy

      public Table orderBy(OrderByArg... args)
      Orders the table using the sort keys, returning a newly allocated table. The caller is responsible for cleaning up the ColumnVectors returned as part of the output Table.

      Example usage: orderBy(OrderByArg.asc(0), OrderByArg.desc(3)...);

      Parameters:
      args - Suppliers to initialize sortKeys.
      Returns:
      Sorted Table
    • merge

      public static Table merge(Table[] tables, OrderByArg... args)
      Merge multiple already sorted tables keeping the sort order the same. This is a more efficient version of concatenate followed by orderBy, but requires that the input already be sorted.
      Parameters:
      tables - the tables that should be merged.
      args - the ordering of the tables. Should match how they were sorted initially.
      Returns:
      a combined sorted table.
    • merge

      public static Table merge(List<Table> tables, OrderByArg... args)
      Merge multiple already sorted tables keeping the sort order the same. This is a more efficient version of concatenate followed by orderBy, but requires that the input already be sorted.
      Parameters:
      tables - the tables that should be merged.
      args - the ordering of the tables. Should match how they were sorted initially.
      Returns:
      a combined sorted table.
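      Why merging is cheaper than concatenating and re-sorting can be seen in a plain-Java model: each input is already sorted, so the output is built in a single linear pass per pair. The sketch reduces each table to a sorted int[] (a hypothetical simplification), assumes ascending order, and does not call the cudf library.

```java
import java.util.Arrays;

// Plain-Java model of merging already sorted inputs (ascending order).
final class MergeSketch {
    // Standard two-way merge of two sorted arrays.
    static int[] mergeTwo(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    // Merge any number of sorted arrays by folding them pairwise.
    static int[] merge(int[]... sorted) {
        int[] out = new int[0];
        for (int[] s : sorted) out = mergeTwo(out, s);
        return out;
    }

    public static void main(String[] args) {
        int[] merged = merge(new int[]{1, 4, 9}, new int[]{2, 3, 10}, new int[]{5});
        System.out.println(Arrays.toString(merged)); // [1, 2, 3, 4, 5, 9, 10]
    }
}
```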
    • groupBy

      public Table.GroupByOperation groupBy(GroupByOptions groupByOptions, int... indices)
      Returns aggregate operations grouped by columns provided in indices
      Parameters:
      groupByOptions - Options provided in the builder
      indices - columns to be considered for groupBy
    • groupBy

      public Table.GroupByOperation groupBy(int... indices)
      Returns aggregate operations grouped by columns provided in indices, with default options as below:
      - null is considered as key while grouping.
      - keys are not presorted.
      - empty key order array.
      - empty null order array.
      Parameters:
      indices - columns to be considered for groupBy
    • roundRobinPartition

      public PartitionedTable roundRobinPartition(int numberOfPartitions, int startPartition)
      Round-robin partition a table into the specified number of partitions. The first row is placed in the specified starting partition, the next row is placed in the next partition, and so on. When the last partition is reached, the next partition is partition 0, and the algorithm continues until all rows have been placed in partitions, distributing the rows evenly among the partitions.
      Parameters:
      numberOfPartitions - number of partitions to use
      startPartition - starting partition index (i.e., where the first row is placed).
      Returns:
      PartitionedTable - Table that exposes a limited functionality of the Table class
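
      The row-to-partition assignment described above can be sketched in plain Java. This does not call cudf; `RoundRobinSketch` and `partitionForRows` are hypothetical names used only for illustration.

```java
final class RoundRobinSketch {
    // Returns, for each row index, the partition that row would be placed in.
    static int[] partitionForRows(int rowCount, int numberOfPartitions, int startPartition) {
        int[] partitions = new int[rowCount];
        for (int row = 0; row < rowCount; row++) {
            // Wrap back to partition 0 after the last partition.
            partitions[row] = (startPartition + row) % numberOfPartitions;
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Five rows, three partitions, first row goes to partition 1.
        System.out.println(java.util.Arrays.toString(partitionForRows(5, 3, 1)));
    }
}
```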
    • onColumns

      public Table.TableOperation onColumns(int... indices)
    • filter

      public Table filter(ColumnView mask)
      Filters this table using a column of boolean values as a mask, returning a new one.

      Given a mask column, each element `i` from the input columns is copied to the output columns if the corresponding element `i` in the mask is non-null and `true`. This operation is stable: the input order is preserved.

      This table and the mask column must have the same number of rows.

      The output table has size equal to the number of elements in `mask` that are both non-null and `true`.

      If the original table row count is zero, there is no error, and an empty table is returned.

      Parameters:
      mask - column of type DType.BOOL8 used as a mask to filter this table
      Returns:
      table containing copy of all elements of this table passing the filter defined by the boolean mask
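
      A minimal plain-Java sketch of the mask semantics, applied to a single column. It does not call cudf; `FilterSketch` is a hypothetical name, and a null `Boolean` stands in for a null mask entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FilterSketch {
    // Keeps element i only when mask[i] is non-null and true; input order is preserved.
    static List<Integer> filter(List<Integer> column, List<Boolean> mask) {
        if (column.size() != mask.size()) {
            throw new IllegalArgumentException("column and mask must have the same row count");
        }
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < column.size(); i++) {
            Boolean keep = mask.get(i);
            if (keep != null && keep) {
                out.add(column.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> column = Arrays.asList(10, 20, 30, 40);
        List<Boolean> mask = Arrays.asList(true, null, false, true);
        System.out.println(filter(column, mask)); // rows 0 and 3 pass the mask
    }
}
```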
    • dropDuplicates

      public Table dropDuplicates(int[] keyColumns, Table.DuplicateKeepOption keep, boolean nullsEqual)
      Copy rows of the current table to an output table such that duplicate rows in the key columns are ignored (i.e., only one row from the duplicate ones will be copied). These key columns are a subset of the current table columns, and their indices are specified by an input array. The order of rows in the output table is not specified.
      Parameters:
      keyColumns - Array of indices representing key columns from the current table.
      keep - Option specifying to keep any, first, last, or none of the found duplicates.
      nullsEqual - Flag to denote whether nulls are treated as equal when comparing rows of the key columns to check for uniqueness.
      Returns:
      Table with unique keys.
    • distinctCount

      public int distinctCount(NullEquality nullsEqual)
      Count how many rows in the table are distinct from one another.
      Parameters:
      nullsEqual - if nulls should be considered equal to each other or not.
    • distinctCount

      public int distinctCount()
      Count how many rows in the table are distinct from one another. Nulls are considered to be equal to one another.
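
      The effect of the null-equality choice can be sketched in plain Java for a single nullable column. This does not call cudf; `DistinctCountSketch` is a hypothetical name, and the sketch assumes that when nulls are not considered equal, every null row counts as its own distinct row.

```java
import java.util.HashSet;
import java.util.Set;

final class DistinctCountSketch {
    // Counts distinct values; nulls collapse to one row only when nullsEqual is true.
    static int distinctCount(Integer[] column, boolean nullsEqual) {
        Set<Integer> seen = new HashSet<>();
        int nullRows = 0;
        for (Integer value : column) {
            if (value == null) {
                nullRows++;
            } else {
                seen.add(value);
            }
        }
        // Assumption of this sketch: unequal nulls are each counted separately.
        int nullContribution = nullsEqual ? Math.min(nullRows, 1) : nullRows;
        return seen.size() + nullContribution;
    }

    public static void main(String[] args) {
        Integer[] column = {1, null, 1, null, 2};
        System.out.println(distinctCount(column, true));
        System.out.println(distinctCount(column, false));
    }
}
```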
    • contiguousSplit

      public ContiguousTable[] contiguousSplit(int... indices)
      Split a table at given boundaries, but the result of each split has memory that is laid out in a contiguous range of memory. This allows us to optimize copying the data in a single operation.

      Example:
      input:  [{10, 12, 14, 16, 18, 20, 22, 24, 26, 28}, {50, 52, 54, 56, 58, 60, 62, 64, 66, 68}]
      splits: {2, 5, 9}
      output: [{{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}}, {{50, 52}, {54, 56, 58}, {60, 62, 64, 66}, {68}}]
      Parameters:
      indices - the row indices at which to split the table
      Returns:
      The tables split at those points. NOTE: It is the responsibility of the caller to close the result. Each table and column holds a reference to the original buffer. But both the buffer and the table must be closed for the memory to be released.
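
      How the split indices map to row ranges can be sketched in plain Java for one column. This does not call cudf and does not model the contiguous memory layout; `SplitSketch` is a hypothetical name.

```java
import java.util.Arrays;

final class SplitSketch {
    // Splits a column at the given row indices, producing indices.length + 1 parts.
    static int[][] split(int[] column, int... indices) {
        int[][] parts = new int[indices.length + 1][];
        int start = 0;
        for (int i = 0; i <= indices.length; i++) {
            // The last part runs to the end of the column.
            int end = (i < indices.length) ? indices[i] : column.length;
            parts[i] = Arrays.copyOfRange(column, start, end);
            start = end;
        }
        return parts;
    }

    public static void main(String[] args) {
        int[] column = {10, 12, 14, 16, 18, 20, 22, 24, 26, 28};
        System.out.println(Arrays.deepToString(split(column, 2, 5, 9)));
    }
}
```

      Applied to the first input column from the example with splits {2, 5, 9}, this yields the four row ranges [0, 2), [2, 5), [5, 9), and [9, 10).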
    • makeChunkedPack

      public ChunkedPack makeChunkedPack(long bounceBufferSize, RmmDeviceMemoryResource tempMemoryResource)
      Create an instance of `ChunkedPack` which can be used to pack this table contiguously in memory utilizing a bounce buffer of size `bounceBufferSize`. This version of `makeChunkedPack` takes a `RmmDeviceMemoryResource`, which can be used to pre-allocate all scratch and temporary space required for the state of `cudf::chunked_pack`. The caller is responsible for calling close on the returned `ChunkedPack` object.
      Parameters:
      bounceBufferSize - The size of bounce buffer that will be utilized to pack into
      tempMemoryResource - A memory resource that is used to satisfy allocations for temporary and thrust scratch space.
      Returns:
      An instance of `ChunkedPack` that the caller must use to finish the operation.
    • makeChunkedPack

      public ChunkedPack makeChunkedPack(long bounceBufferSize)
      Create an instance of `ChunkedPack` which can be used to pack this table contiguously in memory utilizing a bounce buffer of size `bounceBufferSize`. This version of `makeChunkedPack` makes use of the default per-device memory resource, for scratch and temporary space required for the state of `cudf::chunked_pack`. The caller is responsible for calling close on the returned `ChunkedPack` object.
      Parameters:
      bounceBufferSize - The size of bounce buffer that will be utilized to pack into
      Returns:
      An instance of `ChunkedPack` that the caller must use to finish the operation.
    • explode

      public Table explode(int index)
      Explodes a list column's elements. Any list is exploded, which means the elements of the list in each row are expanded into new rows in the output. The corresponding rows for other columns in the input are duplicated.

      Example:
      input:  [[5,10,15], 100], [[20,25], 200], [[30], 300]
      index:  0
      output: [5, 100], [10, 100], [15, 100], [20, 200], [25, 200], [30, 300]

      Nulls propagate in different ways depending on what is null.
      input:  [[5,null,15], 100], [null, 200]
      index:  0
      output: [5, 100], [null, 100], [15, 100]

      Note that null lists are completely removed from the output and nulls inside lists are pulled out and remain.
      Parameters:
      index - Column index to explode inside the table.
      Returns:
      A new table with the column at `index` exploded.
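
      The null handling in the example can be sketched in plain Java for a two-column table. This does not call cudf; `ExplodeSketch` is a hypothetical name, the list column is modeled as a list of nullable lists, and each output row is [exploded value, other column].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExplodeSketch {
    // Expands each list element into its own row and repeats the other column's value.
    static List<List<Integer>> explode(List<List<Integer>> lists, List<Integer> other) {
        List<List<Integer>> out = new ArrayList<>();
        for (int row = 0; row < lists.size(); row++) {
            List<Integer> list = lists.get(row);
            if (list == null) {
                continue; // a null list produces no output rows
            }
            for (Integer element : list) {
                // A null element inside a list is kept as a null value.
                out.add(Arrays.asList(element, other.get(row)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = Arrays.asList(Arrays.asList(5, null, 15), null);
        List<Integer> other = Arrays.asList(100, 200);
        System.out.println(explode(lists, other)); // [[5, 100], [null, 100], [15, 100]]
    }
}
```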
    • explodePosition

      public Table explodePosition(int index)
      Explodes a list column's elements and includes a position column. Any list is exploded, which means the elements of the list in each row are expanded into new rows in the output. The corresponding rows for other columns in the input are duplicated. A position column is added that has the index inside the original list for each row.

      Example:
      input:  [[5,10,15], 100], [[20,25], 200], [[30], 300]
      index:  0
      output: [0, 5, 100], [1, 10, 100], [2, 15, 100], [0, 20, 200], [1, 25, 200], [0, 30, 300]

      Nulls and empty lists propagate in different ways depending on what is null or empty.
      input:  [[5,null,15], 100], [null, 200]
      index:  0
      output: [0, 5, 100], [1, null, 100], [2, 15, 100]

      Note that null lists are not included in the resulting table, but nulls inside lists and empty lists will be represented with a null entry for that column in that row.
      Parameters:
      index - Column index to explode inside the table.
      Returns:
      A new table with exploded value and position. The column order of return table is [cols before explode_input, explode_position, explode_value, cols after explode_input].
    • explodeOuter

      public Table explodeOuter(int index)
      Explodes a list column's elements, retaining any null entries or empty lists. Any list is exploded, which means the elements of the list in each row are expanded into new rows in the output. The corresponding rows for other columns in the input are duplicated.

      Example:
      input:  [[5,10,15], 100], [[20,25], 200], [[30], 300]
      index:  0
      output: [5, 100], [10, 100], [15, 100], [20, 200], [25, 200], [30, 300]

      Nulls propagate in different ways depending on what is null.
      input:  [[5,null,15], 100], [null, 200]
      index:  0
      output: [5, 100], [null, 100], [15, 100], [null, 200]

      Note that null lists are retained in the output as a null entry, and nulls inside lists are pulled out and remain.
      Parameters:
      index - Column index to explode inside the table.
      Returns:
      A new table with the column at `index` exploded.
    • explodeOuterPosition

      public Table explodeOuterPosition(int index)
      Explodes a list column's elements, retaining any null entries or empty lists, and includes a position column. Any list is exploded, which means the elements of the list in each row are expanded into new rows in the output. The corresponding rows for other columns in the input are duplicated. A position column is added that has the index inside the original list for each row.

      Example:
      input:  [[5,10,15], 100], [[20,25], 200], [[30], 300]
      index:  0
      output: [0, 5, 100], [1, 10, 100], [2, 15, 100], [0, 20, 200], [1, 25, 200], [0, 30, 300]

      Nulls and empty lists propagate as null entries in the result.
      input:  [[5,null,15], 100], [null, 200], [[], 300]
      index:  0
      output: [0, 5, 100], [1, null, 100], [2, 15, 100], [0, null, 200], [0, null, 300]
      Parameters:
      index - Column index to explode inside the table.
      Returns:
      A new table with exploded value and position. The column order of return table is [cols before explode_input, explode_position, explode_value, cols after explode_input].
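
      The outer variant with positions can be sketched in plain Java for a two-column table. This does not call cudf; `ExplodeOuterPositionSketch` is a hypothetical name, and each output row is [position, exploded value, other column], mirroring the second example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ExplodeOuterPositionSketch {
    static List<List<Integer>> explodeOuterPosition(List<List<Integer>> lists,
                                                    List<Integer> other) {
        List<List<Integer>> out = new ArrayList<>();
        for (int row = 0; row < lists.size(); row++) {
            List<Integer> list = lists.get(row);
            if (list == null || list.isEmpty()) {
                // Null and empty lists are retained as one row with a null value at position 0.
                out.add(Arrays.asList(0, null, other.get(row)));
                continue;
            }
            for (int position = 0; position < list.size(); position++) {
                out.add(Arrays.asList(position, list.get(position), other.get(row)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = Arrays.asList(
            Arrays.asList(5, null, 15), null, Collections.<Integer>emptyList());
        List<Integer> other = Arrays.asList(100, 200, 300);
        System.out.println(explodeOuterPosition(lists, other));
    }
}
```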
    • rowBitCount

      public ColumnVector rowBitCount()
      Returns an approximate cumulative size in bits of all columns in this table for each row. This function counts bits instead of bytes to account for the null mask, which only has one bit per row. Each row in the returned column is the sum of the per-row bit size for each column in the table.

      In some cases, this is an inexact approximation. Specifically, columns of lists and strings require N+1 offsets to represent N rows. It is up to the caller to calculate the small additional overhead of the terminating offset for any group of rows being considered.

      This function returns the per-row bit sizes as the columns are currently formed. This can end up being larger than the number you would get by gathering the rows. Specifically, the push-down of struct column validity masks can nullify rows that contain data for string or list columns. In these cases, the size returned is conservative such that: row_bit_count(column(x)) >= row_bit_count(gather(column(x)))
      Returns:
      INT32 column of bit size per row of the table
    • gather

      public Table gather(ColumnView gatherMap)
      Gathers the rows of this table according to `gatherMap` such that row "i" in the resulting table's columns will contain row "gatherMap[i]" from this table. The number of rows in the result table will be equal to the number of elements in `gatherMap`. A negative value `i` in the `gatherMap` is interpreted as `i+n`, where `n` is the number of rows in this table.
      Parameters:
      gatherMap - the map of indexes. Must be non-nullable and integral type.
      Returns:
      the resulting Table.
    • gather

      public Table gather(ColumnView gatherMap, OutOfBoundsPolicy outOfBoundsPolicy)
      Gathers the rows of this table according to `gatherMap` such that row "i" in the resulting table's columns will contain row "gatherMap[i]" from this table. The number of rows in the result table will be equal to the number of elements in `gatherMap`. A negative value `i` in the `gatherMap` is interpreted as `i+n`, where `n` is the number of rows in this table.
      Parameters:
      gatherMap - the map of indexes. Must be non-nullable and integral type.
      outOfBoundsPolicy - policy to use when an out-of-range value is in `gatherMap`.
      Returns:
      the resulting Table.
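
      The index handling for gather, including the wrap-around of negative values, can be sketched in plain Java for one column. This does not call cudf; `GatherSketch` is a hypothetical name, and out-of-range entries are not modeled (the sketch simply throws on them).

```java
final class GatherSketch {
    // out[i] = column[gatherMap[i]], with a negative index i read as i + n.
    static int[] gather(int[] column, int[] gatherMap) {
        int n = column.length;
        int[] out = new int[gatherMap.length];
        for (int i = 0; i < gatherMap.length; i++) {
            int index = gatherMap[i];
            if (index < 0) {
                index += n;
            }
            out[i] = column[index];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] column = {10, 20, 30, 40};
        int[] gatherMap = {3, -1, 0, -4};
        System.out.println(java.util.Arrays.toString(gather(column, gatherMap)));
    }
}
```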
    • scatter

      public Table scatter(ColumnView scatterMap, Table target)
      Scatters values from the source table into the target table out-of-place, returning a new result table. The scatter is performed according to a scatter map such that row `scatterMap[i]` of the destination table gets row `i` of the source table. All other rows of the destination table equal corresponding rows of the target table. The number of columns in source must match the number of columns in target and their corresponding data types must be the same. If the same index appears more than once in the scatter map, the result is undefined. A negative value `i` in the `scatterMap` is interpreted as `i + n`, where `n` is the number of rows in the `target` table.
      Parameters:
      scatterMap - The map of indexes. Must be non-nullable and integral type.
      target - The table into which rows from the current table are to be scattered out-of-place.
      Returns:
      A new table which is the result of out-of-place scattering the source table into the target table.
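
      Out-of-place scatter can be sketched in plain Java for one column. This does not call cudf; `ScatterSketch` is a hypothetical name, and the sketch assumes the scatter map has no repeated indices, since the result is undefined in that case.

```java
final class ScatterSketch {
    // Returns a copy of target where row scatterMap[i] is replaced by source[i].
    static int[] scatter(int[] source, int[] scatterMap, int[] target) {
        int n = target.length;
        int[] out = target.clone(); // target itself is left unchanged
        for (int i = 0; i < source.length; i++) {
            int index = scatterMap[i];
            if (index < 0) {
                index += n; // negative values count from the end of target
            }
            out[index] = source[i];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] target = {10, 20, 30, 40};
        int[] result = scatter(new int[] {1, 2}, new int[] {3, -4}, target);
        System.out.println(java.util.Arrays.toString(result));
    }
}
```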
    • scatter

      public static Table scatter(Scalar[] source, ColumnView scatterMap, Table target)
      Scatters values from the source rows into the target table out-of-place, returning a new result table. The scatter is performed according to a scatter map such that row `scatterMap[i]` of the destination table is replaced by the source row `i`. All other rows of the destination table equal corresponding rows of the target table. The number of elements in source must match the number of columns in target and their corresponding data types must be the same. If the same index appears more than once in the scatter map, the result is undefined. A negative value `i` in the `scatterMap` is interpreted as `i + n`, where `n` is the number of rows in the `target` table.
      Parameters:
      source - The input scalars containing values to be scattered into the target table.
      scatterMap - The map of indexes. Must be non-nullable and integral type.
      target - The table into which the values from source are to be scattered out-of-place.
      Returns:
      A new table which is the result of out-of-place scattering the source values into the target table.
    • leftJoinGatherMaps

      public GatherMap[] leftJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)
      Computes the gather maps that can be used to manifest the result of a left equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the table argument represents the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join. It is the responsibility of the caller to close the resulting gather map instances.
      Parameters:
      rightKeys - join key columns from the right table
      compareNullsEqual - true if null key values should match, otherwise false
      Returns:
      left and right table gather maps
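
      What a pair of left-join gather maps contains can be sketched in plain Java on single int key columns. This does not call cudf and does not model null keys; `LeftJoinSketch` and the `NO_MATCH` sentinel are hypothetical, and the row ordering produced here is a property of the sketch only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LeftJoinSketch {
    // Hypothetical marker for "no matching right row" in the right gather map.
    static final int NO_MATCH = Integer.MIN_VALUE;

    // Returns {leftMap, rightMap}; output row i pairs left row leftMap[i] with right row rightMap[i].
    static int[][] leftJoinGatherMaps(int[] leftKeys, int[] rightKeys) {
        Map<Integer, List<Integer>> rightRowsByKey = new HashMap<>();
        for (int r = 0; r < rightKeys.length; r++) {
            rightRowsByKey.computeIfAbsent(rightKeys[r], k -> new ArrayList<>()).add(r);
        }
        List<Integer> leftMap = new ArrayList<>();
        List<Integer> rightMap = new ArrayList<>();
        for (int l = 0; l < leftKeys.length; l++) {
            List<Integer> matches = rightRowsByKey.get(leftKeys[l]);
            if (matches == null) {
                // Unmatched left rows still appear once in a left join.
                leftMap.add(l);
                rightMap.add(NO_MATCH);
            } else {
                for (int r : matches) {
                    leftMap.add(l);
                    rightMap.add(r);
                }
            }
        }
        return new int[][] {toArray(leftMap), toArray(rightMap)};
    }

    private static int[] toArray(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[][] maps = leftJoinGatherMaps(new int[] {1, 2, 3}, new int[] {3, 1, 1});
        System.out.println(java.util.Arrays.toString(maps[0]));
        System.out.println(java.util.Arrays.toString(maps[1]));
    }
}
```

      Gathering the full left table with the first map and the full right table with the second, then placing the results side by side, gives the joined rows.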
    • leftDistinctJoinGatherMap

      public GatherMap leftDistinctJoinGatherMap(Table rightKeys, boolean compareNullsEqual)
      Computes a gather map that can be used to manifest the result of a left equi-join between two tables where the right table is guaranteed to not contain any duplicated join keys. The left table can be used as-is to produce the left table columns resulting from the join, i.e.: left table ordering is preserved in the join result, so no gather map is required for the left table. The resulting gather map can be applied to the right table to produce the right table columns resulting from the join. It is assumed this table instance holds the key columns from the left table, and the table argument represents the key columns from the right table. A GatherMap instance will be returned that can be used to gather the right table and that result combined with the left table to produce a left outer join result. It is the responsibility of the caller to close the resulting gather map instance.
      Parameters:
      rightKeys - join key columns from the right table
      compareNullsEqual - true if null key values should match, otherwise false
      Returns:
      right table gather map
    • leftJoinRowCount

      public long leftJoinRowCount(HashJoin rightHash)
      Computes the number of rows resulting from a left equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the HashJoin argument has been constructed from the key columns from the right table.
      Parameters:
      rightHash - hash table built from join key columns from the right table
      Returns:
      row count of the join result
    • leftJoinGatherMaps

      public GatherMap[] leftJoinGatherMaps(HashJoin rightHash)
      Computes the gather maps that can be used to manifest the result of a left equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the HashJoin argument has been constructed from the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join. It is the responsibility of the caller to close the resulting gather map instances.
      Parameters:
      rightHash - hash table built from join key columns from the right table
      Returns:
      left and right table gather maps
    • leftJoinGatherMaps

      public GatherMap[] leftJoinGatherMaps(HashJoin rightHash, long outputRowCount)
      Computes the gather maps that can be used to manifest the result of a left equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the HashJoin argument has been constructed from the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join. It is the responsibility of the caller to close the resulting gather map instances. This interface allows passing an output row count that was previously computed from leftJoinRowCount(HashJoin). WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
      Parameters:
      rightHash - hash table built from join key columns from the right table
      outputRowCount - number of output rows in the join result
      Returns:
      left and right table gather maps
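
      The count-then-fill pattern behind the outputRowCount overload can be sketched in plain Java on single int key columns. This does not call cudf; `TwoPassLeftJoinSketch` and its `NO_MATCH` sentinel are hypothetical. In this sketch an undersized count fails with an ArrayIndexOutOfBoundsException, whereas the documentation above warns that the real call has undefined behavior.

```java
final class TwoPassLeftJoinSketch {
    // Hypothetical marker for "no matching right row".
    static final int NO_MATCH = Integer.MIN_VALUE;

    // Pass 1: count output rows without materializing the gather maps.
    static long leftJoinRowCount(int[] leftKeys, int[] rightKeys) {
        long count = 0;
        for (int left : leftKeys) {
            long matches = 0;
            for (int right : rightKeys) {
                if (left == right) {
                    matches++;
                }
            }
            count += Math.max(matches, 1); // an unmatched left row still yields one row
        }
        return count;
    }

    // Pass 2: fill maps that were sized from the previously computed count.
    static int[][] leftJoinGatherMaps(int[] leftKeys, int[] rightKeys, long outputRowCount) {
        int[] leftMap = new int[(int) outputRowCount];
        int[] rightMap = new int[(int) outputRowCount];
        int n = 0;
        for (int l = 0; l < leftKeys.length; l++) {
            boolean matched = false;
            for (int r = 0; r < rightKeys.length; r++) {
                if (leftKeys[l] == rightKeys[r]) {
                    leftMap[n] = l;
                    rightMap[n] = r;
                    n++;
                    matched = true;
                }
            }
            if (!matched) {
                leftMap[n] = l;
                rightMap[n] = NO_MATCH;
                n++;
            }
        }
        return new int[][] {leftMap, rightMap};
    }

    public static void main(String[] args) {
        int[] left = {1, 2, 3};
        int[] right = {3, 1, 1};
        long rows = leftJoinRowCount(left, right);
        int[][] maps = leftJoinGatherMaps(left, right, rows);
        System.out.println(rows + " rows, left map " + java.util.Arrays.toString(maps[0]));
    }
}
```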
    • conditionalLeftJoinRowCount

      public long conditionalLeftJoinRowCount(Table rightTable, CompiledExpression condition)
      Computes the number of rows from the result of a left join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table.
      Parameters:
      rightTable - the right side table of the join
      condition - conditional expression to evaluate during the join
      Returns:
      row count for the join result
    • conditionalLeftJoinGatherMaps

      public GatherMap[] conditionalLeftJoinGatherMaps(Table rightTable, CompiledExpression condition)
      Computes the gather maps that can be used to manifest the result of a left join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join. It is the responsibility of the caller to close the resulting gather map instances.
      Parameters:
      rightTable - the right side table of the join
      condition - conditional expression to evaluate during the join
      Returns:
      left and right table gather maps
    • conditionalLeftJoinGatherMaps

      public GatherMap[] conditionalLeftJoinGatherMaps(Table rightTable, CompiledExpression condition, long outputRowCount)
      Computes the gather maps that can be used to manifest the result of a left join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join. It is the responsibility of the caller to close the resulting gather map instances. This interface allows passing an output row count that was previously computed from conditionalLeftJoinRowCount(Table, CompiledExpression). WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
      Parameters:
      rightTable - the right side table of the join
      condition - conditional expression to evaluate during the join
      outputRowCount - number of output rows in the join result
      Returns:
      left and right table gather maps
    • mixedLeftJoinSize

      public static MixedJoinSize mixedLeftJoinSize(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
      Computes output size information for a left join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. NOTE: It is the responsibility of the caller to close the resulting size information object or native resources can be leaked!
      Parameters:
      leftKeys - the left table's key columns for the equality condition
      rightKeys - the right table's key columns for the equality condition
      leftConditional - the left table's columns needed to evaluate the inequality condition
      rightConditional - the right table's columns needed to evaluate the inequality condition
      condition - the inequality condition of the join
      nullEquality - whether nulls should compare as equal
      Returns:
      size information for the join
    • mixedLeftJoinGatherMaps

      public static GatherMap[] mixedLeftJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
      Computes the gather maps that can be used to manifest the result of a left join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join. It is the responsibility of the caller to close the resulting gather map instances.
      Parameters:
      leftKeys - the left table's key columns for the equality condition
      rightKeys - the right table's key columns for the equality condition
      leftConditional - the left table's columns needed to evaluate the inequality condition
      rightConditional - the right table's columns needed to evaluate the inequality condition
      condition - the inequality condition of the join
      nullEquality - whether nulls should compare as equal
      Returns:
      left and right table gather maps
    • mixedLeftJoinGatherMaps

      public static GatherMap[] mixedLeftJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality, MixedJoinSize joinSize)
      Computes the gather maps that can be used to manifest the result of a left join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join. It is the responsibility of the caller to close the resulting gather map instances. This interface allows passing the size result from mixedLeftJoinSize(Table, Table, Table, Table, CompiledExpression, NullEquality) when the output size was computed previously.
      Parameters:
      leftKeys - the left table's key columns for the equality condition
      rightKeys - the right table's key columns for the equality condition
      leftConditional - the left table's columns needed to evaluate the inequality condition
      rightConditional - the right table's columns needed to evaluate the inequality condition
      condition - the inequality condition of the join
      nullEquality - whether nulls should compare as equal
      joinSize - mixed join size result
      Returns:
      left and right table gather maps
    • innerJoinGatherMaps

      public GatherMap[] innerJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)
      Computes the gather maps that can be used to manifest the result of an inner equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the table argument represents the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join. It is the responsibility of the caller to close the resulting gather map instances.
      Parameters:
      rightKeys - join key columns from the right table
      compareNullsEqual - true if null key values should match, otherwise false
      Returns:
      left and right table gather maps
    • innerDistinctJoinGatherMaps

      public GatherMap[] innerDistinctJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)
      Computes the gather maps that can be used to manifest the result of an inner equi-join between two tables where the right table is guaranteed to not contain any duplicated join keys. It is assumed this table instance holds the key columns from the left table, and the table argument represents the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join. It is the responsibility of the caller to close the resulting gather map instances.
      Parameters:
      rightKeys - join key columns from the right table
      compareNullsEqual - true if null key values should match, otherwise false
      Returns:
      left and right table gather maps
    • innerJoinRowCount

      public long innerJoinRowCount(HashJoin otherHash)
      Computes the number of rows resulting from an inner equi-join between two tables.
      Parameters:
      otherHash - hash table built from join key columns from the other table
      Returns:
      row count of the join result
    • innerJoinGatherMaps

      public GatherMap[] innerJoinGatherMaps(HashJoin rightHash)
      Computes the gather maps that can be used to manifest the result of an inner equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the HashJoin argument has been constructed from the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join. It is the responsibility of the caller to close the resulting gather map instances.
      Parameters:
      rightHash - hash table built from join key columns from the right table
      Returns:
      left and right table gather maps
    • innerJoinGatherMaps

      public GatherMap[] innerJoinGatherMaps(HashJoin rightHash, long outputRowCount)
      Computes the gather maps that can be used to manifest the result of an inner equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the HashJoin argument has been constructed from the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join. It is the responsibility of the caller to close the resulting gather map instances. This interface allows passing an output row count that was previously computed from innerJoinRowCount(HashJoin). WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
      Parameters:
      rightHash - hash table built from join key columns from the right table
      outputRowCount - number of output rows in the join result
      Returns:
      left and right table gather maps
    • conditionalInnerJoinRowCount

      public long conditionalInnerJoinRowCount(Table rightTable, CompiledExpression condition)
      Computes the number of rows from the result of an inner join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table.
      Parameters:
      rightTable - the right side table of the join
      condition - conditional expression to evaluate during the join
      Returns:
      row count for the join result
    • conditionalInnerJoinGatherMaps

      public GatherMap[] conditionalInnerJoinGatherMaps(Table rightTable, CompiledExpression condition)
      Computes the gather maps that can be used to manifest the result of an inner join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join. It is the responsibility of the caller to close the resulting gather map instances.
      Parameters:
      rightTable - the right side table of the join
      condition - conditional expression to evaluate during the join
      Returns:
      left and right table gather maps
    • conditionalInnerJoinGatherMaps

      public GatherMap[] conditionalInnerJoinGatherMaps(Table rightTable, CompiledExpression condition, long outputRowCount)
      Computes the gather maps that can be used to manifest the result of an inner join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join. It is the responsibility of the caller to close the resulting gather map instances. This interface allows passing an output row count that was previously computed from conditionalInnerJoinRowCount(Table, CompiledExpression). WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
      Parameters:
      rightTable - the right side table of the join
      condition - conditional expression to evaluate during the join
      outputRowCount - number of output rows in the join result
      Returns:
      left and right table gather maps
    • mixedInnerJoinSize

      public static MixedJoinSize mixedInnerJoinSize(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
      Computes output size information for an inner join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. NOTE: It is the responsibility of the caller to close the resulting size information object or native resources can be leaked!
      Parameters:
      leftKeys - the left table's key columns for the equality condition
      rightKeys - the right table's key columns for the equality condition
      leftConditional - the left table's columns needed to evaluate the inequality condition
      rightConditional - the right table's columns needed to evaluate the inequality condition
      condition - the inequality condition of the join
      nullEquality - whether nulls should compare as equal
      Returns:
      size information for the join
    • mixedInnerJoinGatherMaps

      public static GatherMap[] mixedInnerJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
      Computes the gather maps that can be used to manifest the result of an inner join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join. It is the responsibility of the caller to close the resulting gather map instances.
      Parameters:
      leftKeys - the left table's key columns for the equality condition
      rightKeys - the right table's key columns for the equality condition
      leftConditional - the left table's columns needed to evaluate the inequality condition
      rightConditional - the right table's columns needed to evaluate the inequality condition
      condition - the inequality condition of the join
      nullEquality - whether nulls should compare as equal
      Returns:
      left and right table gather maps
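      To make the "logical AND of the equality condition and inequality condition" concrete, here is a small CPU sketch in plain Java. It does not call cudf: MixedJoinSketch and its methods are hypothetical, the key and conditional tables are each reduced to one column, the inequality is fixed to "left value < right value", and the nullsEqual flag only mimics the intent of the nullEquality parameter.

```java
import java.util.ArrayList;
import java.util.List;

final class MixedJoinSketch {
    /**
     * Returns (left index, right index) pairs where the keys are equal
     * AND the left conditional value is less than the right one.
     * Nullable keys are modeled as Integer.
     */
    static List<int[]> mixedInnerJoin(Integer[] leftKeys, Integer[] rightKeys,
                                      int[] leftCond, int[] rightCond,
                                      boolean nullsEqual) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < leftKeys.length; i++) {
            for (int j = 0; j < rightKeys.length; j++) {
                boolean equality = keysMatch(leftKeys[i], rightKeys[j], nullsEqual);
                boolean inequality = leftCond[i] < rightCond[j];
                // A pair is kept only when both parts of the condition hold.
                if (equality && inequality) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }

    /** Two null keys match only when nullsEqual is true; a null never matches a non-null. */
    private static boolean keysMatch(Integer a, Integer b, boolean nullsEqual) {
        if (a == null || b == null) {
            return nullsEqual && a == null && b == null;
        }
        return a.equals(b);
    }

    public static void main(String[] args) {
        Integer[] leftKeys = {1, 2, null};
        Integer[] rightKeys = {2, null, 1};
        int[] leftCond = {10, 20, 30};
        int[] rightCond = {25, 40, 5};
        for (int[] p : mixedInnerJoin(leftKeys, rightKeys, leftCond, rightCond, true)) {
            System.out.println("left row " + p[0] + " joins right row " + p[1]);
        }
    }
}
```

      In this data the keys of left row 0 and right row 2 are equal, but the pair is dropped because 10 < 5 is false.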
    • mixedInnerJoinGatherMaps

      public static GatherMap[] mixedInnerJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality, MixedJoinSize joinSize)
      Computes the gather maps that can be used to manifest the result of an inner join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join. It is the responsibility of the caller to close the resulting gather map instances. This interface allows passing the size result from mixedInnerJoinSize(Table, Table, Table, Table, CompiledExpression, NullEquality) when the output size was computed previously.
      Parameters:
      leftKeys - the left table's key columns for the equality condition
      rightKeys - the right table's key columns for the equality condition
      leftConditional - the left table's columns needed to evaluate the inequality condition
      rightConditional - the right table's columns needed to evaluate the inequality condition
      condition - the inequality condition of the join
      nullEquality - whether nulls should compare as equal
      joinSize - mixed join size result
      Returns:
      left and right table gather maps
    • fullJoinGatherMaps

      public GatherMap[] fullJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)
      Computes the gather maps that can be used to manifest the result of a full equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the table argument represents the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the full join. It is the responsibility of the caller to close the resulting gather map instances.
      Parameters:
      rightKeys - join key columns from the right table
      compareNullsEqual - true if null key values should match, otherwise false
      Returns:
      left and right table gather maps
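      The shape of a full join result can be sketched with plain Java arrays. This is an illustration only and not the cudf implementation: FullJoinSketch is hypothetical, keys are a single non-null int column, and the NO_MATCH value of -1 is an assumption chosen for this sketch (how the library marks an unmatched row is not described here).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FullJoinSketch {
    /** Hypothetical marker meaning "no matching row on this side". */
    static final int NO_MATCH = -1;

    /** Returns {leftMap, rightMap}; every left row and every right row appears at least once. */
    static int[][] fullJoinMaps(int[] leftKeys, int[] rightKeys) {
        List<Integer> leftMap = new ArrayList<>();
        List<Integer> rightMap = new ArrayList<>();
        boolean[] rightMatched = new boolean[rightKeys.length];
        for (int i = 0; i < leftKeys.length; i++) {
            boolean matched = false;
            for (int j = 0; j < rightKeys.length; j++) {
                if (leftKeys[i] == rightKeys[j]) {
                    leftMap.add(i);
                    rightMap.add(j);
                    rightMatched[j] = true;
                    matched = true;
                }
            }
            if (!matched) {
                // Left row with no partner: keep it, pair it with "nothing".
                leftMap.add(i);
                rightMap.add(NO_MATCH);
            }
        }
        for (int j = 0; j < rightKeys.length; j++) {
            if (!rightMatched[j]) {
                // Right row with no partner: keep it, pair it with "nothing".
                leftMap.add(NO_MATCH);
                rightMap.add(j);
            }
        }
        return new int[][] {toArray(leftMap), toArray(rightMap)};
    }

    /** Applies a gather map to a column; NO_MATCH entries become null. */
    static Integer[] gather(int[] column, int[] map) {
        Integer[] out = new Integer[map.length];
        for (int k = 0; k < map.length; k++) {
            out[k] = map[k] == NO_MATCH ? null : column[map[k]];
        }
        return out;
    }

    private static int[] toArray(List<Integer> list) {
        return list.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] left = {1, 2, 3};
        int[] right = {2, 3, 4};
        int[][] maps = fullJoinMaps(left, right);
        System.out.println(Arrays.toString(gather(left, maps[0])));
        System.out.println(Arrays.toString(gather(right, maps[1])));
    }
}
```

      Both maps always have the same length, so gathering each side with its own map yields two row-aligned halves of the join result.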
    • fullJoinRowCount

      public long fullJoinRowCount(HashJoin rightHash)
      Computes the number of rows resulting from a full equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the HashJoin argument has been constructed from the key columns from the right table. Note that unlike leftJoinRowCount(HashJoin) and innerJoinRowCount(HashJoin), this will perform some redundant calculations compared to fullJoinGatherMaps(HashJoin, long).
      Parameters:
      rightHash - hash table built from join key columns from the right table
      Returns:
      row count of the join result
    • fullJoinGatherMaps

      public GatherMap[] fullJoinGatherMaps(HashJoin rightHash)
      Computes the gather maps that can be used to manifest the result of a full equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the HashJoin argument has been constructed from the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the full join. It is the responsibility of the caller to close the resulting gather map instances.
      Parameters:
      rightHash - hash table built from join key columns from the right table
      Returns:
      left and right table gather maps
    • fullJoinGatherMaps

      public GatherMap[] fullJoinGatherMaps(HashJoin rightHash, long outputRowCount)
      Computes the gather maps that can be used to manifest the result of a full equi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the HashJoin argument has been constructed from the key columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the full join. It is the responsibility of the caller to close the resulting gather map instances. This interface allows passing an output row count that was previously computed from fullJoinRowCount(HashJoin). WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
      Parameters:
      rightHash - hash table built from join key columns from the right table
      outputRowCount - number of output rows in the join result
      Returns:
      left and right table gather maps
    • conditionalFullJoinGatherMaps

      public GatherMap[] conditionalFullJoinGatherMaps(Table rightTable, CompiledExpression condition)
      Computes the gather maps that can be used to manifest the result of a full join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the full join. It is the responsibility of the caller to close the resulting gather map instances.
      Parameters:
      rightTable - the right side table of the join
      condition - conditional expression to evaluate during the join
      Returns:
      left and right table gather maps
    • mixedFullJoinGatherMaps

      public static GatherMap[] mixedFullJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
      Computes the gather maps that can be used to manifest the result of a full join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the full join. It is the responsibility of the caller to close the resulting gather map instances.
      Parameters:
      leftKeys - the left table's key columns for the equality condition
      rightKeys - the right table's key columns for the equality condition
      leftConditional - the left table's columns needed to evaluate the inequality condition
      rightConditional - the right table's columns needed to evaluate the inequality condition
      condition - the inequality condition of the join
      nullEquality - whether nulls should compare as equal
      Returns:
      left and right table gather maps
    • leftSemiJoinGatherMap

      public GatherMap leftSemiJoinGatherMap(Table rightKeys, boolean compareNullsEqual)
      Computes the gather map that can be used to manifest the result of a left semi-join between two tables. It is assumed this table instance holds the key columns from the left table, and the table argument represents the key columns from the right table. The GatherMap instance returned can be used to gather the left table to produce the result of the left semi-join. It is the responsibility of the caller to close the resulting gather map instance.
      Parameters:
      rightKeys - join key columns from the right table
      compareNullsEqual - true if null key values should match, otherwise false
      Returns:
      left table gather map
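      A left semi-join produces a single gather map because only left rows appear in the result, each at most once. The plain Java sketch below illustrates that, along with the effect of compareNullsEqual. It is not the cudf API: LeftSemiJoinSketch is a hypothetical name, keys are one nullable Integer column, and the ascending order of the output is just an artifact of the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LeftSemiJoinSketch {
    /** Returns the indices of left rows whose key matches at least one right key. */
    static int[] leftSemiJoinMap(Integer[] leftKeys, Integer[] rightKeys,
                                 boolean compareNullsEqual) {
        List<Integer> map = new ArrayList<>();
        for (int i = 0; i < leftKeys.length; i++) {
            for (Integer rightKey : rightKeys) {
                if (matches(leftKeys[i], rightKey, compareNullsEqual)) {
                    map.add(i);
                    break; // one match is enough; duplicates on the right add no rows
                }
            }
        }
        return map.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Two null keys match only when compareNullsEqual is true. */
    private static boolean matches(Integer a, Integer b, boolean compareNullsEqual) {
        if (a == null || b == null) {
            return compareNullsEqual && a == null && b == null;
        }
        return a.equals(b);
    }

    public static void main(String[] args) {
        Integer[] left = {1, 2, null, 2};
        Integer[] right = {2, 2, null};
        System.out.println(Arrays.toString(leftSemiJoinMap(left, right, true)));
        System.out.println(Arrays.toString(leftSemiJoinMap(left, right, false)));
    }
}
```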
    • conditionalLeftSemiJoinRowCount

      public long conditionalLeftSemiJoinRowCount(Table rightTable, CompiledExpression condition)
      Computes the number of rows from the result of a left semi join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table.
      Parameters:
      rightTable - the right side table of the join
      condition - conditional expression to evaluate during the join
      Returns:
      row count for the join result
    • conditionalLeftSemiJoinGatherMap

      public GatherMap conditionalLeftSemiJoinGatherMap(Table rightTable, CompiledExpression condition)
      Computes the gather map that can be used to manifest the result of a left semi join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. The GatherMap instance returned can be used to gather the left table to produce the result of the left semi join. It is the responsibility of the caller to close the resulting gather map instance.
      Parameters:
      rightTable - the right side table of the join
      condition - conditional expression to evaluate during the join
      Returns:
      left table gather map
    • conditionalLeftSemiJoinGatherMap

      public GatherMap conditionalLeftSemiJoinGatherMap(Table rightTable, CompiledExpression condition, long outputRowCount)
      Computes the gather map that can be used to manifest the result of a left semi join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. The GatherMap instance returned can be used to gather the left table to produce the result of the left semi join. It is the responsibility of the caller to close the resulting gather map instance. This interface allows passing an output row count that was previously computed from conditionalLeftSemiJoinRowCount(Table, CompiledExpression). WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
      Parameters:
      rightTable - the right side table of the join
      condition - conditional expression to evaluate during the join
      outputRowCount - number of output rows in the join result
      Returns:
      left table gather map
    • mixedLeftSemiJoinGatherMap

      public static GatherMap mixedLeftSemiJoinGatherMap(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
      Computes the gather map that can be used to manifest the result of a left semi join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. A GatherMap instance will be returned that can be used to gather the left table to produce the result of the left semi join. It is the responsibility of the caller to close the resulting gather map instance.
      Parameters:
      leftKeys - the left table's key columns for the equality condition
      rightKeys - the right table's key columns for the equality condition
      leftConditional - the left table's columns needed to evaluate the inequality condition
      rightConditional - the right table's columns needed to evaluate the inequality condition
      condition - the inequality condition of the join
      nullEquality - whether nulls should compare as equal
      Returns:
      left table gather map
    • leftAntiJoinGatherMap

      public GatherMap leftAntiJoinGatherMap(Table rightKeys, boolean compareNullsEqual)
      Computes the gather map that can be used to manifest the result of a left anti-join between two tables. It is assumed this table instance holds the key columns from the left table, and the table argument represents the key columns from the right table. The GatherMap instance returned can be used to gather the left table to produce the result of the left anti-join. It is the responsibility of the caller to close the resulting gather map instance.
      Parameters:
      rightKeys - join key columns from the right table
      compareNullsEqual - true if null key values should match, otherwise false
      Returns:
      left table gather map
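      A left anti-join keeps exactly the left rows that have no matching key on the right. The following plain Java sketch shows the idea with a hash set. It is an illustration rather than the library's method: LeftAntiJoinSketch is a hypothetical name, keys are a single non-null int column, and null handling is left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.IntStream;

final class LeftAntiJoinSketch {
    /** Returns the indices of left rows whose key does not occur in rightKeys. */
    static int[] leftAntiJoinMap(int[] leftKeys, int[] rightKeys) {
        Set<Integer> rightSet = new HashSet<>();
        for (int key : rightKeys) {
            rightSet.add(key);
        }
        return IntStream.range(0, leftKeys.length)
            .filter(i -> !rightSet.contains(leftKeys[i]))
            .toArray();
    }

    /** Applies the gather map to a left-table column. */
    static int[] gather(int[] column, int[] map) {
        return Arrays.stream(map).map(i -> column[i]).toArray();
    }

    public static void main(String[] args) {
        int[] leftKeys = {1, 2, 3, 4};
        int[] rightKeys = {2, 4, 4};
        int[] map = leftAntiJoinMap(leftKeys, rightKeys);
        System.out.println("map=" + Arrays.toString(map)
            + ", rows=" + Arrays.toString(gather(leftKeys, map)));
    }
}
```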
    • conditionalLeftAntiJoinRowCount

      public long conditionalLeftAntiJoinRowCount(Table rightTable, CompiledExpression condition)
      Computes the number of rows from the result of a left anti join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table.
      Parameters:
      rightTable - the right side table of the join
      condition - conditional expression to evaluate during the join
      Returns:
      row count for the join result
    • conditionalLeftAntiJoinGatherMap

      public GatherMap conditionalLeftAntiJoinGatherMap(Table rightTable, CompiledExpression condition)
      Computes the gather map that can be used to manifest the result of a left anti join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. The GatherMap instance returned can be used to gather the left table to produce the result of the left anti join. It is the responsibility of the caller to close the resulting gather map instance.
      Parameters:
      rightTable - the right side table of the join
      condition - conditional expression to evaluate during the join
      Returns:
      left table gather map
    • conditionalLeftAntiJoinGatherMap

      public GatherMap conditionalLeftAntiJoinGatherMap(Table rightTable, CompiledExpression condition, long outputRowCount)
      Computes the gather map that can be used to manifest the result of a left anti join between two tables when a conditional expression is true. It is assumed this table instance holds the columns from the left table, and the table argument represents the columns from the right table. The GatherMap instance returned can be used to gather the left table to produce the result of the left anti join. It is the responsibility of the caller to close the resulting gather map instance. This interface allows passing an output row count that was previously computed from conditionalLeftAntiJoinRowCount(Table, CompiledExpression). WARNING: Passing a row count that is smaller than the actual row count will result in undefined behavior.
      Parameters:
      rightTable - the right side table of the join
      condition - conditional expression to evaluate during the join
      outputRowCount - number of output rows in the join result
      Returns:
      left table gather map
    • mixedLeftAntiJoinGatherMap

      public static GatherMap mixedLeftAntiJoinGatherMap(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
      Computes the gather map that can be used to manifest the result of a left anti join between two tables using a mix of equality and inequality conditions. The entire join condition is assumed to be a logical AND of the equality condition and inequality condition. A GatherMap instance will be returned that can be used to gather the left table to produce the result of the left anti join. It is the responsibility of the caller to close the resulting gather map instance.
      Parameters:
      leftKeys - the left table's key columns for the equality condition
      rightKeys - the right table's key columns for the equality condition
      leftConditional - the left table's columns needed to evaluate the inequality condition
      rightConditional - the right table's columns needed to evaluate the inequality condition
      condition - the inequality condition of the join
      nullEquality - whether nulls should compare as equal
      Returns:
      left table gather map
    • fromPackedTable

      public static Table fromPackedTable(ByteBuffer metadata, DeviceMemoryBuffer data)
      Construct a table from a packed representation.
      Parameters:
      metadata - host-based metadata for the table
      data - GPU data buffer for the table
      Returns:
      table which is zero-copy reconstructed from the packed-form
    • sample

      public Table sample(long n, boolean replacement, long seed)
      Gather `n` samples from table randomly.
      Note: does not preserve the ordering.
      Example:
        input: {col1: {1, 2, 3, 4, 5}, col2: {6, 7, 8, 9, 10}}
        n: 3
        replacement: false
        output: {col1: {3, 1, 4}, col2: {8, 6, 9}}
        replacement: true
        output: {col1: {3, 1, 1}, col2: {8, 6, 6}}
      Throws "logic_error" if `n` > table rows and `replacement` == FALSE.
      Throws "logic_error" if `n` < 0.
      Parameters:
      n - non-negative number of samples expected from table
      replacement - Allow or disallow sampling of the same row more than once.
      seed - Seed value to initiate random number generator.
      Returns:
      Table containing samples
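      The difference between sampling with and without replacement can be sketched in plain Java by drawing row indices and then applying them to every column. This is a CPU illustration only: SampleSketch and sampleIndices are hypothetical names, java.util.Random stands in for whatever generator the library uses, so the indices drawn for a given seed will not match the library's output, and IllegalArgumentException stands in for the "logic_error" cases.

```java
import java.util.Arrays;
import java.util.Random;

final class SampleSketch {
    /** Returns n row indices drawn at random from [0, rowCount). */
    static int[] sampleIndices(int rowCount, int n, boolean replacement, long seed) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        if (!replacement && n > rowCount) {
            throw new IllegalArgumentException("n exceeds row count without replacement");
        }
        Random rng = new Random(seed);
        int[] picked = new int[n];
        if (replacement) {
            // Each draw is independent, so the same row can appear more than once.
            for (int k = 0; k < n; k++) {
                picked[k] = rng.nextInt(rowCount);
            }
        } else {
            // Partial Fisher-Yates shuffle: each row can be chosen at most once.
            int[] pool = new int[rowCount];
            for (int i = 0; i < rowCount; i++) {
                pool[i] = i;
            }
            for (int k = 0; k < n; k++) {
                int swap = k + rng.nextInt(rowCount - k);
                int tmp = pool[k];
                pool[k] = pool[swap];
                pool[swap] = tmp;
                picked[k] = pool[k];
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        int[] col1 = {1, 2, 3, 4, 5};
        int[] col2 = {6, 7, 8, 9, 10};
        int[] idx = sampleIndices(col1.length, 3, false, 42L);
        // The same indices are applied to every column so rows stay intact.
        System.out.println(Arrays.toString(Arrays.stream(idx).map(i -> col1[i]).toArray()));
        System.out.println(Arrays.toString(Arrays.stream(idx).map(i -> col2[i]).toArray()));
    }
}
```

      Reusing one index array for all columns is what keeps each sampled row intact, as in the example above where col1 value 3 always travels with col2 value 8.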