Class HostColumnVector

java.lang.Object
ai.rapids.cudf.HostColumnVectorCore
ai.rapids.cudf.HostColumnVector
All Implemented Interfaces:
AutoCloseable

public final class HostColumnVector extends HostColumnVectorCore
Similar to a ColumnVector, but the data is stored in host memory and is accessible directly from the JVM. This class holds references to off-heap memory and is reference counted so that it knows when to release that memory. Call close to decrement the reference count when you are done with the column, and call incRefCount to increment the reference count.
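
The reference-counting contract described above can be sketched in plain Java. This is only an illustration of the close/incRefCount semantics, not the library's implementation; RefCountedSketch and isReleased are hypothetical names that do not exist in cudf.

```java
/** Hypothetical stand-in for a reference-counted column backed by off-heap memory. */
final class RefCountedSketch implements AutoCloseable {
    private int refCount = 1;         // a new object starts with a single owner
    private boolean released = false; // becomes true once the backing memory is freed

    /** Adds an owner and returns this so the call can be chained. */
    synchronized RefCountedSketch incRefCount() {
        if (refCount <= 0) {
            throw new IllegalStateException("already closed");
        }
        refCount++;
        return this;
    }

    synchronized int getRefCount() {
        return refCount;
    }

    synchronized boolean isReleased() {
        return released;
    }

    /** Drops one owner; the memory is released only when the last owner closes. */
    @Override
    public synchronized void close() {
        if (refCount <= 0) {
            throw new IllegalStateException("closed too many times");
        }
        refCount--;
        if (refCount == 0) {
            released = true; // a real column would free its host buffers here
        }
    }
}
```

Every incRefCount call must be paired with one extra close; the memory stays alive until the count reaches zero.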
  • Constructor Details

    • HostColumnVector

      public HostColumnVector(DType type, long rows, Optional<Long> nullCount, HostMemoryBuffer hostDataBuffer, HostMemoryBuffer hostValidityBuffer, HostMemoryBuffer offsetBuffer, List<HostColumnVectorCore> nestedHcv)
      Create a new column vector with data populated on the host.
      Parameters:
      type - the type of the vector
      rows - the number of rows in the vector.
      nullCount - the number of nulls in the vector.
      hostDataBuffer - The host side data for the vector. In the case of STRING this is the string data stored as bytes.
      hostValidityBuffer - Arrow-like validity buffer 1 bit per row, with padding for 64-bit alignment.
      offsetBuffer - only valid for STRING. Holds the offsets into hostDataBuffer that mark the start and end of each string entry. It should contain (rows + 1) ints.
      nestedHcv - list of child HostColumnVectorCore(s) for complex types
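
To make the buffer layout in the parameter list more concrete, the following sketch builds the three pieces for a STRING column using plain Java arrays instead of HostMemoryBuffer. It assumes the Arrow convention for the validity bits (least-significant bit first, 1 means valid) and omits the alignment padding; StringLayoutSketch and its members are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of data, offsets and validity for a string column. */
final class StringLayoutSketch {
    final byte[] data;     // concatenated UTF-8 bytes of all non-null rows
    final int[] offsets;   // rows + 1 entries; row i spans [offsets[i], offsets[i + 1])
    final byte[] validity; // 1 bit per row, least-significant bit first; 1 means valid

    StringLayoutSketch(String... rows) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        offsets = new int[rows.length + 1];
        validity = new byte[(rows.length + 7) / 8]; // padding is not modeled here
        for (int i = 0; i < rows.length; i++) {
            if (rows[i] != null) {
                byte[] utf8 = rows[i].getBytes(StandardCharsets.UTF_8);
                bytes.write(utf8, 0, utf8.length);
                validity[i / 8] |= (byte) (1 << (i % 8));
            }
            // A null row contributes no bytes, so its start and end offsets are equal.
            offsets[i + 1] = bytes.size();
        }
        data = bytes.toByteArray();
    }

    boolean isNull(int row) {
        return (validity[row / 8] & (1 << (row % 8))) == 0;
    }

    String get(int row) {
        if (isNull(row)) {
            return null;
        }
        int start = offsets[row];
        return new String(data, start, offsets[row + 1] - start, StandardCharsets.UTF_8);
    }
}
```

For the rows "ab", null, "cd" this yields offsets 0, 2, 2, 4 and a validity byte with bits 0 and 2 set.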
  • Method Details

    • setEventHandler

      public HostColumnVector.EventHandler setEventHandler(HostColumnVector.EventHandler newHandler)
      Set an event handler for this host vector. This method can be invoked with null to unset the handler.
      Parameters:
      newHandler - the EventHandler to use from this point forward
      Returns:
      the prior event handler, or null if not set.
    • getEventHandler

      public HostColumnVector.EventHandler getEventHandler()
      Returns the current event handler for this HostColumnVector or null if no handler is associated.
    • noWarnLeakExpected

      public void noWarnLeakExpected()
      This is an admittedly ugly API, but thanks to Java and GC, a column of data may not always have a clear lifecycle. This API informs the leak-tracking code that a leak is expected for this column, so that big scary warnings are not printed when it happens.
    • close

      public void close()
      Close this Vector and free memory allocated for HostMemoryBuffer and DeviceMemoryBuffer
      Specified by:
      close in interface AutoCloseable
      Overrides:
      close in class HostColumnVectorCore
    • toString

      public String toString()
      Overrides:
      toString in class HostColumnVectorCore
    • incRefCount

      public HostColumnVector incRefCount()
      Increment the reference count for this column. You need to call close on this to decrement the reference count again.
    • getRefCount

      public int getRefCount()
      Returns this column's current refcount
    • copyToDevice

      public ColumnVector copyToDevice()
      Copy the data to the device.
    • builder

      public static HostColumnVector.Builder builder(DType type, int rows)
      Create a new Builder to hold the specified number of rows. Be sure to close the builder when done with it. Please try to use build(DType, int, Consumer) instead to avoid needing to close the builder.
      Parameters:
      type - the type of vector to build.
      rows - the number of rows this builder can hold
      Returns:
      the builder to use.
    • builder

      public static HostColumnVector.Builder builder(int rows, long stringBufferSize)
      Create a new Builder to hold the specified number of rows and with enough space to hold the given amount of string data. Be sure to close the builder when done with it. Please try to use build(int, long, Consumer) instead to avoid needing to close the builder.
      Parameters:
      rows - the number of rows this builder can hold
      stringBufferSize - the size of the string buffer to allocate.
      Returns:
      the builder to use.
    • build

      public static HostColumnVector build(DType type, int rows, Consumer<HostColumnVector.Builder> init)
      Create a new vector.
      Parameters:
      type - the type of vector to build.
      rows - maximum number of rows that the vector can hold.
      init - what will initialize the vector.
      Returns:
      the created vector.
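
The reason a Consumer-based build spares the caller from closing a builder can be sketched as follows. This shows the general pattern only, under the assumption that the library follows it; SketchBuilder is a hypothetical stand-in for HostColumnVector.Builder and produces an int[] rather than a vector.

```java
import java.util.Arrays;
import java.util.function.Consumer;

/** Hypothetical closeable builder, standing in for HostColumnVector.Builder. */
final class SketchBuilder implements AutoCloseable {
    private final int[] values;
    private int size = 0;
    private boolean closed = false;

    SketchBuilder(int rows) {
        values = new int[rows]; // capacity is fixed up front, like the rows argument
    }

    SketchBuilder append(int value) {
        values[size++] = value; // throws if more than the reserved rows are appended
        return this;
    }

    int[] build() {
        return Arrays.copyOf(values, size);
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true; // a real builder would release its host buffers here
    }

    /** Creates the builder, lets init fill it, builds the result, and always closes the builder. */
    static int[] build(int rows, Consumer<SketchBuilder> init) {
        try (SketchBuilder builder = new SketchBuilder(rows)) {
            init.accept(builder);
            return builder.build();
        }
    }
}
```

A call such as SketchBuilder.build(3, b -> b.append(1).append(2)) never exposes the builder outside the lambda, so the caller cannot forget to close it.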
    • build

      public static HostColumnVector build(int rows, long stringBufferSize, Consumer<HostColumnVector.Builder> init)
    • fromLists

      public static <T> HostColumnVector fromLists(HostColumnVector.DataType dataType, List<T>... values)
    • fromStructs

      public static HostColumnVector fromStructs(HostColumnVector.DataType dataType, List<HostColumnVector.StructData> values)
    • fromStructs

      public static HostColumnVector fromStructs(HostColumnVector.DataType dataType, HostColumnVector.StructData... values)
    • emptyStructs

      public static HostColumnVector emptyStructs(HostColumnVector.DataType dataType, long rows)
    • boolFromBytes

      public static HostColumnVector boolFromBytes(byte... values)
      Create a new vector from the given values.
    • fromBytes

      public static HostColumnVector fromBytes(byte... values)
      Create a new vector from the given values.
    • fromUnsignedBytes

      public static HostColumnVector fromUnsignedBytes(byte... values)
      Create a new vector from the given values.

      Java does not have an unsigned byte type, so the values will be treated as if the bits represent an unsigned value.
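
As a small illustration of what "treated as if the bits represent an unsigned value" means, the sketch below uses only standard JDK helpers. The same reading applies to the short, int, and long variants. UnsignedBitsSketch and its method names are hypothetical.

```java
/** Hypothetical helpers showing how signed Java values map to unsigned readings. */
final class UnsignedBitsSketch {
    /** Reads the 8 bits of a byte as a value in [0, 255]. */
    static int unsignedByte(byte bits) {
        return Byte.toUnsignedInt(bits);
    }

    /** Reads the 16 bits of a short as a value in [0, 65535]. */
    static int unsignedShort(short bits) {
        return Short.toUnsignedInt(bits);
    }

    /** Reads the 32 bits of an int as a value in [0, 2^32 - 1]. */
    static long unsignedInt(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    /** Formats the 64 bits of a long as an unsigned decimal string. */
    static String unsignedLong(long bits) {
        return Long.toUnsignedString(bits);
    }

    /** Packs a value in [0, 255] into the byte whose bits represent it. */
    static byte byteBits(int value) {
        if (value < 0 || value > 255) {
            throw new IllegalArgumentException("out of unsigned byte range: " + value);
        }
        return (byte) value; // 200 becomes the Java byte -56, with identical bits
    }
}
```

To store the unsigned value 200 in a byte-based factory, pass byteBits(200), which Java prints as -56.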

    • fromShorts

      public static HostColumnVector fromShorts(short... values)
      Create a new vector from the given values.
    • fromUnsignedShorts

      public static HostColumnVector fromUnsignedShorts(short... values)
      Create a new vector from the given values.

      Java does not have an unsigned short type, so the values will be treated as if the bits represent an unsigned value.

    • durationNanosecondsFromLongs

      public static HostColumnVector durationNanosecondsFromLongs(long... values)
      Create a new vector from the given values.
    • durationMicrosecondsFromLongs

      public static HostColumnVector durationMicrosecondsFromLongs(long... values)
      Create a new vector from the given values.
    • durationMillisecondsFromLongs

      public static HostColumnVector durationMillisecondsFromLongs(long... values)
      Create a new vector from the given values.
    • durationSecondsFromLongs

      public static HostColumnVector durationSecondsFromLongs(long... values)
      Create a new vector from the given values.
    • durationDaysFromInts

      public static HostColumnVector durationDaysFromInts(int... values)
      Create a new vector from the given values.
    • fromInts

      public static HostColumnVector fromInts(int... values)
      Create a new vector from the given values.
    • fromUnsignedInts

      public static HostColumnVector fromUnsignedInts(int... values)
      Create a new vector from the given values.

      Java does not have an unsigned int type, so the values will be treated as if the bits represent an unsigned value.

    • fromLongs

      public static HostColumnVector fromLongs(long... values)
      Create a new vector from the given values.
    • fromUnsignedLongs

      public static HostColumnVector fromUnsignedLongs(long... values)
      Create a new vector from the given values.

      Java does not have an unsigned long type, so the values will be treated as if the bits represent an unsigned value.

    • fromFloats

      public static HostColumnVector fromFloats(float... values)
      Create a new vector from the given values.
    • fromDoubles

      public static HostColumnVector fromDoubles(double... values)
      Create a new vector from the given values.
    • daysFromInts

      public static HostColumnVector daysFromInts(int... values)
      Create a new vector from the given values.
    • timestampSecondsFromLongs

      public static HostColumnVector timestampSecondsFromLongs(long... values)
      Create a new vector from the given values.
    • timestampMilliSecondsFromLongs

      public static HostColumnVector timestampMilliSecondsFromLongs(long... values)
      Create a new vector from the given values.
    • timestampMicroSecondsFromLongs

      public static HostColumnVector timestampMicroSecondsFromLongs(long... values)
      Create a new vector from the given values.
    • timestampNanoSecondsFromLongs

      public static HostColumnVector timestampNanoSecondsFromLongs(long... values)
      Create a new vector from the given values.
    • decimalFromInts

      public static HostColumnVector decimalFromInts(int scale, int... values)
      Create a new decimal vector from unscaled values (int array) and scale. The created vector is of type DType.DECIMAL32, whose max precision is 9. Note that the scale here has the opposite sign from the scale of java.math.BigDecimal.
    • decimalFromBoxedInts

      public static HostColumnVector decimalFromBoxedInts(int scale, Integer... values)
      Create a new decimal vector from boxed unscaled values (Integer array) and scale. The created vector is of type DType.DECIMAL32, whose max precision is 9. Note that the scale here has the opposite sign from the scale of java.math.BigDecimal.
    • decimalFromLongs

      public static HostColumnVector decimalFromLongs(int scale, long... values)
      Create a new decimal vector from unscaled values (long array) and scale. The created vector is of type DType.DECIMAL64, whose max precision is 18. Note that the scale here has the opposite sign from the scale of java.math.BigDecimal.
    • decimalFromBoxedLongs

      public static HostColumnVector decimalFromBoxedLongs(int scale, Long... values)
      Create a new decimal vector from boxed unscaled values (Long array) and scale. The created vector is of type DType.DECIMAL64, whose max precision is 18. Note that the scale here has the opposite sign from the scale of java.math.BigDecimal.
    • decimalFromBigIntegers

      public static HostColumnVector decimalFromBigIntegers(int scale, BigInteger... values)
      Create a new decimal vector from unscaled values (BigInteger array) and scale. The created vector is of type DType.DECIMAL128. Note that the scale here has the opposite sign from the scale of java.math.BigDecimal.
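
The relationship between an unscaled value, the scale used by the decimalFrom* factories above, and java.math.BigDecimal can be sketched with the JDK alone. The sketch assumes the value is unscaled * 10^scale, so a scale of -2 means two fractional digits. It also shows where the precision limits 9 and 18 come from. DecimalScaleSketch and its methods are hypothetical names.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical helpers relating a cudf-style scale to BigDecimal. */
final class DecimalScaleSketch {
    /** Interprets unscaled * 10^cudfScale as a BigDecimal, whose scale has the opposite sign. */
    static BigDecimal toBigDecimal(long unscaled, int cudfScale) {
        return BigDecimal.valueOf(unscaled, -cudfScale);
    }

    /** Largest digit count d such that every d-digit integer fits in a signed integer of the given width. */
    static int maxPrecision(int bits) {
        BigInteger max = BigInteger.ONE.shiftLeft(bits - 1).subtract(BigInteger.ONE);
        BigInteger nine = BigInteger.valueOf(9);
        BigInteger allNines = nine; // 9, 99, 999, ...
        int digits = 0;
        while (allNines.compareTo(max) <= 0) {
            digits++;
            allNines = allNines.multiply(BigInteger.TEN).add(nine);
        }
        return digits;
    }
}
```

With these helpers, toBigDecimal(12345, -2) is 123.45, while maxPrecision(32) and maxPrecision(64) give 9 and 18.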
    • decimalFromDoubles

      public static HostColumnVector decimalFromDoubles(DType type, RoundingMode mode, double... values)
      Create a new decimal vector from doubles with a specific DecimalType and RoundingMode. All doubles will be rescaled if necessary, according to the scale of the input DecimalType and the RoundingMode. If any overflow occurs while extracting the integral part, an IllegalArgumentException will be thrown. This API is inefficient because of the slow double-to-decimal conversion, so it is mainly for testing. Note that the scale here has the opposite sign from the scale of java.math.BigDecimal.
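
A rough sketch of the rescaling described above, for a 32-bit unscaled value. It is an assumption that the conversion goes through the exact binary value of the double (new BigDecimal(double)) and that overflow is detected by checking whether the result fits in an int; the library may differ in both details. DoubleToDecimalSketch, toUnscaledInt, and cudfScale are illustrative names.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

/** Hypothetical double-to-decimal rescaling for a 32-bit unscaled value. */
final class DoubleToDecimalSketch {
    /**
     * Rounds value to the given cudf-style scale (opposite sign from BigDecimal's scale)
     * and returns the unscaled integer, or throws if it does not fit in an int.
     */
    static int toUnscaledInt(double value, int cudfScale, RoundingMode mode) {
        BigInteger unscaled = new BigDecimal(value)
                .setScale(-cudfScale, mode)
                .unscaledValue();
        if (unscaled.bitLength() > 31) { // outside the signed 32-bit range
            throw new IllegalArgumentException("overflow converting " + value);
        }
        return unscaled.intValue();
    }
}
```

For example, 1.25 at scale -1 becomes the unscaled value 13 under RoundingMode.HALF_UP and 12 under RoundingMode.HALF_EVEN.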
    • fromStrings

      public static HostColumnVector fromStrings(String... values)
      Create a new string vector from the given values. This API supports inline nulls. It is really intended only for testing, as translating between Java strings and UTF-8 strings is slow and memory intensive.
    • fromUTF8Strings

      public static HostColumnVector fromUTF8Strings(byte[]... values)
      Create a new string vector from the given values. This API supports inline nulls.
    • fromDecimals

      public static HostColumnVector fromDecimals(BigDecimal... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than building from a primitive array of unscaled values. Note: (1) Input values will be rescaled to the min scale (the max scale in terms of java.math.BigDecimal), which avoids potential precision loss due to rounding, but there is a risk of precision overflow. (2) The scale will be zero if all input values are null.
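
The rescaling rule in the note can be illustrated with BigDecimal alone: pick the largest BigDecimal scale among the non-null inputs and widen every value to it, which never requires rounding. This is a sketch of the stated rule, not the library's code; CommonScaleSketch and its methods are hypothetical names.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical illustration of rescaling mixed-scale decimals to one common scale. */
final class CommonScaleSketch {
    /** Largest BigDecimal scale among the non-null inputs, or 0 if all inputs are null. */
    static int commonBigDecimalScale(BigDecimal... values) {
        boolean seen = false;
        int maxScale = 0;
        for (BigDecimal v : values) {
            if (v != null && (!seen || v.scale() > maxScale)) {
                maxScale = v.scale();
                seen = true;
            }
        }
        return maxScale;
    }

    /** Unscaled values after widening every input to the common scale; nulls stay null. */
    static BigInteger[] unscaledAtCommonScale(BigDecimal... values) {
        int scale = commonBigDecimalScale(values);
        BigInteger[] out = new BigInteger[values.length];
        for (int i = 0; i < values.length; i++) {
            // Widening to a larger scale only appends zeros, so no rounding mode is needed.
            out[i] = values[i] == null ? null : values[i].setScale(scale).unscaledValue();
        }
        return out;
    }
}
```

For the inputs 1.5, null, 2.25 the common BigDecimal scale is 2, giving unscaled values 150, null, 225. The widened unscaled values may need more digits than the inputs did, which is the precision overflow risk the note mentions.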
    • fromBoxedBooleans

      public static HostColumnVector fromBoxedBooleans(Boolean... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.
    • fromBoxedBytes

      public static HostColumnVector fromBoxedBytes(Byte... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.
    • fromBoxedUnsignedBytes

      public static HostColumnVector fromBoxedUnsignedBytes(Byte... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.

      Java does not have an unsigned byte type, so the values will be treated as if the bits represent an unsigned value.

    • fromBoxedShorts

      public static HostColumnVector fromBoxedShorts(Short... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.
    • fromBoxedUnsignedShorts

      public static HostColumnVector fromBoxedUnsignedShorts(Short... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.

      Java does not have an unsigned short type, so the values will be treated as if the bits represent an unsigned value.

    • durationNanosecondsFromBoxedLongs

      public static HostColumnVector durationNanosecondsFromBoxedLongs(Long... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.
    • durationMicrosecondsFromBoxedLongs

      public static HostColumnVector durationMicrosecondsFromBoxedLongs(Long... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.
    • durationMillisecondsFromBoxedLongs

      public static HostColumnVector durationMillisecondsFromBoxedLongs(Long... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.
    • durationSecondsFromBoxedLongs

      public static HostColumnVector durationSecondsFromBoxedLongs(Long... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.
    • durationDaysFromBoxedInts

      public static HostColumnVector durationDaysFromBoxedInts(Integer... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.
    • fromBoxedInts

      public static HostColumnVector fromBoxedInts(Integer... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.
    • fromBoxedUnsignedInts

      public static HostColumnVector fromBoxedUnsignedInts(Integer... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.

      Java does not have an unsigned int type, so the values will be treated as if the bits represent an unsigned value.

    • fromBoxedLongs

      public static HostColumnVector fromBoxedLongs(Long... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.
    • fromBoxedUnsignedLongs

      public static HostColumnVector fromBoxedUnsignedLongs(Long... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.

      Java does not have an unsigned long type, so the values will be treated as if the bits represent an unsigned value.

    • fromBoxedFloats

      public static HostColumnVector fromBoxedFloats(Float... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.
    • fromBoxedDoubles

      public static HostColumnVector fromBoxedDoubles(Double... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.
    • timestampDaysFromBoxedInts

      public static HostColumnVector timestampDaysFromBoxedInts(Integer... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.
    • timestampSecondsFromBoxedLongs

      public static HostColumnVector timestampSecondsFromBoxedLongs(Long... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.
    • timestampMilliSecondsFromBoxedLongs

      public static HostColumnVector timestampMilliSecondsFromBoxedLongs(Long... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.
    • timestampMicroSecondsFromBoxedLongs

      public static HostColumnVector timestampMicroSecondsFromBoxedLongs(Long... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.
    • timestampNanoSecondsFromBoxedLongs

      public static HostColumnVector timestampNanoSecondsFromBoxedLongs(Long... values)
      Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests.