Class ColumnView

java.lang.Object
ai.rapids.cudf.ColumnView
All Implemented Interfaces:
BinaryOperable, AutoCloseable
Direct Known Subclasses:
ColumnVector

public class ColumnView extends Object implements AutoCloseable, BinaryOperable
This class represents the column_view of a column, analogous to its libcudf C++ counterpart (cudf::column_view). It holds view information, such as the native handle and other metadata for a column_view, and exposes APIs that operate on a view.
  • Field Details

    • UNKNOWN_NULL_COUNT

      public static final long UNKNOWN_NULL_COUNT
    • viewHandle

      protected long viewHandle
    • type

      protected final DType type
    • rows

      protected final long rows
    • nullCount

      protected final long nullCount
    • offHeap

      protected final ColumnVector.OffHeapState offHeap
  • Constructor Details

    • ColumnView

      protected ColumnView(ColumnVector.OffHeapState state)
      Intended to be called from ColumnVector when it is being constructed. Because the state creates a cudf::column_view instance and closes it in all cases, this view does not close it again, which avoids a double close. This constructor asserts that, if the offHeapState is of a nested type, it does not contain non-empty nulls.
      Parameters:
      state - the state this view is based off of.
      Throws:
      AssertionError - if offHeapState points to a nested-type view with non-empty nulls
    • ColumnView

      public ColumnView(DType type, long rows, Optional<Long> nullCount, BaseDeviceMemoryBuffer validityBuffer, BaseDeviceMemoryBuffer offsetBuffer, ColumnView[] children)
      Create a new column view based on data already on the device. The reference counts of the buffers are not incremented, and none of the underlying buffers are owned by this view. The returned ColumnView is only valid as long as the underlying buffers remain valid; closing the buffers before this ColumnView is closed results in undefined behavior. If ownership is needed, call copyToColumnVector().
      Parameters:
      type - the type of the vector
      rows - the number of rows in this vector.
      nullCount - the number of nulls in the dataset.
      validityBuffer - an optional validity buffer. Must be provided if nullCount != 0. The ownership doesn't change on this buffer
      offsetBuffer - a device buffer required for nested types, including strings and string categories. The ownership doesn't change on this buffer
      children - an array of ColumnView children
    • ColumnView

      public ColumnView(DType type, long rows, Optional<Long> nullCount, BaseDeviceMemoryBuffer dataBuffer, BaseDeviceMemoryBuffer validityBuffer)
      Create a new column view based on data already on the device. The reference counts of the buffers are not incremented, and none of the underlying buffers are owned by this view. The returned ColumnView is only valid as long as the underlying buffers remain valid; closing the buffers before this ColumnView is closed results in undefined behavior. If ownership is needed, call copyToColumnVector().
      Parameters:
      type - the type of the vector
      rows - the number of rows in this vector.
      nullCount - the number of nulls in the dataset.
      dataBuffer - the device buffer holding the column's data. The ownership doesn't change on this buffer
      validityBuffer - an optional validity buffer. Must be provided if nullCount != 0. The ownership doesn't change on this buffer
    • ColumnView

      public ColumnView(DType type, long rows, Optional<Long> nullCount, BaseDeviceMemoryBuffer dataBuffer, BaseDeviceMemoryBuffer validityBuffer, BaseDeviceMemoryBuffer offsetBuffer)
      Create a new column view based on data already on the device. The reference counts of the buffers are not incremented, and none of the underlying buffers are owned by this view. The returned ColumnView is only valid as long as the underlying buffers remain valid; closing the buffers before this ColumnView is closed results in undefined behavior. If ownership is needed, call copyToColumnVector().
      Parameters:
      type - the type of the vector
      rows - the number of rows in this vector.
      nullCount - the number of nulls in the dataset.
      dataBuffer - the device buffer holding the column's data. The ownership doesn't change on this buffer
      validityBuffer - an optional validity buffer. Must be provided if nullCount != 0. The ownership doesn't change on this buffer
      offsetBuffer - the offset buffer, for columns that need one (for example, strings)
  • Method Details

    • copyToColumnVector

      public ColumnVector copyToColumnVector()
      Creates a new ColumnVector by copying the data referenced by this column view's handle.
      Returns:
      a new ColumnVector
    • getNativeView

      public final long getNativeView()
      USE WITH CAUTION: This method exposes the address of the native cudf::column_view. This allows writing custom kernels or other CUDA operations on the data. DO NOT close this column until you are completely done using the native column_view. DO NOT modify the column in any way; treat it as a read-only data structure. This API is unstable because the underlying C/C++ API is not yet stabilized: if the underlying data structure is renamed, this API may be replaced, and the data structure can change from release to release, so be sure that your native code is compiled against the exact same version of libcudf that this release was built against.
    • getType

      public final DType getType()
      Description copied from interface: BinaryOperable
      Get the type of this data.
      Specified by:
      getType in interface BinaryOperable
    • getChildColumnViews

      public final ColumnView[] getChildColumnViews()
      Returns the child column views for this view. Please note that it is the responsibility of the caller to close these views.
      Returns:
      an array of child column views
    • getChildColumnView

      public final ColumnView getChildColumnView(int childIndex)
      Returns the child column view at a given index. Please note that it is the responsibility of the caller to close this view.
      Parameters:
      childIndex - the index of the child
      Returns:
      a column view
    • getListOffsetsView

      public ColumnView getListOffsetsView()
      Get a ColumnView that is the offsets for this list. Please note that it is the responsibility of the caller to close this view, and the parent column must outlive this view.
    • getData

      public final BaseDeviceMemoryBuffer getData()
      Gets the data buffer for the current column view (viewHandle). If the type is LIST or STRUCT, it returns null.
      Returns:
      null if the type is LIST or STRUCT or the data buffer is empty; otherwise the data device buffer
    • getOffsets

      public final BaseDeviceMemoryBuffer getOffsets()
    • getValid

      public final BaseDeviceMemoryBuffer getValid()
    • getNullCount

      public long getNullCount()
      Returns the number of nulls in the data. Note that this might end up being a very expensive operation because if the null count is not known it will be calculated.
    • getRowCount

      public final long getRowCount()
      Returns the number of rows in this vector.
    • getNumChildren

      public final int getNumChildren()
    • getDeviceMemorySize

      public long getDeviceMemorySize()
      Returns the amount of device memory used.
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • nansToNulls

      public final ColumnVector nansToNulls()
      Returns a new ColumnVector with NaNs converted to nulls, preserving the existing null values.
    • getCharLengths

      public final ColumnVector getCharLengths()
      Retrieve the number of characters in each string. Null strings will have a value of null.
      Returns:
      ColumnVector holding length of string at index 'i' in the original vector
    • getByteCount

      public final ColumnVector getByteCount()
      Retrieve the number of bytes for each string. Null strings will have a value of null.
      Returns:
      ColumnVector, where each element at i = byte count of string at index 'i' in the original vector
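      A plain-Java model can make the difference between the character count and the byte count concrete. The sketch below is a host-side illustration, not the cudf API: it assumes a String[] in which a null entry stands for a null row, counts characters as Unicode code points and bytes as UTF-8 bytes, and StringLengthModel and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical host-side model of getCharLengths() and getByteCount().
// A null array entry models a null row; null rows produce null results.
final class StringLengthModel {
    private StringLengthModel() {}

    // Characters per string, counted as Unicode code points.
    static Integer[] charLengths(String[] column) {
        Integer[] out = new Integer[column.length];
        for (int i = 0; i < column.length; i++) {
            String s = column[i];
            out[i] = (s == null) ? null : s.codePointCount(0, s.length());
        }
        return out;
    }

    // Bytes per string in its UTF-8 encoding.
    static Integer[] byteCounts(String[] column) {
        Integer[] out = new Integer[column.length];
        for (int i = 0; i < column.length; i++) {
            String s = column[i];
            out[i] = (s == null) ? null : s.getBytes(StandardCharsets.UTF_8).length;
        }
        return out;
    }
}
```

      For example, the single emoji "\uD83D\uDE00" counts as 1 character and 4 bytes in this model, while Java's String.length() would report 2.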
    • codePoints

      public final ColumnVector codePoints()
      Get the code point values (integers) for each character of each string.
      Returns:
      ColumnVector, with code point integer values for each character as INT32
    • countElements

      public final ColumnVector countElements()
      Get the number of elements for each list. Null lists will have a value of null.
      Returns:
      the number of elements in each list as an INT32 value.
    • isNotNull

      public final ColumnVector isNotNull()
      Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is not null, and FALSE for any null entry (as per the validity mask)
      Returns:
      - Boolean vector
    • isNull

      public final ColumnVector isNull()
      Returns a Boolean vector with the same number of rows as this instance, that has FALSE for any entry that is not null, and TRUE for any null entry (as per the validity mask)
      Returns:
      - Boolean vector
    • isFixedPoint

      public final ColumnVector isFixedPoint(DType decimalType)
      Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is a fixed-point value, and FALSE if it is not. A null will be returned for null entries. The sign and the exponent are optional. The decimal point may only appear once. The integer component must fit within the size limits of the underlying fixed-point storage type. The value of the integer component is based on the scale of the target decimalType. Example:
      vec = ["A", "nan", "Inf", "-Inf", "Infinity", "infinity", "2.1474", "112.383", "-2.14748", "NULL", "null", null, "1.2", "1.2e-4", "0.00012"]
      vec.isFixedPoint(decimalType) = [false, false, false, false, false, false, true, true, true, false, false, null, true, true, true]
      Parameters:
      decimalType - the data type that should be used for bounds checking. Note that only Decimal types (fixed-point) are allowed.
      Returns:
      Boolean vector
    • isInteger

      public final ColumnVector isInteger()
      Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is an integer, and FALSE if it is not an integer. A null will be returned for null entries. NOTE: Integer doesn't mean a 32-bit integer; it means a number that is not a fraction. That is, if this method returns true for a value, converting that value to a Java integral type could still overflow or underflow.
      Returns:
      Boolean vector
    • isInteger

      public final ColumnVector isInteger(DType intType)
      Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is an integer, and FALSE if it is not an integer. A null will be returned for null entries.
      Parameters:
      intType - the data type that should be used for bounds checking. Note that only cudf integer types are allowed including signed/unsigned int8 through int64
      Returns:
      Boolean vector
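      The difference between the two isInteger overloads can be sketched on the host. This is an approximation, not the libcudf parser: it assumes an integer is an optional '+' or '-' followed by ASCII digits, models isInteger(DType) for signed types only (by bit width), and IntegerCheckModel is a hypothetical name.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

// Hypothetical, approximate host-side model of isInteger() and isInteger(DType).
final class IntegerCheckModel {
    private IntegerCheckModel() {}

    // Assumed syntax: optional sign, then one or more ASCII digits.
    private static final Pattern INTEGER = Pattern.compile("[+-]?[0-9]+");

    // Like isInteger(): syntax only, no bounds check. Null input gives null.
    static Boolean isInteger(String s) {
        if (s == null) return null;
        return INTEGER.matcher(s).matches();
    }

    // Like isInteger(DType) for a signed integer type of the given bit width.
    static Boolean isInteger(String s, int bits) {
        Boolean syntax = isInteger(s);
        if (syntax == null || !syntax) return syntax;
        BigInteger value = new BigInteger(s);  // accepts a leading '+' or '-'
        BigInteger max = BigInteger.ONE.shiftLeft(bits - 1).subtract(BigInteger.ONE);
        BigInteger min = BigInteger.ONE.shiftLeft(bits - 1).negate();
        return value.compareTo(min) >= 0 && value.compareTo(max) <= 0;
    }
}
```

      For example, "2147483648" passes the syntax-only check but fails the 32-bit bounds check, which is the overflow case the NOTE on isInteger() warns about.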
    • isFloat

      public final ColumnVector isFloat()
      Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is a float, and FALSE if it is not a float. A null will be returned for null entries. NOTE: Float doesn't mean a 32-bit float; it means a number that is a fraction or can be written as a fraction. That is, this method returns true for integers as well as floats. Also note that if this method returns true for a value, converting it to a Java float or double could still overflow or underflow.
      Returns:
      - Boolean vector
    • isNan

      public final ColumnVector isNan()
      Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is NaN, and FALSE for null entries and valid floating point values.
      Returns:
      - Boolean vector
    • isNotNan

      public final ColumnVector isNotNan()
      Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is null or a valid floating point value, FALSE otherwise
      Returns:
      - Boolean vector
    • findAndReplaceAll

      public final ColumnVector findAndReplaceAll(ColumnView oldValues, ColumnView newValues)
      Returns a vector with all values "oldValues[i]" replaced with "newValues[i]".
      Warning: Currently this function doesn't work for Strings or StringCategories.
      NaNs can't be replaced in the original vector, but regular values can be replaced with NaNs.
      Nulls can't be replaced in the original vector, but regular values can be replaced with nulls.
      Mixing of types isn't allowed; the resulting vector will be the same type as the original. For example, you can't replace an integer vector with values from a long vector.
      Usage:
      this = {1, 4, 5, 1, 5}
      oldValues = {1, 5, 7}
      newValues = {2, 6, 9}
      result = this.findAndReplaceAll(oldValues, newValues);
      result = {2, 4, 6, 2, 6} (1 and 5 are replaced with 2 and 6; 7 wasn't found, so it has no effect)
      Parameters:
      oldValues - a vector containing values that should be replaced
      newValues - a vector containing the new values
      Returns:
      - A new vector containing the old values replaced with new values
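      The replacement rule can be modeled on the host, assuming an Integer[] column in which null entries are null rows. FindAndReplaceModel is a hypothetical name, not part of cudf; when oldValues contains duplicates, this sketch lets the last pairing win, which is an assumption rather than documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical host-side model of findAndReplaceAll on an Integer[] column.
final class FindAndReplaceModel {
    private FindAndReplaceModel() {}

    static Integer[] findAndReplaceAll(Integer[] column, Integer[] oldValues, Integer[] newValues) {
        if (oldValues.length != newValues.length) {
            throw new IllegalArgumentException("oldValues and newValues must have the same length");
        }
        // Pair oldValues[i] with newValues[i]; a later duplicate key overwrites an earlier one.
        Map<Integer, Integer> replacements = new HashMap<>();
        for (int i = 0; i < oldValues.length; i++) {
            replacements.put(oldValues[i], newValues[i]);
        }
        Integer[] out = new Integer[column.length];
        for (int i = 0; i < column.length; i++) {
            Integer v = column[i];
            // Null rows are never matched, so they pass through unchanged;
            // a null in newValues turns a matched regular value into a null.
            out[i] = (v != null && replacements.containsKey(v)) ? replacements.get(v) : v;
        }
        return out;
    }
}
```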
    • replaceNulls

      public final ColumnVector replaceNulls(Scalar scalar)
      Returns a ColumnVector with any null values replaced with a scalar. The types of the input ColumnVector and Scalar must match, else an error is thrown.
      Parameters:
      scalar - the Scalar value to use as the replacement
      Returns:
      - ColumnVector with nulls replaced by scalar
    • replaceNulls

      public final ColumnVector replaceNulls(ColumnView replacements)
      Returns a ColumnVector with any null values replaced with the corresponding row in the specified replacement column. This column and the replacement column must have the same type and number of rows.
      Parameters:
      replacements - column of replacement values
      Returns:
      column with nulls replaced by corresponding row of replacements column
    • replaceNulls

      public final ColumnVector replaceNulls(ReplacePolicy policy)
    • ifElse

      public final ColumnVector ifElse(ColumnView trueValues, ColumnView falseValues)
      For a BOOL8 vector, computes a vector whose rows are selected from two other vectors based on the boolean value of this vector in the corresponding row. If the boolean value in a row is true, the corresponding row is selected from trueValues otherwise the corresponding row from falseValues is selected. Note that trueValues and falseValues vectors must be the same length as this vector, and trueValues and falseValues must have the same data type.
      Parameters:
      trueValues - the values to select if a row in this column is true
      falseValues - the values to select if a row in this column is not true
      Returns:
      the computed vector
    • ifElse

      public final ColumnVector ifElse(ColumnView trueValues, Scalar falseValue)
      For a BOOL8 vector, computes a vector whose rows are selected from two other inputs based on the boolean value of this vector in the corresponding row. If the boolean value in a row is true, the corresponding row is selected from trueValues; otherwise the value from falseValue is selected. Note that trueValues must be the same length as this vector, and the trueValues vector and falseValue scalar must have the same data type.
      Parameters:
      trueValues - the values to select if a row in this column is true
      falseValue - the value to select if a row in this column is not true
      Returns:
      the computed vector
    • ifElse

      public final ColumnVector ifElse(Scalar trueValue, ColumnView falseValues)
      For a BOOL8 vector, computes a vector whose rows are selected from two other inputs based on the boolean value of this vector in the corresponding row. If the boolean value in a row is true, the value from trueValue is selected; otherwise the corresponding row from falseValues is selected. Note that falseValues must be the same length as this vector, and the trueValue scalar and falseValues vector must have the same data type.
      Parameters:
      trueValue - the value to select if a row in this column is true
      falseValues - the values to select if a row in this column is not true
      Returns:
      the computed vector
    • ifElse

      public final ColumnVector ifElse(Scalar trueValue, Scalar falseValue)
      For a BOOL8 vector, computes a vector whose rows are selected from two other inputs based on the boolean value of this vector in the corresponding row. If the boolean value in a row is true, the value from trueValue is selected otherwise the value from falseValue is selected. Note that the trueValue and falseValue scalars must have the same data type.
      Parameters:
      trueValue - the value to select if a row in this column is true
      falseValue - the value to select if a row in this column is not true
      Returns:
      the computed vector
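      All four ifElse overloads share one selection rule, sketched here on the host with java.util.List. The model assumes that a null predicate row counts as "not true" and therefore selects from the false side, consistent with the parameter descriptions above; the scalar overloads correspond to passing a list that repeats the scalar (for example, Collections.nCopies). IfElseModel is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical host-side model of ifElse: a null predicate entry models a null
// BOOL8 row and, not being true, selects the value from falseValues.
final class IfElseModel {
    private IfElseModel() {}

    static <T> List<T> ifElse(List<Boolean> predicate, List<T> trueValues, List<T> falseValues) {
        int rows = predicate.size();
        if (trueValues.size() != rows || falseValues.size() != rows) {
            throw new IllegalArgumentException("all inputs must have the same number of rows");
        }
        List<T> out = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            // Boolean.TRUE.equals(null) is false, so null rows take the false branch.
            out.add(Boolean.TRUE.equals(predicate.get(i)) ? trueValues.get(i) : falseValues.get(i));
        }
        return out;
    }
}
```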
    • slice

      public final ColumnVector[] slice(int... indices)
      Slices a column (including null values) into a set of columns according to a set of indices. The "slice" function divides part of the input column into multiple intervals of rows using the indices values and stores the intervals in the output columns. Values are taken from the indices array as consecutive pairs, so the array should have an even length; each pair is left-closed and right-open. Each pair (a, b) is required to satisfy:
      a and b are in the range [0, input column size]
      a <= b, where a is the first value of the pair
      Exceptional cases for the indices array are:
      When the values in the pair are equal, the function returns an empty column.
      When the values in the pair are strictly decreasing, the outcome is undefined.
      When any of the values in the pair are outside the range [0, input column size], the outcome is undefined.
      When the indices array is empty, an empty array of columns is returned.
      The caller owns the output ColumnVectors and is responsible for closing them.
      Parameters:
      indices - the [start, end) row index pairs to slice
      Returns:
      A new ColumnVector array with slices from the original ColumnVector
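      The pairing rule can be checked with a host-side model over an int[] column. SliceModel is a hypothetical name, not part of cudf; where the API leaves behavior undefined (decreasing or out-of-range pairs), this sketch throws instead.

```java
import java.util.Arrays;

// Hypothetical host-side model of slice(int... indices): indices are read as
// consecutive [start, end) pairs over an int[] column.
final class SliceModel {
    private SliceModel() {}

    static int[][] slice(int[] column, int... indices) {
        if (indices.length % 2 != 0) {
            throw new IllegalArgumentException("indices must contain start/end pairs");
        }
        int[][] out = new int[indices.length / 2][];
        for (int p = 0; p < out.length; p++) {
            int start = indices[2 * p];
            int end = indices[2 * p + 1];
            // The real API leaves these cases undefined; the model rejects them.
            if (start < 0 || start > end || end > column.length) {
                throw new IndexOutOfBoundsException("invalid pair [" + start + ", " + end + ")");
            }
            out[p] = Arrays.copyOfRange(column, start, end);
        }
        return out;
    }
}
```

      For the column {10, 12, 14, 16, 18, 20, 22, 24, 26, 28}, slice(column, 1, 3, 3, 3, 5, 10) yields {12, 14}, an empty column, and {20, 22, 24, 26, 28}.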
    • subVector

      public final ColumnVector subVector(int start)
      Return a subVector from start inclusive to the end of the vector.
      Parameters:
      start - the index to start at.
    • subVector

      public final ColumnVector subVector(int start, int end)
      Return a subVector.
      Parameters:
      start - the index to start at (inclusive).
      end - the index to end at (exclusive).
    • split

      public final ColumnVector[] split(int... indices)
      Splits a column (including null values) into a set of columns according to a set of indices. The caller owns the ColumnVectors and is responsible for closing them. The "split" function divides the input column into multiple intervals of rows using the split indices and stores the intervals in the output columns. The indices array ('splits') is required to be monotonically non-decreasing, with every value in the range [0, input column size]. Each interval is left-closed and right-open: the first interval runs from 0 to the first element of the indices array, each following interval runs from one element to the next, and the last interval runs from the last element of the indices array to the size of the input column.
      Exceptional cases for the indices array are:
      When the values in the pair are equal, the function returns an empty column.
      When the values in the pair are strictly decreasing, the outcome is undefined.
      When any of the values in the pair are outside the range [0, input column size], the outcome is undefined.
      When the indices array is empty, an empty array of columns is returned.
      The output columns may have different sizes. For a non-empty indices array, the number of output columns equals the number of indices plus one.
      Example:
      input:  {10, 12, 14, 16, 18, 20, 22, 24, 26, 28}
      splits: {2, 5, 9}
      output: {{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}}
      Note that this is very similar to the output from a PartitionedTable.
      Parameters:
      indices - the indexes to split with
      Returns:
      A new ColumnVector array with slices from the original ColumnVector
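      The boundary rule, including the implicit 0 at the start and the column size at the end, can be sketched on the host over an int[] column. SplitModel is a hypothetical name, not part of cudf; it returns an empty array for empty indices, following the exceptional case listed above, and throws where the API's behavior is undefined.

```java
import java.util.Arrays;

// Hypothetical host-side model of split(int... splits) over an int[] column.
final class SplitModel {
    private SplitModel() {}

    static int[][] split(int[] column, int... splits) {
        if (splits.length == 0) {
            return new int[0][];  // documented exceptional case: empty indices
        }
        int[][] out = new int[splits.length + 1][];
        int start = 0;  // implicit first boundary
        for (int i = 0; i <= splits.length; i++) {
            // Implicit last boundary: the size of the input column.
            int end = (i < splits.length) ? splits[i] : column.length;
            if (end < start || end > column.length) {
                throw new IndexOutOfBoundsException("invalid split boundary " + end);
            }
            out[i] = Arrays.copyOfRange(column, start, end);  // rows [start, end)
            start = end;
        }
        return out;
    }
}
```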
    • splitAsViews

      public ColumnView[] splitAsViews(int... indices)
      Splits a ColumnView (including null values) into a set of ColumnViews according to a set of indices. No data is moved or copied. IMPORTANT NOTE: Nothing is copied out of the vector, and the slices are only valid for the lifetime of the underlying ColumnVector. The "split" function divides the input column into multiple intervals of rows using the split indices and stores the intervals in the output columns. The indices array ('splits') is required to be monotonically non-decreasing, with every value in the range [0, input column size]. Each interval is left-closed and right-open: the first interval runs from 0 to the first element of the indices array, each following interval runs from one element to the next, and the last interval runs from the last element of the indices array to the size of the input column.
      Exceptional cases for the indices array are:
      When the values in the pair are equal, the function returns an empty column.
      When the values in the pair are strictly decreasing, the outcome is undefined.
      When any of the values in the pair are outside the range [0, input column size], the outcome is undefined.
      When the indices array is empty, an empty array of ColumnViews is returned.
      The output columns may have different sizes. For a non-empty indices array, the number of output columns equals the number of indices plus one.
      Example:
      input:  {10, 12, 14, 16, 18, 20, 22, 24, 26, 28}
      splits: {2, 5, 9}
      output: {{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}}
      Note that this is very similar to the output from a PartitionedTable.
      Parameters:
      indices - the indices to split with
      Returns:
      A new ColumnView array with slices from the original ColumnView
    • normalizeNANsAndZeros

      public final ColumnVector normalizeNANsAndZeros()
      Create a new vector of "normalized" values, where:
      1. All representations of NaN (and -NaN) are replaced with the normalized NaN value.
      2. All elements equivalent to 0.0 (including +0.0 and -0.0) are replaced with +0.0.
      3. All elements that are not equivalent to NaN or 0.0 remain unchanged.
      The documentation for Double.longBitsToDouble(long) describes how equivalent values of NaN/-NaN might have different bitwise representations. This method may be used to compare different bitwise values of 0.0 or NaN as logically equivalent. For instance, if these values appear in a groupby key column, without normalization 0.0 and -0.0 would be erroneously treated as distinct groups, as would each representation of NaN.
      Returns:
      A new ColumnVector with all elements equivalent to NaN/0.0 replaced with a normalized equivalent.
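      The three rules above can be modeled on the host for a double[] column. NormalizeModel is a hypothetical name; it uses Java's quiet NaN bit pattern 0x7ff8000000000000L as the normalized NaN, and this sketch does not assert which bit pattern libcudf itself produces.

```java
// Hypothetical host-side model of normalizeNANsAndZeros on a double[] column.
final class NormalizeModel {
    private NormalizeModel() {}

    // Canonical NaN chosen by this model (same value as Double.NaN).
    private static final double CANONICAL_NAN = Double.longBitsToDouble(0x7ff8000000000000L);

    static double[] normalize(double[] column) {
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            double v = column[i];
            if (Double.isNaN(v)) {
                out[i] = CANONICAL_NAN;  // rule 1: every NaN payload and sign collapses
            } else if (v == 0.0) {
                out[i] = 0.0;            // rule 2: -0.0 == 0.0 is true, so both become +0.0
            } else {
                out[i] = v;              // rule 3: everything else is unchanged
            }
        }
        return out;
    }
}
```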
    • mergeAndSetValidity

      public final ColumnVector mergeAndSetValidity(BinaryOp mergeOp, ColumnView... columns)
      Create a deep copy of the column while replacing the null mask. The resultant null mask is the bitwise merge of the null masks of the columns given as arguments. For nested types, the result is sanitized so that it does not contain any non-empty nulls.
      Parameters:
      mergeOp - binary operator (BITWISE_AND and BITWISE_OR only)
      columns - array of columns whose null masks are merged, must have identical number of rows.
      Returns:
      the new ColumnVector with merged null mask.
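      The mask merge can be modeled on the host with one long bitmask per column, where bit i set means row i is valid (the usual cudf and Arrow convention). ValidityMergeModel and its MergeOp enum are hypothetical stand-ins for the real BinaryOp values, and the sketch only covers columns of at most 64 rows.

```java
// Hypothetical host-side model of the null-mask merge in mergeAndSetValidity.
final class ValidityMergeModel {
    private ValidityMergeModel() {}

    // Stand-in for the two BinaryOp values the method accepts.
    enum MergeOp { BITWISE_AND, BITWISE_OR }

    // Merge the validity masks of all given columns (bit i set = row i valid).
    static long merge(MergeOp op, long... masks) {
        if (masks.length == 0) {
            throw new IllegalArgumentException("at least one column is required");
        }
        long result = masks[0];
        for (int i = 1; i < masks.length; i++) {
            result = (op == MergeOp.BITWISE_AND) ? (result & masks[i]) : (result | masks[i]);
        }
        return result;
    }
}
```

      With BITWISE_AND a row ends up null if it is null in any input; with BITWISE_OR it is null only if it is null in every input.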
    • extractDateTimeComponent

      public final ColumnVector extractDateTimeComponent(DateTimeComponent component)
      Extract a particular date time component from a timestamp.
      Parameters:
      component - what should be extracted
      Returns:
      a column with the extracted information in it.
    • year

      public final ColumnVector year()
      Get year from a timestamp.

      Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.

      Returns:
      - A new INT16 vector allocated on the GPU.
    • month

      public final ColumnVector month()
      Get month from a timestamp.

      Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.

      Returns:
      - A new INT16 vector allocated on the GPU.
    • day

      public final ColumnVector day()
      Get day from a timestamp.

      Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.

      Returns:
      - A new INT16 vector allocated on the GPU.
    • hour

      public final ColumnVector hour()
      Get hour from a timestamp with time resolution.

      Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.

      Returns:
      - A new INT16 vector allocated on the GPU.
    • minute

      public final ColumnVector minute()
      Get minute from a timestamp with time resolution.

      Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.

      Returns:
      - A new INT16 vector allocated on the GPU.
    • second

      public final ColumnVector second()
      Get second from a timestamp with time resolution.

      Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.

      Returns:
      A new INT16 vector allocated on the GPU.
    • weekDay

      public final ColumnVector weekDay()
      Get the day of the week from a timestamp.

      Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.

      Returns:
      A new INT16 vector allocated on the GPU. Monday=1, ..., Sunday=7
    • lastDayOfMonth

      public final ColumnVector lastDayOfMonth()
      Get the date that is the last day of the month for this timestamp.

      Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.

      Returns:
      A new TIMESTAMP_DAYS vector allocated on the GPU.
    • dayOfYear

      public final ColumnVector dayOfYear()
      Get the day of the year from a timestamp.

      Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.

      Returns:
      A new INT16 vector allocated on the GPU. The value is in the range [1, 365], or [1, 366] in a leap year.
    • quarterOfYear

      public final ColumnVector quarterOfYear()
      Get the quarter of the year from a timestamp.
      Returns:
      A new INT16 vector allocated on the GPU. It will be a value from {1, 2, 3, 4} corresponding to the quarter of the year.
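      For a single calendar date, java.time uses the same numbering these methods document (Monday=1 through Sunday=7, day of year starting at 1, quarters 1 to 4), so it can serve as a host-side reference when checking results. DateComponentModel is a hypothetical helper, not part of cudf, and it ignores time zones.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

// Hypothetical host-side reference for a few of the timestamp components above.
final class DateComponentModel {
    private DateComponentModel() {}

    // Monday=1, ..., Sunday=7, matching the weekDay() description.
    static int weekDay(LocalDate d) { return d.getDayOfWeek().getValue(); }

    // 1..365, or 1..366 in a leap year.
    static int dayOfYear(LocalDate d) { return d.getDayOfYear(); }

    // 1..4.
    static int quarterOfYear(LocalDate d) { return d.get(IsoFields.QUARTER_OF_YEAR); }

    // Last day of the same month, accounting for leap years.
    static LocalDate lastDayOfMonth(LocalDate d) { return d.withDayOfMonth(d.lengthOfMonth()); }
}
```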
    • addCalendricalMonths

      public final ColumnVector addCalendricalMonths(ColumnView months)
      Add the specified number of months to the timestamp.
      Parameters:
      months - must be an INT16 column indicating the number of months to add. Negative values subtract months.
      Returns:
      the updated timestamp
    • addCalendricalMonths

      public final ColumnVector addCalendricalMonths(Scalar months)
      Add the specified number of months to the timestamp.
      Parameters:
      months - must be an INT16 scalar indicating the number of months to add. Negative values subtract months.
      Returns:
      the updated timestamp
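      A host-side sketch of month addition for day-resolution values. It assumes, rather than asserts, that libcudf clamps to the last day of the target month when the original day does not exist there, which is what LocalDate.plusMonths does; a null timestamp or a null month count produces a null result. AddMonthsModel is a hypothetical name.

```java
import java.time.LocalDate;

// Hypothetical host-side model of addCalendricalMonths(ColumnView months).
// A null array entry models a null row.
final class AddMonthsModel {
    private AddMonthsModel() {}

    static LocalDate[] addCalendricalMonths(LocalDate[] timestamps, Short[] months) {
        if (timestamps.length != months.length) {
            throw new IllegalArgumentException("row counts differ");
        }
        LocalDate[] out = new LocalDate[timestamps.length];
        for (int i = 0; i < timestamps.length; i++) {
            if (timestamps[i] != null && months[i] != null) {
                // plusMonths clamps, e.g. Jan 31 + 1 month -> last day of February.
                out[i] = timestamps[i].plusMonths(months[i]);
            }
            // Otherwise out[i] stays null.
        }
        return out;
    }
}
```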
    • isLeapYear

      public final ColumnVector isLeapYear()
      Check to see if the year for this timestamp is a leap year or not.
      Returns:
      BOOL8 vector of results
    • daysInMonth

      public final ColumnVector daysInMonth()
      Extract the number of days in the month.
      Returns:
      INT16 column of the number of days in the corresponding month
    • dateTimeCeil

      public final ColumnVector dateTimeCeil(DateTimeRoundingFrequency freq)
      Round the timestamp up to the given frequency keeping the type the same.
      Parameters:
      freq - what part of the timestamp to round.
      Returns:
      a timestamp with the same type, but rounded up.
    • dateTimeFloor

      public final ColumnVector dateTimeFloor(DateTimeRoundingFrequency freq)
      Round the timestamp down to the given frequency keeping the type the same.
      Parameters:
      freq - what part of the timestamp to round.
      Returns:
      a timestamp with the same type, but rounded down.
    • dateTimeRound

      public final ColumnVector dateTimeRound(DateTimeRoundingFrequency freq)
      Round the timestamp (half up) to the given frequency keeping the type the same.
      Parameters:
      freq - what part of the timestamp to round.
      Returns:
      a timestamp with the same type, but rounded (half up).
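      The three rounding operations can be sketched on the host for timestamps stored as epoch milliseconds and a fixed-length frequency given in milliseconds (for example 3_600_000 for one hour). TimestampRoundingModel and its millisecond frequency parameter are hypothetical stand-ins for DateTimeRoundingFrequency; floor rounds toward negative infinity (so pre-1970 values round down too), and the round case follows the half-up description above.

```java
// Hypothetical host-side model of dateTimeFloor / dateTimeCeil / dateTimeRound
// for epoch-millisecond timestamps and a fixed frequency in milliseconds.
final class TimestampRoundingModel {
    private TimestampRoundingModel() {}

    // Largest multiple of freqMs that is <= ts.
    static long floor(long ts, long freqMs) {
        return Math.floorDiv(ts, freqMs) * freqMs;
    }

    // Smallest multiple of freqMs that is >= ts.
    static long ceil(long ts, long freqMs) {
        return -Math.floorDiv(-ts, freqMs) * freqMs;
    }

    // Nearest multiple of freqMs; an exact half rounds up (toward +infinity).
    static long roundHalfUp(long ts, long freqMs) {
        return Math.floorDiv(ts + freqMs / 2, freqMs) * freqMs;
    }
}
```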
    • round

      public ColumnVector round(int decimalPlaces, RoundMode mode)
      Rounds all the values in a column to the specified number of decimal places.
      Parameters:
      decimalPlaces - Number of decimal places to round to. If negative, this specifies the number of positions to the left of the decimal point.
      mode - Rounding method (either HALF_UP or HALF_EVEN)
      Returns:
      a new ColumnVector with rounded values.
    • round

      public ColumnVector round(RoundMode round)
      Rounds all the values in a column to 0 decimal places using the given rounding method.
      Parameters:
      round - Rounding method (either HALF_UP or HALF_EVEN)
      Returns:
      a new ColumnVector with rounded values.
    • round

      public ColumnVector round(int decimalPlaces)
      Rounds all the values in a column to the specified number of decimal places with HALF_UP (default) as Rounding method.
      Parameters:
      decimalPlaces - Number of decimal places to round to. If negative, this specifies the number of positions to the left of the decimal point.
      Returns:
      a new ColumnVector with rounded values.
    • round

      public ColumnVector round()
      Rounds all the values in a column with these default values: decimalPlaces = 0 and rounding method = RoundMode.HALF_UP.
      Returns:
      a new ColumnVector with rounded values.
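      The effect of the rounding method and of negative decimalPlaces can be illustrated on exact decimal values with java.math.BigDecimal. This is a host-side sketch: java.math.RoundingMode.HALF_UP and HALF_EVEN stand in for cudf's RoundMode, RoundModel is a hypothetical name, and results for binary floating-point columns may differ for values that have no exact binary representation.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical host-side model of round(decimalPlaces, mode) for one value.
final class RoundModel {
    private RoundModel() {}

    // decimalPlaces < 0 rounds to the left of the decimal point (e.g. -2 -> hundreds).
    static BigDecimal round(BigDecimal value, int decimalPlaces, RoundingMode mode) {
        return value.setScale(decimalPlaces, mode);
    }
}
```

      For example, 2.5 rounds to 3 with HALF_UP but to 2 with HALF_EVEN, and 1250 with decimalPlaces = -2 becomes 1300 or 1200 respectively.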
    • transform

      public final ColumnVector transform(String udf, boolean isPtx)
      Transform a vector using a custom function. Be careful: this is not simple to do. You need to be positive you know what type of data you are processing and how the data is laid out. This only works on fixed-length types.
      Parameters:
      udf - This function will be applied to every element in the vector
      isPtx - true if the function code is PTX, false if it is C/C++.
    • unaryOp

      public final ColumnVector unaryOp(UnaryOp op)
      Performs the given unary operation. The output is the same type as the input.
      Parameters:
      op - the operation to perform
      Returns:
      the result
    • sin

      public final ColumnVector sin()
      Calculates the sine of each element; the output is the same type as the input.
    • cos

      public final ColumnVector cos()
      Calculates the cosine of each element; the output is the same type as the input.
    • tan

      public final ColumnVector tan()
      Calculates the tangent of each element; the output is the same type as the input.
    • arcsin

      public final ColumnVector arcsin()
      Calculates the arcsine of each element; the output is the same type as the input.
    • arccos

      public final ColumnVector arccos()
      Calculates the arccosine of each element; the output is the same type as the input.
    • arctan

      public final ColumnVector arctan()
      Calculates the arctangent of each element; the output is the same type as the input.
    • sinh

      public final ColumnVector sinh()
      Calculates the hyperbolic sine of each element; the output is the same type as the input.
    • cosh

      public final ColumnVector cosh()
      Calculates the hyperbolic cosine of each element; the output is the same type as the input.
    • tanh

      public final ColumnVector tanh()
      Calculates the hyperbolic tangent of each element; the output is the same type as the input.
    • arcsinh

      public final ColumnVector arcsinh()
      Calculates the inverse hyperbolic sine of each element; the output is the same type as the input.
    • arccosh

      public final ColumnVector arccosh()
      Calculates the inverse hyperbolic cosine of each element; the output is the same type as the input.
    • arctanh

      public final ColumnVector arctanh()
      Calculates the inverse hyperbolic tangent of each element; the output is the same type as the input.
    • exp

      public final ColumnVector exp()
      Calculates e raised to the power of each element (exp); the output is the same type as the input.
    • log

      public final ColumnVector log()
      Calculates the natural logarithm of each element; the output is the same type as the input.
    • log2

      public final ColumnVector log2()
      Calculates the base-2 logarithm of each element; the output is the same type as the input.
    • log10

      public final ColumnVector log10()
      Calculates the base-10 logarithm of each element; the output is the same type as the input.
    • sqrt

      public final ColumnVector sqrt()
      Calculates the square root of each element; the output is the same type as the input.
    • cbrt

      public final ColumnVector cbrt()
      Calculates the cube root of each element; the output is the same type as the input.
    • ceil

      public final ColumnVector ceil()
      Calculates the ceiling of each element; the output is the same type as the input.
    • floor

      public final ColumnVector floor()
      Calculates the floor of each element; the output is the same type as the input.
    • abs

      public final ColumnVector abs()
      Calculates the absolute value of each element; the output is the same type as the input.
    • rint

      public final ColumnVector rint()
      Rounds each floating-point value to the closest integer value, returning the result in the same floating-point type as the input.
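      The entry above does not say how ties are broken. Assuming rint follows C's rint under the default rounding mode (ties go to the even neighbor), it behaves like Java's Math.rint, which differs from the HALF_UP default of round(). The CPU-only sketch below contrasts the two; RintDemo is a hypothetical helper and does not call cudf.

```java
/** CPU-only contrast of rint-style rounding with half-up rounding (hypothetical helper). */
final class RintDemo {
    private RintDemo() {}

    /** Math.rint rounds to the nearest integer and breaks ties toward the even value. */
    static double rint(double x) {
        return Math.rint(x);
    }

    /** Half-up rounding for comparison: ties move away from zero. */
    static double halfUp(double x) {
        return Math.signum(x) * Math.floor(Math.abs(x) + 0.5);
    }
}
```

      Under this assumption, rint maps 2.5 to 2.0 and 3.5 to 4.0, while half-up rounding maps 2.5 to 3.0.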
    • bitCount

      public final ColumnVector bitCount()
      Counts the number of set bits in each integer value.
    • bitInvert

      public final ColumnVector bitInvert()
      Inverts the bits; the output is the same type as the input. For the BOOL8 type this is equivalent to a logical NOT (UnaryOp.NOT), which does not matter in practice because Spark does not support bitwise inversion of booleans.
    • binaryOp

      public final ColumnVector binaryOp(BinaryOp op, BinaryOperable rhs, DType outType)
      Performs the given binary operation.
      Specified by:
      binaryOp in interface BinaryOperable
      Parameters:
      op - the operation to perform
      rhs - the rhs of the operation
      outType - the type of output you want.
      Returns:
      the result
    • sum

      public Scalar sum()
      Computes the sum of all values in the column, returning a scalar of the same type as this column.
    • sum

      public Scalar sum(DType outType)
      Computes the sum of all values in the column, returning a scalar of the specified type.
    • min

      public Scalar min()
      Returns the minimum of all values in the column, returning a scalar of the same type as this column.
    • min

      @Deprecated public Scalar min(DType outType)
      Deprecated.
      The min reduction no longer allows setting the output type internally. As a workaround, this API casts the input to the output type for you, but this may not work in all cases.
      Returns the minimum of all values in the column, returning a scalar of the specified type.
    • max

      public Scalar max()
      Returns the maximum of all values in the column, returning a scalar of the same type as this column.
    • max

      @Deprecated public Scalar max(DType outType)
      Deprecated.
      The max reduction no longer allows setting the output type internally. As a workaround, this API casts the input to the output type for you, but this may not work in all cases.
      Returns the maximum of all values in the column, returning a scalar of the specified type.
    • product

      public Scalar product()
      Returns the product of all values in the column, returning a scalar of the same type as this column.
    • product

      public Scalar product(DType outType)
      Returns the product of all values in the column, returning a scalar of the specified type.
    • sumOfSquares

      public Scalar sumOfSquares()
      Returns the sum of squares of all values in the column, returning a scalar of the same type as this column.
    • sumOfSquares

      public Scalar sumOfSquares(DType outType)
      Returns the sum of squares of all values in the column, returning a scalar of the specified type.
    • mean

      public Scalar mean()
      Returns the arithmetic mean of all values in the column, returning a FLOAT64 scalar, or a FLOAT32 scalar if the column type is FLOAT32. Null values are skipped.
    • mean

      public Scalar mean(DType outType)
      Returns the arithmetic mean of all values in the column, returning a scalar of the specified type. Null values are skipped.
      Parameters:
      outType - the output type to return. Note that only floating point types are currently supported.
    • variance

      public Scalar variance()
      Returns the variance of all values in the column, returning a FLOAT64 scalar, or a FLOAT32 scalar if the column type is FLOAT32. Null values are skipped.
    • variance

      public Scalar variance(DType outType)
      Returns the variance of all values in the column, returning a scalar of the specified type. Null values are skipped.
      Parameters:
      outType - the output type to return. Note that only floating point types are currently supported.
    • standardDeviation

      public Scalar standardDeviation()
      Returns the sample standard deviation of all values in the column, returning a FLOAT64 scalar, or a FLOAT32 scalar if the column type is FLOAT32. Nulls are not counted as elements of the column when calculating the standard deviation.
    • standardDeviation

      public Scalar standardDeviation(DType outType)
      Returns the sample standard deviation of all values in the column, returning a scalar of the specified type. Nulls are not counted as elements of the column when calculating the standard deviation.
      Parameters:
      outType - the output type to return. Note that only floating point types are currently supported.
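      The statistics above skip nulls entirely, so a null shrinks the element count rather than contributing a zero. The CPU-only sketch below computes a sample variance and standard deviation that way, assuming the usual n - 1 denominator for sample statistics; StdDevDemo is hypothetical, does not call cudf, and returning NaN for fewer than two valid values is this sketch's own choice.

```java
import java.util.Arrays;
import java.util.Objects;

/** CPU-only sample variance/stddev with nulls skipped (hypothetical helper). */
final class StdDevDemo {
    private StdDevDemo() {}

    /** Sample variance with an n - 1 denominator; null entries are not counted. */
    static double sampleVariance(Double[] values) {
        double[] valid = Arrays.stream(values)
            .filter(Objects::nonNull)
            .mapToDouble(Double::doubleValue)
            .toArray();
        int n = valid.length;
        if (n < 2) {
            return Double.NaN; // this sketch's choice: undefined for fewer than two valid values
        }
        double mean = Arrays.stream(valid).average().orElse(Double.NaN);
        double sumSq = 0.0;
        for (double v : valid) {
            sumSq += (v - mean) * (v - mean);
        }
        return sumSq / (n - 1);
    }

    /** Sample standard deviation: square root of the sample variance. */
    static double sampleStdDev(Double[] values) {
        return Math.sqrt(sampleVariance(values));
    }
}
```

      For { 1.0, null, 3.0, 5.0 } only three values count: the mean is 3.0, the squared deviations sum to 8.0, the sample variance is 8.0 / 2 = 4.0, and the standard deviation is 2.0.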
    • any

      public Scalar any()
      Returns a boolean scalar that is true if any of the elements in the column are true or non-zero, and false otherwise. Null values are skipped.
    • any

      public Scalar any(DType outType)
      Returns a scalar that is true (or 1, depending on the specified type) if any of the elements in the column are true or non-zero, and false (or 0) otherwise. Null values are skipped.
    • all

      public Scalar all()
      Returns a boolean scalar that is true if all of the elements in the column are true or non-zero, and false otherwise. Null values are skipped.
    • all

      @Deprecated public Scalar all(DType outType)
      Deprecated.
      The only supported output type is BOOL8.
      Returns a scalar that is true (or 1, depending on the specified type) if all of the elements in the column are true or non-zero, and false (or 0) otherwise. Null values are skipped.
    • reduce

      public Scalar reduce(ReductionAggregation aggregation)
      Computes the reduction of the values in all rows of a column. Overflows in reductions are not detected. Specifying a higher-precision output type may prevent overflow. Only the MIN and MAX ops are supported for reduction of non-arithmetic types (TIMESTAMP...). Null values are skipped for the operation.
      Parameters:
      aggregation - The reduction aggregation to perform
      Returns:
      The scalar result of the reduction operation. If the column is empty or the reduction operation fails then the Scalar.isValid() method of the result will return false.
    • reduce

      public Scalar reduce(ReductionAggregation aggregation, DType outType)
      Computes the reduction of the values in all rows of a column. Overflows in reductions are not detected. Specifying a higher-precision output type may prevent overflow. Only the MIN and MAX ops are supported for reduction of non-arithmetic types (TIMESTAMP...). Null values are skipped for the operation.
      Parameters:
      aggregation - The reduction aggregation to perform
      outType - The type of scalar value to return. Not all output types are supported by all aggregation operations.
      Returns:
      The scalar result of the reduction operation. If the column is empty or the reduction operation fails then the Scalar.isValid() method of the result will return false.
    • segmentedReduce

      public ColumnVector segmentedReduce(ColumnView offsets, SegmentedReductionAggregation aggregation)
      Do a segmented reduce where the offsets column indicates which groups of rows in this column to combine. The output type is the same as the input type.
      Parameters:
      offsets - an INT32 column with no nulls.
      aggregation - the aggregation to do
      Returns:
      the result.
    • segmentedReduce

      public ColumnVector segmentedReduce(ColumnView offsets, SegmentedReductionAggregation aggregation, DType outType)
      Do a segmented reduce where the offsets column indicates which groups of rows in this column to combine.
      Parameters:
      offsets - an INT32 column with no nulls.
      aggregation - the aggregation to do
      outType - the output data type.
      Returns:
      the result.
    • segmentedReduce

      public ColumnVector segmentedReduce(ColumnView offsets, SegmentedReductionAggregation aggregation, NullPolicy nullPolicy, DType outType)
      Do a segmented reduce where the offsets column indicates which groups of rows in this column to combine.
      Parameters:
      offsets - an INT32 column with no nulls.
      aggregation - the aggregation to do
      nullPolicy - the null policy.
      outType - the output data type.
      Returns:
      the result.
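      The offsets column is easiest to read as segment boundaries: with k + 1 offsets there are k segments, and segment i covers rows [offsets[i], offsets[i + 1]). The CPU-only sketch below performs a segmented SUM under that reading; SegmentedReduceDemo is hypothetical and does not call cudf, and producing null for an empty segment is an assumption of the sketch.

```java
/** CPU-only segmented SUM over an offsets array (hypothetical helper, not the cudf API). */
final class SegmentedReduceDemo {
    private SegmentedReduceDemo() {}

    /**
     * offsets has one more entry than there are segments; segment i covers
     * rows [offsets[i], offsets[i + 1]) of values.
     */
    static Long[] segmentedSum(long[] values, int[] offsets) {
        Long[] out = new Long[offsets.length - 1];
        for (int seg = 0; seg < out.length; seg++) {
            int start = offsets[seg];
            int end = offsets[seg + 1];
            if (start == end) {
                out[seg] = null; // assumption: an empty segment yields a null result
                continue;
            }
            long sum = 0;
            for (int row = start; row < end; row++) {
                sum += values[row];
            }
            out[seg] = sum;
        }
        return out;
    }
}
```

      With values { 1, 2, 3, 4, 5, 6 } and offsets { 0, 2, 2, 6 }, the three segments are { 1, 2 }, { } and { 3, 4, 5, 6 }, giving { 3, null, 18 }.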
    • segmentedGather

      public ColumnVector segmentedGather(ColumnView gatherMap)
      Segmented gather of the elements within a list element in each row of a list column. For each list of size N, valid gather-map indices lie in the range [-N, N). Out-of-bounds indices produce null.
      Parameters:
      gatherMap - ListColumnView carrying lists of integral indices; each row's list selects, in order, which elements of the corresponding source list appear in the result list.
      Returns:
      the result.
    • segmentedGather

      public ColumnVector segmentedGather(ColumnView gatherMap, OutOfBoundsPolicy policy)
      Segmented gather of the elements within a list element in each row of a list column.
      Parameters:
      gatherMap - ListColumnView carrying lists of integral indices; each row's list selects, in order, which elements of the corresponding source list appear in the result list.
      policy - OutOfBoundsPolicy: `DONT_CHECK` leads to undefined behaviour for out-of-bounds indices; `NULLIFY` replaces out-of-bounds indices with null.
      Returns:
      the result.
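      The index rules above can be modeled per row on the CPU: for a list of size N, an index in [-N, N) is valid, a negative index counts back from the end of the list, and anything else becomes null as with the NULLIFY policy. The sketch below is a hypothetical helper (SegmentedGatherDemo) over plain Java lists, not the cudf API.

```java
import java.util.ArrayList;
import java.util.List;

/** CPU-only model of a segmented gather with NULLIFY semantics (hypothetical helper). */
final class SegmentedGatherDemo {
    private SegmentedGatherDemo() {}

    static List<List<Integer>> segmentedGather(List<List<Integer>> source,
                                               List<List<Integer>> gatherMap) {
        List<List<Integer>> result = new ArrayList<>();
        for (int row = 0; row < source.size(); row++) {
            List<Integer> list = source.get(row);
            int n = list.size();
            List<Integer> gathered = new ArrayList<>();
            for (int idx : gatherMap.get(row)) {
                if (idx >= -n && idx < n) {
                    // negative indices count back from the end of this row's list
                    gathered.add(list.get(idx < 0 ? idx + n : idx));
                } else {
                    gathered.add(null); // out of bounds: NULLIFY behavior
                }
            }
            result.add(gathered);
        }
        return result;
    }
}
```

      For source rows [[10, 20, 30], [40, 50]] and gather map [[2, -1, 0], [-3, 1]], the result is [[30, 30, 10], [null, 50]]: -3 is out of bounds for a list of size 2.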
    • listReduce

      public ColumnVector listReduce(SegmentedReductionAggregation aggregation)
      Do a reduction on the values in a list. The output type will be the type of the data column of this list.
      Parameters:
      aggregation - the aggregation to perform
    • listReduce

      public ColumnVector listReduce(SegmentedReductionAggregation aggregation, DType outType)
      Do a reduction on the values in a list.
      Parameters:
      aggregation - the aggregation to perform
      outType - the type of the output. Typically, this should match the child type of the list.
    • listReduce

      public ColumnVector listReduce(SegmentedReductionAggregation aggregation, NullPolicy nullPolicy, DType outType)
      Do a reduction on the values in a list.
      Parameters:
      aggregation - the aggregation to perform
      nullPolicy - whether nulls are included in or excluded from the aggregation.
      outType - the type of the output. Typically, this should match the child type of the list.
    • approxPercentile

      public final ColumnVector approxPercentile(double[] percentiles)
      Calculate various percentiles of this ColumnVector, which must contain centroids produced by a t-digest aggregation.
      Parameters:
      percentiles - Required percentiles [0,1]
      Returns:
      Column containing the approximate percentile values as a list of doubles, in the same order as the input percentiles
    • approxPercentile

      public final ColumnVector approxPercentile(ColumnVector percentiles)
      Calculate various percentiles of this ColumnVector, which must contain centroids produced by a t-digest aggregation.
      Parameters:
      percentiles - Column containing percentiles [0,1]
      Returns:
      Column containing the approximate percentile values as a list of doubles, in the same order as the input percentiles
    • quantile

      public final ColumnVector quantile(QuantileMethod method, double[] quantiles)
      Calculate various quantiles of this ColumnVector. It is assumed that this is already sorted in the desired order.
      Parameters:
      method - the method used to calculate the quantiles
      quantiles - the quantile values [0,1]
      Returns:
      Column containing the quantile values, in the same order as the input quantiles.
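      As one concrete reading of "the method used to calculate the quantiles", the CPU-only sketch below implements linear interpolation on already sorted, non-null data: the quantile q sits at position q * (n - 1) and is interpolated between its two neighbors. It assumes a LINEAR-style method; QuantileDemo is hypothetical and does not call cudf.

```java
/** CPU-only linear-interpolation quantile over sorted data (hypothetical helper). */
final class QuantileDemo {
    private QuantileDemo() {}

    /** q must lie in [0, 1]; sorted must be non-empty and in ascending order. */
    static double linearQuantile(double[] sorted, double q) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        double pos = q * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        double frac = pos - lo;
        // interpolate between the two neighboring sorted values
        return sorted[lo] + (sorted[hi] - sorted[lo]) * frac;
    }
}
```

      For { 1, 2, 3, 4 }, q = 0.5 lands at position 1.5 and yields 2.5, while q = 0.25 lands at 0.75 and yields 1.75.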
    • rollingWindow

      public final ColumnVector rollingWindow(RollingAggregation op, WindowOptions options)
      This function aggregates values in a window around each element i of the input column. Please refer to WindowOptions for the various options that can be passed. Note: only rows-based windows are supported.
      Parameters:
      op - the operation to perform.
      options - various window function arguments.
      Returns:
      Column containing aggregate function result.
      Throws:
      IllegalArgumentException - if an unsupported window specification (i.e., other than WindowOptions.FrameType.ROWS) is used.
    • prefixSum

      public final ColumnVector prefixSum()
      Compute the prefix sum (aka cumulative sum) of the values in this column. This is just a convenience method for an inclusive scan with a SUM aggregation.
    • scan

      public final ColumnVector scan(ScanAggregation aggregation, ScanType scanType, NullPolicy nullPolicy)
      Computes a scan for a column. This is very similar to a running window on the column.
      Parameters:
      aggregation - the aggregation to perform
      scanType - whether the scan is inclusive (includes the current row) or exclusive.
      nullPolicy - how nulls are treated. Note that some aggregations also have their own null policy. None of those aggregations are currently supported, so how the two policies would interact is undefined.
    • scan

      public final ColumnVector scan(ScanAggregation aggregation, ScanType scanType)
      Computes a scan for a column that excludes nulls.
      Parameters:
      aggregation - the aggregation to perform
      scanType - whether the scan is inclusive (includes the current row) or exclusive.
    • scan

      public final ColumnVector scan(ScanAggregation aggregation)
      Computes an inclusive scan for a column that excludes nulls.
      Parameters:
      aggregation - the aggregation to perform
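      prefixSum() is described above as an inclusive SUM scan; the difference between the two scan types is whether the current row contributes to its own output. The CPU-only sketch below shows both for SUM with no nulls; ScanDemo is a hypothetical helper, not the cudf API.

```java
/** CPU-only inclusive and exclusive SUM scans (hypothetical helper). */
final class ScanDemo {
    private ScanDemo() {}

    /** Inclusive scan: out[i] = values[0] + ... + values[i]. */
    static long[] inclusiveSum(long[] values) {
        long[] out = new long[values.length];
        long running = 0;
        for (int i = 0; i < values.length; i++) {
            running += values[i];
            out[i] = running;
        }
        return out;
    }

    /** Exclusive scan: out[i] = values[0] + ... + values[i - 1]; out[0] is the identity 0. */
    static long[] exclusiveSum(long[] values) {
        long[] out = new long[values.length];
        long running = 0;
        for (int i = 0; i < values.length; i++) {
            out[i] = running;
            running += values[i];
        }
        return out;
    }
}
```

      For { 1, 2, 3, 4 } the inclusive scan gives { 1, 3, 6, 10 } and the exclusive scan gives { 0, 1, 3, 6 }.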
    • not

      public final ColumnVector not()
      Returns a vector of the logical `not` of each value in the input column (this).
    • contains

      public boolean contains(Scalar needle)
      Finds whether `needle` is present in this column. Example: col = { 10, 20, 20, 30, 50 }, needle = 20, result = true.
      Parameters:
      needle - the scalar value to search for.
      Returns:
      true if needle is present else false
    • contains

      public final ColumnVector contains(ColumnView searchSpace)
      Returns a new column of DType.BOOL8 elements with the same size as this column. Each row is true if the corresponding entry in this column is contained in the given searchSpace column, and false otherwise. The caller is responsible for the lifecycle of the new vector. Example: col = { 10, 20, 30, 40, 50 }, searchSpace = { 20, 40, 60, 80 }, result = { false, true, false, true, false }
      Parameters:
      searchSpace - the column of values to search in.
      Returns:
      A new ColumnVector of type DType.BOOL8
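      The per-row membership test above amounts to building a set from searchSpace and probing it once per row. The CPU-only sketch below reproduces the documented example for non-null integers; ContainsDemo is hypothetical, does not call cudf, and ignores null handling.

```java
import java.util.HashSet;
import java.util.Set;

/** CPU-only per-row membership test (hypothetical helper, nulls not modeled). */
final class ContainsDemo {
    private ContainsDemo() {}

    static boolean[] contains(int[] col, int[] searchSpace) {
        Set<Integer> lookup = new HashSet<>();
        for (int v : searchSpace) {
            lookup.add(v);
        }
        boolean[] out = new boolean[col.length];
        for (int i = 0; i < col.length; i++) {
            out[i] = lookup.contains(col[i]); // true when this row's value occurs in searchSpace
        }
        return out;
    }
}
```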
    • toTitle

      public final ColumnVector toTitle()
      Returns a column of strings where, for each string row in the input, the first character after spaces is converted to upper case, while all the remaining characters in a word are converted to lower case. Null string entries produce corresponding null output entries.
    • capitalize

      public final ColumnVector capitalize(Scalar delimiters)
      Returns a column of capitalized strings. If `delimiters` is an empty string, only the first character of each row is capitalized. Otherwise, a non-delimiter character is capitalized whenever it follows a delimiter character. Example: input = ["tesT1", "a Test", "Another Test", "a\tb"]; with delimiters = "" the output is ["Test1", "A test", "Another test", "A\tb"]; with delimiters = " " the output is ["Test1", "A Test", "Another Test", "A\tb"]. Null string entries produce corresponding null output entries.
      Parameters:
      delimiters - used to identify the words to capitalize. Should not be null.
      Returns:
      a column of capitalized strings. Users should close the returned column.
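      The documented examples are consistent with a simple rule: upper-case the first character and any non-delimiter character that directly follows a delimiter, and lower-case everything else. The CPU-only sketch below encodes that inferred rule for ASCII input; CapitalizeDemo is hypothetical, does not call cudf, and its Unicode case handling may differ from the GPU implementation.

```java
/** CPU-only model of the capitalize rule inferred from the examples (hypothetical helper). */
final class CapitalizeDemo {
    private CapitalizeDemo() {}

    static String capitalize(String input, String delimiters) {
        StringBuilder out = new StringBuilder(input.length());
        boolean capitalizeNext = true; // the first character of each row is capitalized
        for (char c : input.toCharArray()) {
            if (delimiters.indexOf(c) >= 0) {
                out.append(c);         // delimiters are copied unchanged
                capitalizeNext = true; // and the next non-delimiter is capitalized
            } else if (capitalizeNext) {
                out.append(Character.toUpperCase(c));
                capitalizeNext = false;
            } else {
                out.append(Character.toLowerCase(c));
            }
        }
        return out.toString();
    }
}
```

      With an empty delimiters string no character ever counts as a delimiter, so only the first character is upper-cased, which is why "a Test" becomes "A test".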
    • joinStrings

      public final ColumnVector joinStrings(Scalar separator, Scalar narep)
      Concatenates all strings in the column into one new string delimited by an optional separator string. This returns a column with one string. Any null entries are ignored unless the narep parameter specifies a replacement string (not a null value).
      Parameters:
      separator - what to insert to separate each row.
      narep - what to replace nulls with
      Returns:
      a ColumnVector with a single string in it.
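      The null rule above (skip null rows unless narep supplies a replacement) can be modeled with java.util.StringJoiner. In the CPU-only sketch below, a null narep stands in for an invalid narep Scalar; JoinStringsDemo is hypothetical and does not call cudf, and the all-null case is not modeled.

```java
import java.util.StringJoiner;

/** CPU-only model of joinStrings null handling (hypothetical helper). */
final class JoinStringsDemo {
    private JoinStringsDemo() {}

    /** narep == null models an invalid narep scalar: null rows are then skipped. */
    static String joinStrings(String[] rows, String separator, String narep) {
        StringJoiner joiner = new StringJoiner(separator);
        for (String row : rows) {
            if (row != null) {
                joiner.add(row);
            } else if (narep != null) {
                joiner.add(narep); // replace a null row with narep
            }
        }
        return joiner.toString();
    }
}
```

      For rows { "a", null, "c" } and separator ",", the result is "a,c" without a replacement and "a,N/A,c" with narep = "N/A".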
    • castTo

      public ColumnVector castTo(DType type)
      Generic method to cast a ColumnVector. When casting from a Date, Timestamp, or Boolean to a numerical type, the underlying numerical representation of the data is used for the cast. For strings: casting strings from/to timestamps isn't currently supported; use asTimestamp(DType, String) and asStrings(String) to convert between strings and timestamps when the format is known. Float values converted to strings may differ from the default Java behavior, e.g. 12.3 => "12.30000019" instead of "12.3", Double.POSITIVE_INFINITY => "Inf" instead of "Infinity", and Double.NEGATIVE_INFINITY => "-Inf" instead of "-Infinity".
      Parameters:
      type - type of the resulting ColumnVector
      Returns:
      A new vector allocated on the GPU
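      The "12.30000019" output above comes from the fact that 12.3 is not exactly representable as a binary float; Java's Float.toString simply prints the shortest string that round-trips. The CPU-only sketch below exposes the exact stored value with BigDecimal and shows Java's own infinity spelling; FloatToStringDemo is a hypothetical helper and does not call cudf.

```java
import java.math.BigDecimal;

/** CPU-only look at the exact value behind a float literal (hypothetical helper). */
final class FloatToStringDemo {
    private FloatToStringDemo() {}

    /** Exact decimal expansion of the binary float nearest to f's literal. */
    static String exactValue(float f) {
        return new BigDecimal(f).toPlainString(); // float widens to double exactly
    }
}
```

      Here exactValue(12.3f) is "12.30000019073486328125", whereas Float.toString(12.3f) is "12.3" and Double.toString(Double.POSITIVE_INFINITY) is "Infinity".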
    • replaceChildrenWithViews

      public ColumnView replaceChildrenWithViews(int[] indices, ColumnView[] views)
      This method takes a nested type and replaces its children with the given views. Note: make sure the number of rows in the leaf node is the same as in the child replacing it; otherwise the list can point to elements outside of the column values. Note: this method returns a ColumnView that won't live past the ColumnVector it points to. Example:
        List list = col{{1,3}, {9,3,5}}
        validNewChild = col{8, 3, 9, 2, 0}
        list.replaceChildrenWithViews(1, validNewChild) => col{{8, 3}, {9, 2, 0}}
        invalidNewChild = col{3, 2}
        list.replaceChildrenWithViews(1, invalidNewChild) => col{{3, 2}, {invalid, invalid, invalid}}
        invalidNewChild = col{8, 3, 9, 2, 0, 0, 7}
        list.replaceChildrenWithViews(1, invalidNewChild) => col{{8, 3}, {9, 2, 0}} // undefined result
    • replaceListChild

      public ColumnView replaceListChild(ColumnView child)
      This method takes a list and returns a new list with the leaf node replaced by the given view. Make sure the number of rows in the leaf node is the same as in the child replacing it; otherwise the list can point to elements outside of the column values. Note: this method returns a ColumnView that won't live past the ColumnVector it points to. Example:
        List list = col{{1,3}, {9,3,5}}
        validNewChild = col{8, 3, 9, 2, 0}
        list.replaceListChild(validNewChild) => col{{8, 3}, {9, 2, 0}}
        invalidNewChild = col{3, 2}
        list.replaceListChild(invalidNewChild) => col{{3, 2}, {invalid, invalid, invalid}} throws an exception
        invalidNewChild = col{8, 3, 9, 2, 0, 0, 7}
        list.replaceListChild(invalidNewChild) => col{{8, 3}, {9, 2, 0}} throws an exception
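      The row-count requirement follows from how a list column is laid out: an offsets array says where each row starts and ends inside one flat child, so the child must have exactly as many rows as the last offset. The CPU-only sketch below rebuilds list rows from offsets and a replacement child and rejects a mismatched child; ListChildDemo is hypothetical and only models the layout, it does not call cudf.

```java
import java.util.ArrayList;
import java.util.List;

/** CPU-only model of list rows built from offsets and a child (hypothetical helper). */
final class ListChildDemo {
    private ListChildDemo() {}

    /** Row i spans child[offsets[i]] .. child[offsets[i + 1] - 1]. */
    static List<List<Integer>> materialize(int[] offsets, int[] child) {
        int last = offsets[offsets.length - 1];
        if (child.length != last) {
            throw new IllegalArgumentException(
                "child has " + child.length + " rows but offsets expect " + last);
        }
        List<List<Integer>> rows = new ArrayList<>();
        for (int i = 0; i + 1 < offsets.length; i++) {
            List<Integer> row = new ArrayList<>();
            for (int j = offsets[i]; j < offsets[i + 1]; j++) {
                row.add(child[j]);
            }
            rows.add(row);
        }
        return rows;
    }
}
```

      The list {{1,3}, {9,3,5}} has offsets { 0, 2, 5 }, so the child { 8, 3, 9, 2, 0 } yields {{8, 3}, {9, 2, 0}}, while a two-row child is rejected.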
    • logicalCastTo

      @Deprecated public ColumnView logicalCastTo(DType type)
      Deprecated.
      This has been renamed to bit_cast in C++; use bitCastTo(DType) instead.
      Zero-copy cast between types with the same underlying representation. Similar to reinterpret_cast or bit_cast in C++. This takes the underlying data and updates the metadata to reflect a new type. Not all types are supported; the widths of the types must match.
      Parameters:
      type - the type you want to go to.
      Returns:
      a ColumnView that cannot outlive the Column that owns the actual data it points to.
    • bitCastTo

      public ColumnView bitCastTo(DType type)
      Zero-copy cast between types with the same underlying length. Similar to bit_cast in C++. This takes the underlying data and creates new metadata so it is interpreted as a new type. Not all types are supported; the widths of the types must match.
      Parameters:
      type - the type you want to go to.
      Returns:
      a ColumnView that cannot outlive the Column that owns the actual data it points to.
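      A bit cast keeps every bit and changes only how the bits are interpreted, which is why the widths must match. Java's Float.floatToRawIntBits and Float.intBitsToFloat perform the same kind of reinterpretation between 32-bit types on the CPU; BitCastDemo below is a hypothetical helper for illustration, not the cudf API.

```java
/** CPU-only bit reinterpretation between same-width types (hypothetical helper). */
final class BitCastDemo {
    private BitCastDemo() {}

    /** Reinterprets a 32-bit float's bits as a 32-bit int, no numeric conversion. */
    static int floatBitsAsInt(float f) {
        return Float.floatToRawIntBits(f);
    }

    /** Reinterprets a 32-bit int's bits as a 32-bit float. */
    static float intBitsAsFloat(int bits) {
        return Float.intBitsToFloat(bits);
    }
}
```

      For example, 1.0f reinterpreted as an int is 0x3F800000 (1065353216), not 1, which is what distinguishes a bit cast from castTo.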
    • asBytes

      public final ColumnVector asBytes()
      Cast to Byte. This method takes the values in the ColumnVector and casts them to byte. When casting from a Date, Timestamp, or Boolean to a byte type, the underlying numerical representation of the data is used for the cast.
      Returns:
      A new vector allocated on the GPU
    • asByteList

      public final ColumnVector asByteList()
      Cast to a list of bytes. This method converts each row of the ColumnVector to a list of bytes with the endianness reversed. Numeric and string types are supported, but timestamps are not.
      Returns:
      A new vector allocated on the GPU
    • asByteList

      public final ColumnVector asByteList(boolean config)
      Cast to a list of bytes. This method converts each row of the ColumnVector to a list of bytes. Numeric and string types are supported, but timestamps are not.
      Parameters:
      config - Flips the byte order (endianness) if true, retains byte order otherwise
      Returns:
      A new vector allocated on the GPU
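      The config flag only changes the order in which a value's bytes are emitted. Assuming the device stores integers little-endian (as NVIDIA GPUs do), "reversed" endianness corresponds to big-endian order. The CPU-only sketch below shows both orders for an INT32 value with java.nio.ByteBuffer; ByteListDemo is a hypothetical helper and does not call cudf.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** CPU-only view of an int's bytes in a chosen byte order (hypothetical helper). */
final class ByteListDemo {
    private ByteListDemo() {}

    static byte[] intToBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }
}
```

      For 0x01020304, big-endian order gives { 1, 2, 3, 4 } and little-endian order gives { 4, 3, 2, 1 }.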
    • asUnsignedBytes

      public final ColumnVector asUnsignedBytes()
      Cast to unsigned Byte. This method takes the values in the ColumnVector and casts them to byte. When casting from a Date, Timestamp, or Boolean to a byte type, the underlying numerical representation of the data is used for the cast.

      Java does not have an unsigned byte type, so properly decoding these values will require extra steps on the part of the application. See Byte.toUnsignedInt(byte).

      Returns:
      A new vector allocated on the GPU
    • asShorts

      public final ColumnVector asShorts()
      Cast to Short. This method takes the values in the ColumnVector and casts them to short. When casting from a Date, Timestamp, or Boolean to a short type, the underlying numerical representation of the data is used for the cast.
      Returns:
      A new vector allocated on the GPU
    • asUnsignedShorts

      public final ColumnVector asUnsignedShorts()
      Cast to unsigned Short. This method takes the values in the ColumnVector and casts them to short. When casting from a Date, Timestamp, or Boolean to a short type, the underlying numerical representation of the data is used for the cast.

      Java does not have an unsigned short type, so properly decoding these values will require extra steps on the part of the application. See Short.toUnsignedInt(short).

      Returns:
      A new vector allocated on the GPU
    • asInts

      public final ColumnVector asInts()
      Cast to Int. This method takes the values in the ColumnVector and casts them to int. When casting from a Date, Timestamp, or Boolean to an int type, the underlying numerical representation of the data is used for the cast.
      Returns:
      A new vector allocated on the GPU
    • asUnsignedInts

      public final ColumnVector asUnsignedInts()
      Cast to unsigned Int. This method takes the values in the ColumnVector and casts them to int. When casting from a Date, Timestamp, or Boolean to an int type, the underlying numerical representation of the data is used for the cast.

      Java does not have an unsigned int type, so properly decoding these values will require extra steps on the part of the application. See Integer.toUnsignedLong(int).

      Returns:
      A new vector allocated on the GPU
    • asLongs

      public final ColumnVector asLongs()
      Cast to Long. This method takes the values in the ColumnVector and casts them to long. When casting from a Date, Timestamp, or Boolean to a long type, the underlying numerical representation of the data is used for the cast.
      Returns:
      A new vector allocated on the GPU
    • asUnsignedLongs

      public final ColumnVector asUnsignedLongs()
      Cast to unsigned Long. This method takes the values in the ColumnVector and casts them to long. When casting from a Date, Timestamp, or Boolean to a long type, the underlying numerical representation of the data is used for the cast.

      Java does not have an unsigned long type, so properly decoding these values will require extra steps on the part of the application. See Long.toUnsignedString(long).

      Returns:
      A new vector allocated on the GPU
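      Once unsigned values are copied back into Java's signed primitives, the JDK helpers named above recover the intended magnitudes. The CPU-only sketch below wraps them; UnsignedDecodeDemo is a hypothetical helper, and only the JDK methods it calls are real APIs.

```java
/** CPU-only decoding of unsigned values held in signed Java primitives (hypothetical helper). */
final class UnsignedDecodeDemo {
    private UnsignedDecodeDemo() {}

    static int decodeByte(byte b) {
        return Byte.toUnsignedInt(b); // 0..255
    }

    static int decodeShort(short s) {
        return Short.toUnsignedInt(s); // 0..65535
    }

    static long decodeInt(int i) {
        return Integer.toUnsignedLong(i); // 0..4294967295
    }

    static String decodeLong(long l) {
        return Long.toUnsignedString(l); // no wider primitive exists, so use a string
    }
}
```

      For example, an all-ones byte reads as -1 in Java but decodes to 255, and an all-ones long decodes to "18446744073709551615".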
    • asFloats

      public final ColumnVector asFloats()
      Cast to Float. This method takes the values in the ColumnVector and casts them to float. When casting from a Date, Timestamp, or Boolean to a float type, the underlying numerical representation of the data is used for the cast.
      Returns:
      A new vector allocated on the GPU
    • asDoubles

      public final ColumnVector asDoubles()
      Cast to Double. This method takes the values in the ColumnVector and casts them to double. When casting from a Date, Timestamp, or Boolean to a double type, the underlying numerical representation of the data is used for the cast.
      Returns:
      A new vector allocated on the GPU
    • asTimestampDays

      public final ColumnVector asTimestampDays()
      Cast to TIMESTAMP_DAYS. This method takes the values in the ColumnVector and casts them to TIMESTAMP_DAYS.
      Returns:
      A new vector allocated on the GPU
    • asTimestampDays

      public final ColumnVector asTimestampDays(String format)
      Cast to TIMESTAMP_DAYS. This method takes the string values in the ColumnVector and casts them to TIMESTAMP_DAYS.
      Parameters:
      format - timestamp string format specifier, ignored if the column type is not string
      Returns:
      A new vector allocated on the GPU
    • asTimestampSeconds

      public final ColumnVector asTimestampSeconds()
      Cast to TIMESTAMP_SECONDS. This method takes the values in the ColumnVector and casts them to TIMESTAMP_SECONDS.
      Returns:
      A new vector allocated on the GPU
    • asTimestampSeconds

      public final ColumnVector asTimestampSeconds(String format)
      Cast to TIMESTAMP_SECONDS. This method takes the string values in the ColumnVector and casts them to TIMESTAMP_SECONDS.
      Parameters:
      format - timestamp string format specifier, ignored if the column type is not string
      Returns:
      A new vector allocated on the GPU
    • asTimestampMicroseconds

      public final ColumnVector asTimestampMicroseconds()
      Cast to TIMESTAMP_MICROSECONDS. This method takes the values in the ColumnVector and casts them to TIMESTAMP_MICROSECONDS.
      Returns:
      A new vector allocated on the GPU
    • asTimestampMicroseconds

      public final ColumnVector asTimestampMicroseconds(String format)
      Cast to TIMESTAMP_MICROSECONDS. This method takes the string values in the ColumnVector and casts them to TIMESTAMP_MICROSECONDS.
      Parameters:
      format - timestamp string format specifier, ignored if the column type is not string
      Returns:
      A new vector allocated on the GPU
    • asTimestampMilliseconds

      public final ColumnVector asTimestampMilliseconds()
      Cast to TIMESTAMP_MILLISECONDS. This method takes the values in the ColumnVector and casts them to TIMESTAMP_MILLISECONDS.
      Returns:
      A new vector allocated on the GPU
    • asTimestampMilliseconds

      public final ColumnVector asTimestampMilliseconds(String format)
      Cast to TIMESTAMP_MILLISECONDS. This method takes the string values in the ColumnVector and casts them to TIMESTAMP_MILLISECONDS.
      Parameters:
      format - timestamp string format specifier, ignored if the column type is not string
      Returns:
      A new vector allocated on the GPU
    • asTimestampNanoseconds

      public final ColumnVector asTimestampNanoseconds()
      Cast to TIMESTAMP_NANOSECONDS. This method takes the values in the ColumnVector and casts them to TIMESTAMP_NANOSECONDS.
      Returns:
      A new vector allocated on the GPU
    • asTimestampNanoseconds

      public final ColumnVector asTimestampNanoseconds(String format)
      Cast to TIMESTAMP_NANOSECONDS. This method takes the string values in the ColumnVector and casts them to TIMESTAMP_NANOSECONDS.
      Parameters:
      format - timestamp string format specifier, ignored if the column type is not string
      Returns:
      A new vector allocated on the GPU
    • asTimestamp

      public final ColumnVector asTimestamp(DType timestampType, String format)
      Parse a string to a timestamp. Strings that fail to parse will default to 0, corresponding to 1970-01-01 00:00:00.000.
      Parameters:
      timestampType - timestamp DType that includes the time unit to parse the timestamp into.
      format - strptime format specifier string of the timestamp, used to parse and convert the timestamp. Supports the %Y, %y, %m, %d, %H, %I, %p, %M, %S, %f, and %z format specifiers. See https://github.com/rapidsai/custrings/blob/branch-0.10/docs/source/datetime.md for the full parsing format specification and documentation.
      Returns:
      A new ColumnVector containing the long representations of the timestamps in the original column vector.
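      To illustrate the rule that a failed parse defaults to 0, the following plain-Java sketch models asTimestamp on the host for one fixed format. It is not the cudf implementation. The class AsTimestampModel is hypothetical, and the sketch assumes that the java.time pattern "uuuu-MM-dd HH:mm:ss" stands in for the strptime format "%Y-%m-%d %H:%M:%S" and that timestamps are interpreted as UTC.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical host-side model; not part of the cudf API.
final class AsTimestampModel {
  // Assumption: this java.time pattern mirrors strptime "%Y-%m-%d %H:%M:%S".
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

  // Returns milliseconds since 1970-01-01 00:00:00.000 (UTC assumed) for each
  // input string, or 0 when the string cannot be parsed with the format.
  static long[] toEpochMillis(String[] rows) {
    long[] out = new long[rows.length];
    for (int i = 0; i < rows.length; i++) {
      try {
        out[i] = LocalDateTime.parse(rows[i], FORMAT)
            .toInstant(ZoneOffset.UTC)
            .toEpochMilli();
      } catch (DateTimeParseException e) {
        out[i] = 0L; // documented fallback: 1970-01-01 00:00:00.000
      }
    }
    return out;
  }
}
```

      For example, "1970-01-01 00:00:01" maps to 1000, while an unparseable string such as "not a timestamp" maps to 0.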
    • asStrings

      public final ColumnVector asStrings()
      Cast to strings. Negative timestamp values are not currently supported and will yield undesired results; see GitHub issue https://github.com/rapidsai/cudf/issues/3116 for details. Timestamp columns are formatted as follows:
      DType.TIMESTAMP_DAYS - "%Y-%m-%d"
      DType.TIMESTAMP_SECONDS - "%Y-%m-%d %H:%M:%S"
      DType.TIMESTAMP_MICROSECONDS - "%Y-%m-%d %H:%M:%S.%f"
      DType.TIMESTAMP_MILLISECONDS - "%Y-%m-%d %H:%M:%S.%f"
      DType.TIMESTAMP_NANOSECONDS - "%Y-%m-%d %H:%M:%S.%f"
      Returns:
      A new vector allocated on the GPU.
    • asStrings

      public final ColumnVector asStrings(String format)
      Convert a timestamp column to a strings column using the given format. A Unix timestamp is a long value representing the number of units since 1970-01-01 00:00:00.000, in either the positive or negative direction. No checking is done for invalid formats or invalid timestamp units. Negative timestamp values are not currently supported and will yield undesired results; see GitHub issue https://github.com/rapidsai/cudf/issues/3116 for details.
      Parameters:
      format - strftime format specifier string used to format each timestamp. Supports the %m, %j, %d, %H, %M, %S, %y, %Y, and %f format specifiers:
      %d Day of the month: 01-31
      %m Month of the year: 01-12
      %y Year without century: 00-99
      %Y Year with century: 0001-9999
      %H 24-hour of the day: 00-23
      %M Minute of the hour: 00-59
      %S Second of the minute: 00-59
      %f 6-digit microsecond: 000000-999999
      See https://github.com/rapidsai/custrings/blob/branch-0.10/docs/source/datetime.md for details. A bug has been reported at https://github.com/rapidsai/cudf/issues/4160; after it is fixed, this method should also support:
      %I 12-hour of the day: 01-12
      %p Only 'AM', 'PM'
      %j Day of the year
      Returns:
      A new vector allocated on the GPU
    • isTimestamp

      public final ColumnVector isTimestamp(String format)
      Verifies that a string column can be parsed to timestamps using the provided format pattern. The format pattern can include the following specifiers: "%Y,%y,%m,%d,%H,%I,%p,%M,%S,%f,%z"
      | Specifier | Description |
      | :-------: | ----------- |
      | %d | Day of the month: 01-31 |
      | %m | Month of the year: 01-12 |
      | %y | Year without century: 00-99 |
      | %Y | Year with century: 0001-9999 |
      | %H | 24-hour of the day: 00-23 |
      | %I | 12-hour of the day: 01-12 |
      | %M | Minute of the hour: 00-59 |
      | %S | Second of the minute: 00-59 |
      | %f | 6-digit microsecond: 000000-999999 |
      | %z | UTC offset with format ±HHMM, for example +0500 |
      | %j | Day of the year: 001-366 |
      | %p | Only 'AM', 'PM' or 'am', 'pm' are recognized |
      Other specifiers are not currently supported. The "%f" specifier accepts a precision value giving the number of digits to read. Specify the precision as a single integer value (1-9): use "%3f" for milliseconds, "%6f" for microseconds, and "%9f" for nanoseconds. Any null string entry results in a corresponding null row in the output column. This returns a boolean column in which a `true` row indicates that the corresponding input string can be parsed correctly with the given format.
      Parameters:
      format - String specifying the timestamp format in strings.
      Returns:
      New boolean ColumnVector.
    • extractListElement

      public final ColumnVector extractListElement(int index)
      For each list in this column, pull out the entry at the given index. If the index would go off the end of the list, NULL is returned instead.
      Parameters:
      index - 0 based offset into the list. Negative values go backwards from the end of the list.
      Returns:
      a new column of the values at those indexes.
    • extractListElement

      public final ColumnVector extractListElement(ColumnView indices)
      For each list in this column, pull out the entry at the corresponding index specified in the index column. If the index would go off the end of the list, NULL is returned instead. The index column should have the same row count as the list column.
      Parameters:
      indices - a column of 0 based offsets into the list. Negative values go backwards from the end of the list.
      Returns:
      a new column of the values at those indexes.
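      The index rules of both extractListElement overloads can be modeled on the host with plain Java collections. This is a sketch, not the cudf implementation. ExtractListElementModel is hypothetical, a java.util.List stands in for each list row, and the rule that a null list row produces null is an assumption that the description above does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical host-side model; not part of the cudf API.
final class ExtractListElementModel {
  // index >= 0 counts from the start of each list; index < 0 counts back from
  // the end (-1 is the last element). Positions outside the list yield null.
  // Assumption: a null list row also yields null.
  static <T> List<T> extract(List<List<T>> rows, int index) {
    List<T> out = new ArrayList<>(rows.size());
    for (List<T> list : rows) {
      if (list == null) {
        out.add(null);
        continue;
      }
      int pos = index >= 0 ? index : list.size() + index;
      out.add(pos >= 0 && pos < list.size() ? list.get(pos) : null);
    }
    return out;
  }
}
```

      The per-row overload (extractListElement(ColumnView)) follows the same rules, except that each row uses its own index.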
    • dropListDuplicates

      public final ColumnVector dropListDuplicates()
      Create a new LIST column by copying elements from the current LIST column while ignoring duplicates, producing a LIST column in which each list contains only unique elements. The relative order of the retained elements is preserved, but by default any one of the duplicates may be kept. Example: [0,3,4,0] may produce either [0,3,4] or [3,4,0]; both are valid results.
      Returns:
      A new LIST column having unique list elements.
    • dropListDuplicates

      public final ColumnVector dropListDuplicates(DuplicateKeepOption keepOption)
      Create a new LIST column by copying elements from the current LIST column while ignoring duplicates, producing a LIST column in which each list contains only unique elements. The order of the output elements within each list is preserved as in the input.
      Parameters:
      keepOption - flag specifying which of the duplicate elements to keep (first, last, any).
      Returns:
      A new LIST column having unique list elements.
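      The following host-side sketch shows the keep-first and keep-last behaviors for a single list row and reproduces the [0,3,4,0] example above. DropListDuplicatesModel and its Keep enum are hypothetical stand-ins, not the cudf DuplicateKeepOption type.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical host-side model; not part of the cudf API.
final class DropListDuplicatesModel {
  enum Keep { FIRST, LAST } // hypothetical stand-in for DuplicateKeepOption

  // Removes duplicates inside one list, preserving the input order of the
  // elements that are kept.
  static <T> List<T> dropDuplicates(List<T> list, Keep keep) {
    List<T> out = new ArrayList<>();
    Set<T> seen = new HashSet<>();
    if (keep == Keep.FIRST) {
      for (T v : list) {
        if (seen.add(v)) {
          out.add(v); // first occurrence wins
        }
      }
    } else {
      // Walk backwards so the last occurrence wins; inserting at the front
      // restores the original relative order.
      for (int i = list.size() - 1; i >= 0; i--) {
        if (seen.add(list.get(i))) {
          out.add(0, list.get(i));
        }
      }
    }
    return out;
  }
}
```

      With [0,3,4,0], keeping the first occurrence gives [0,3,4] and keeping the last gives [3,4,0]. These are the two results the "keep any" default may produce.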
    • dropListDuplicatesWithKeysValues

      public final ColumnVector dropListDuplicatesWithKeysValues()
      Given a LIST column in which each element is a struct containing a <key, value> pair, generate an output LIST column by copying elements of the current column such that, if a list contains multiple elements with the same key, only the last of those elements is copied.
      Returns:
      A new LIST column having list elements with unique keys.
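      The following host-side sketch shows keep-last-per-key deduplication for one list row, with java.util.Map.Entry standing in for the <key, value> struct. DropDuplicateKeysModel is hypothetical. The description above does not specify the output order, so keeping the surviving entries in their original relative order is an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical host-side model; not part of the cudf API.
final class DropDuplicateKeysModel {
  // Keeps only the last <key, value> entry for each key in one list.
  // Assumption: kept entries stay in their original relative order.
  static <K, V> List<Map.Entry<K, V>> keepLastPerKey(List<Map.Entry<K, V>> list) {
    List<Map.Entry<K, V>> out = new ArrayList<>();
    Set<K> seen = new HashSet<>();
    // Scanning from the end means the first time a key is seen is its last entry.
    for (int i = list.size() - 1; i >= 0; i--) {
      Map.Entry<K, V> e = list.get(i);
      if (seen.add(e.getKey())) {
        out.add(0, e);
      }
    }
    return out;
  }
}
```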
    • flattenLists

      public ColumnVector flattenLists()
      Flatten each list of lists into a single list. The column must have rows that are lists of lists. Any row containing null list elements will result in a null output row.
      Returns:
      A new column vector containing the flattened result
    • flattenLists

      public ColumnVector flattenLists(boolean ignoreNull)
      Flatten each list of lists into a single list. The column must have rows that are lists of lists.
      Parameters:
      ignoreNull - if true, null list elements in the input column are ignored by the operation; if false, any row containing a null list element results in a null output row.
      Returns:
      A new column vector containing the flattened result
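      The two null-handling modes of flattenLists can be modeled in plain Java as follows. FlattenListsModel is hypothetical. Nested java.util.List values stand in for the list-of-lists rows, and a null inner list stands for a null list element. The rule that a null input row stays null is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical host-side model; not part of the cudf API.
final class FlattenListsModel {
  static <T> List<List<T>> flatten(List<List<List<T>>> rows, boolean ignoreNull) {
    List<List<T>> out = new ArrayList<>(rows.size());
    for (List<List<T>> row : rows) {
      if (row == null) {
        out.add(null); // assumption: a null row stays null
        continue;
      }
      List<T> flat = new ArrayList<>();
      boolean nullOutput = false;
      for (List<T> inner : row) {
        if (inner == null) {
          if (ignoreNull) {
            continue; // drop the null list element
          }
          nullOutput = true; // the whole output row becomes null
          break;
        }
        flat.addAll(inner);
      }
      out.add(nullOutput ? null : flat);
    }
    return out;
  }
}
```

      For the rows [[1,2],[3]] and [[4],null], ignoreNull = false gives [1,2,3] and null, while ignoreNull = true gives [1,2,3] and [4].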
    • reverseStringsOrLists

      public final ColumnVector reverseStringsOrLists()
      Copy the current column to a new column in which each string or list has its characters or elements in reverse order.
      Returns:
      A new column with lists or strings having reverse order.
    • upper

      public final ColumnVector upper()
      Convert each string in the column to upper case.
    • lower

      public final ColumnVector lower()
      Convert each string in the column to lower case.
    • stringLocate

      public final ColumnVector stringLocate(Scalar substring)
      Locates the starting index of the first instance of the given string in each row of a column. Uses 0-based indexing and returns -1 if the substring is not found. This overload uses the default start index (0) and end index (-1, the end of each string).
      Parameters:
      substring - scalar containing the string to locate within each row.
    • stringLocate

      public final ColumnVector stringLocate(Scalar substring, int start)
      Locates the starting index of the first instance of the given string in each row of a column. Uses 0-based indexing and returns -1 if the substring is not found. This overload uses the default end index (-1, the end of each string).
      Parameters:
      substring - scalar containing the string to locate within each row.
      start - character index to start the search from (inclusive).
    • stringLocate

      public final ColumnVector stringLocate(Scalar substring, int start, int end)
      Locates the starting index of the first instance of the given string in each row of a column. Uses 0-based indexing and returns -1 if the substring is not found. The search can be configured to start or end partway through each string.
      Parameters:
      substring - scalar containing the string to locate within each row.
      start - character index to start the search from (inclusive).
      end - character index to end the search on (exclusive).
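      The following plain-Java sketch models the stringLocate contract for a single string. StringLocateModel is hypothetical and makes three assumptions: the input is ASCII (so Java char indices equal character indices), a match must lie entirely within [start, end), and an end past the string length is treated as the end of the string.

```java
// Hypothetical host-side model; not part of the cudf API.
final class StringLocateModel {
  // Returns the 0-based index of the first occurrence of `target` in `s` that
  // starts at or after `start` and ends at or before `end` (exclusive), or -1.
  // A negative end (the documented value is -1) means "to the end of the string".
  static int locate(String s, String target, int start, int end) {
    // Assumption: an end past the string length is clamped to the length.
    int stop = (end < 0 || end > s.length()) ? s.length() : end;
    if (start > stop) {
      return -1;
    }
    // Restricting the searched text to [0, stop) keeps matches inside [start, end).
    return s.substring(0, stop).indexOf(target, start);
  }
}
```

      For example, searching "hello world" for "o" returns 4 with the defaults, 7 when starting at 5, and -1 when the search ends at 4.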
    • stringSplit

      @Deprecated public final Table stringSplit(String pattern, int limit, boolean splitByRegex)
      Deprecated.
      Returns a list of columns by splitting each string using the specified pattern. The number of rows in the output columns will be the same as the input column. Null entries are added for a row where split results have been exhausted. Null input entries result in all nulls in the corresponding rows of the output columns.
      Parameters:
      pattern - UTF-8 encoded string identifying the split pattern for each input string.
      limit - the maximum size of the list resulting from splitting each input string, or -1 for all possible splits. Note that limit = 0 (all possible splits without trailing empty strings) and limit = 1 (no split at all) are not supported.
      splitByRegex - a boolean flag indicating whether the input strings will be split by a regular expression pattern or just by a string literal delimiter.
      Returns:
      list of strings columns as a table.
    • stringSplit

      public final Table stringSplit(RegexProgram regexProg, int limit)
      Returns a list of columns by splitting each string using the specified regex program pattern. The number of rows in the output columns will be the same as the input column. Null entries are added for the rows where split results have been exhausted. Null input entries result in all nulls in the corresponding rows of the output columns.
      Parameters:
      regexProg - the regex program with UTF-8 encoded string identifying the split pattern for each input string.
      limit - the maximum size of the list resulting from splitting each input string, or -1 for all possible splits. Note that limit = 0 (all possible splits without trailing empty strings) and limit = 1 (no split at all) are not supported.
      Returns:
      list of strings columns as a table.
    • stringSplit

      @Deprecated public final Table stringSplit(String pattern, boolean splitByRegex)
      Deprecated.
      Returns a list of columns by splitting each string using the specified pattern. The number of rows in the output columns will be the same as the input column. Null entries are added for a row where split results have been exhausted. Null input entries result in all nulls in the corresponding rows of the output columns.
      Parameters:
      pattern - UTF-8 encoded string identifying the split pattern for each input string.
      splitByRegex - a boolean flag indicating whether the input strings will be split by a regular expression pattern or just by a string literal delimiter.
      Returns:
      list of strings columns as a table.
    • stringSplit

      public final Table stringSplit(String delimiter, int limit)
      Returns a list of columns by splitting each string using the specified string literal delimiter. The number of rows in the output columns will be the same as the input column. Null entries are added for a row where split results have been exhausted. Null input entries result in all nulls in the corresponding rows of the output columns.
      Parameters:
      delimiter - UTF-8 encoded string identifying the split delimiter for each input string.
      limit - the maximum size of the list resulting from splitting each input string, or -1 for all possible splits. Note that limit = 0 (all possible splits without trailing empty strings) and limit = 1 (no split at all) are not supported.
      Returns:
      list of strings columns as a table.
    • stringSplit

      public final Table stringSplit(String delimiter)
      Returns a list of columns by splitting each string using the specified string literal delimiter. The number of rows in the output columns will be the same as the input column. Null entries are added for a row where split results have been exhausted. Null input entries result in all nulls in the corresponding rows of the output columns.
      Parameters:
      delimiter - UTF-8 encoded string identifying the split delimiter for each input string.
      Returns:
      list of strings columns as a table.
    • stringSplit

      public final Table stringSplit(RegexProgram regexProg)
      Returns a list of columns by splitting each string using the specified regex program pattern. The number of rows in the output columns will be the same as the input column. Null entries are added for the rows where split results have been exhausted. Null input entries result in all nulls in the corresponding rows of the output columns.
      Parameters:
      regexProg - the regex program with UTF-8 encoded string identifying the split pattern for each input string.
      Returns:
      list of strings columns as a table.
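      The following plain-Java sketch shows how the split pieces are laid out as table columns, for the literal-delimiter case with all possible splits. StringSplitModel is hypothetical. Each returned String[] stands for one output column, and keeping trailing empty pieces for limit -1 is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical host-side model; not part of the cudf API.
final class StringSplitModel {
  // Column j holds the j-th piece of every row. Rows with fewer pieces get
  // null in the remaining columns; a null input row is null in every column.
  static List<String[]> splitToColumns(String[] rows, String delimiter) {
    String[][] pieces = new String[rows.length][];
    int numColumns = 0;
    for (int i = 0; i < rows.length; i++) {
      if (rows[i] == null) {
        continue;
      }
      // Pattern.quote makes the delimiter literal; limit -1 keeps trailing empties.
      pieces[i] = rows[i].split(Pattern.quote(delimiter), -1);
      numColumns = Math.max(numColumns, pieces[i].length);
    }
    List<String[]> columns = new ArrayList<>();
    for (int j = 0; j < numColumns; j++) {
      String[] col = new String[rows.length];
      for (int i = 0; i < rows.length; i++) {
        col[i] = (pieces[i] != null && j < pieces[i].length) ? pieces[i][j] : null;
      }
      columns.add(col);
    }
    return columns;
  }
}
```

      For the rows "a,b,c", "d", and null split on ",", this produces three columns: ["a","d",null], ["b",null,null], and ["c",null,null]. The number of rows is unchanged, as described above.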
    • stringSplitRecord

      @Deprecated public final ColumnVector stringSplitRecord(String pattern, int limit, boolean splitByRegex)
      Deprecated.
      Returns a LIST column of strings, in which each list is made by splitting the corresponding input string using the specified pattern.
      Parameters:
      pattern - UTF-8 encoded string identifying the split pattern for each input string.
      limit - the maximum size of the list resulting from splitting each input string, or -1 for all possible splits. Note that limit = 0 (all possible splits without trailing empty strings) and limit = 1 (no split at all) are not supported.
      splitByRegex - a boolean flag indicating whether the input strings will be split by a regular expression pattern or just by a string literal delimiter.
      Returns:
      a LIST column of string elements.
    • stringSplitRecord

      public final ColumnVector stringSplitRecord(RegexProgram regexProg, int limit)
      Returns a LIST column of strings, in which each list is made by splitting the corresponding input string using the specified regex program pattern.
      Parameters:
      regexProg - the regex program with UTF-8 encoded string identifying the split pattern for each input string.
      limit - the maximum size of the list resulting from splitting each input string, or -1 for all possible splits. Note that limit = 0 (all possible splits without trailing empty strings) and limit = 1 (no split at all) are not supported.
      Returns:
      a LIST column of string elements.
    • stringSplitRecord

      @Deprecated public final ColumnVector stringSplitRecord(String pattern, boolean splitByRegex)
      Deprecated.
      Returns a LIST column of strings, in which each list is made by splitting the corresponding input string using the specified pattern.
      Parameters:
      pattern - UTF-8 encoded string identifying the split pattern for each input string.
      splitByRegex - a boolean flag indicating whether the input strings will be split by a regular expression pattern or just by a string literal delimiter.
      Returns:
      a LIST column of string elements.
    • stringSplitRecord

      public final ColumnVector stringSplitRecord(String delimiter, int limit)
      Returns a LIST column of strings, in which each list is made by splitting the corresponding input string using the specified string literal delimiter.
      Parameters:
      delimiter - UTF-8 encoded string identifying the split delimiter for each input string.
      limit - the maximum size of the list resulting from splitting each input string, or -1 for all possible splits. Note that limit = 0 (all possible splits without trailing empty strings) and limit = 1 (no split at all) are not supported.
      Returns:
      a LIST column of string elements.
    • stringSplitRecord

      public final ColumnVector stringSplitRecord(String delimiter)
      Returns a LIST column of strings, in which each list is made by splitting the corresponding input string using the specified string literal delimiter.
      Parameters:
      delimiter - UTF-8 encoded string identifying the split delimiter for each input string.
      Returns:
      a LIST column of string elements.
    • stringSplitRecord

      public final ColumnVector stringSplitRecord(RegexProgram regexProg)
      Returns a LIST column of strings, in which each list is made by splitting the corresponding input string using the specified regex program pattern.
      Parameters:
      regexProg - the regex program with UTF-8 encoded string identifying the split pattern for each input string.
      Returns:
      a LIST column of string elements.
    • substring

      public final ColumnVector substring(int start)
      Returns a new strings column that contains substrings of the strings in the provided column. The character positions to retrieve in each string are `[start, end of string)`.
      Parameters:
      start - first character index to begin the substring (inclusive).
    • substring

      public final ColumnVector substring(int start, int end)
      Returns a new strings column that contains substrings of the strings in the provided column. Uses 0-based indexing. If the stop position is past the end of a string, the end of that string is used as its stop position.
      Parameters:
      start - first character index to begin the substring (inclusive).
      end - last character index to stop the substring (exclusive).
      Returns:
      A new java column vector containing the substrings.
    • substring

      public final ColumnVector substring(ColumnView start, ColumnView end)
      Returns a new strings column that contains substrings of the strings in the provided column, using a separate range for each string.
      Parameters:
      start - Vector containing start indices of each string
      end - Vector containing the end index for each string. A value of -1 indicates reading until the end of the string.
      Returns:
      A new java column vector containing the substrings.
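      The following plain-Java sketch models the per-row range semantics of substring(ColumnView, ColumnView). SubstringModel is hypothetical and assumes ASCII input and non-negative start indices. It also makes three further assumptions: any negative end reads to the end of the string (the documented value is -1), an end past the string length is clamped as in the (int, int) overload, and an empty string is returned when start lies past the stop position.

```java
// Hypothetical host-side model; not part of the cudf API.
final class SubstringModel {
  // Per-row substring over [start[i], end[i]); null rows stay null.
  static String[] substrings(String[] rows, int[] start, int[] end) {
    String[] out = new String[rows.length];
    for (int i = 0; i < rows.length; i++) {
      String s = rows[i];
      if (s == null) {
        continue; // out[i] stays null
      }
      // A negative end (documented as -1) or an end past the string means "to the end".
      int stop = (end[i] < 0 || end[i] > s.length()) ? s.length() : end[i];
      // Assumption: a start past the stop position yields an empty string.
      int begin = Math.min(start[i], stop);
      out[i] = s.substring(begin, stop);
    }
    return out;
  }
}
```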
    • stringConcatenateListElements

      public final ColumnVector stringConcatenateListElements(ColumnView sepCol)
      Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result. Each new string is created by concatenating the strings from the same row (same list element) delimited by the separator provided. This version of the function replaces null strings with empty strings and returns null for an empty list.
      Parameters:
      sepCol - strings column that provides separators for concatenation.
      Returns:
      A new java column vector containing the concatenated strings with separator between.
    • stringConcatenateListElements

      public final ColumnVector stringConcatenateListElements(ColumnView sepCol, Scalar separatorNarep, Scalar stringNarep, boolean separateNulls, boolean emptyStringOutputIfEmptyList)
      Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result. Each new string is created by concatenating the strings from the same row (same list element) delimited by the row separator provided in the sepCol strings column.
      Parameters:
      sepCol - strings column that provides separators for concatenation.
      separatorNarep - string scalar indicating null behavior when a separator is null. If set to null and the separator is null the resulting string will be null. If not null, this string will be used in place of a null separator.
      stringNarep - string that should be used to replace null strings in any non-null list row. If set to null and the string is null the resulting string will be null. If not null, this string will be used in place of a null value.
      separateNulls - if true, then the separator is included for null rows if `stringNarep` is valid.
      emptyStringOutputIfEmptyList - if set to true, any input row that is an empty list will result in an empty string. Otherwise, it will result in a null.
      Returns:
      A new java column vector containing the concatenated strings with separator between.
    • stringConcatenateListElements

      public final ColumnVector stringConcatenateListElements(Scalar separator, Scalar narep, boolean separateNulls, boolean emptyStringOutputIfEmptyList)
      Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result. Each new string is created by concatenating the strings from the same row (same list element) delimited by the separator provided.
      Parameters:
      separator - string scalar inserted between each string being merged.
      narep - string scalar indicating null behavior. If set to null and any string in the row is null the resulting string will be null. If not null, null values in any column will be replaced by the specified string. The underlying value in the string scalar may be null, but the object passed in may not.
      separateNulls - if true, then the separator is included for null rows if `narep` is valid.
      emptyStringOutputIfEmptyList - if set to true, any input row that is an empty list will result in an empty string. Otherwise, it will result in a null.
      Returns:
      A new java column vector containing the concatenated strings with separator between.
    • repeatStrings

      public final ColumnVector repeatStrings(int repeatTimes)
      Given a strings column, repeat each string the number of times specified by the repeatTimes parameter. Special cases:
      - If repeatTimes is not a positive number, a non-null input string always results in an empty output string.
      - A null input string always results in a null output string, regardless of the value of repeatTimes.
      Parameters:
      repeatTimes - The number of times each input string is repeated.
      Returns:
      A new java column vector containing repeated strings.
    • repeatStrings

      public final ColumnVector repeatStrings(ColumnView repeatTimes)
      Given a strings column, generate an output strings column by repeating each input string the number of times given by the corresponding row of the repeatTimes numeric column. Special cases:
      - Any null row (from either the input strings column or the repeatTimes column) always results in a null output string.
      - If a value in the repeatTimes column is not a positive number and its corresponding input string is not null, the output string is an empty string.
      Parameters:
      repeatTimes - The column containing numbers of times each input string is repeated.
      Returns:
      A new java column vector containing repeated strings.
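      The special cases of both repeatStrings overloads can be modeled on the host as follows. RepeatStringsModel is hypothetical. A String[] stands in for the strings column, and an Integer[] stands in for the repeatTimes column, with null for a null row. String.repeat requires Java 11 or later.

```java
// Hypothetical host-side model; not part of the cudf API.
final class RepeatStringsModel {
  // Scalar repeat count: null input -> null; count <= 0 -> "" for non-null input.
  static String[] repeat(String[] rows, int repeatTimes) {
    String[] out = new String[rows.length];
    for (int i = 0; i < rows.length; i++) {
      if (rows[i] != null) {
        out[i] = repeatTimes > 0 ? rows[i].repeat(repeatTimes) : "";
      }
    }
    return out;
  }

  // Per-row repeat counts: a null in either input -> null output.
  static String[] repeat(String[] rows, Integer[] repeatTimes) {
    String[] out = new String[rows.length];
    for (int i = 0; i < rows.length; i++) {
      if (rows[i] != null && repeatTimes[i] != null) {
        out[i] = repeatTimes[i] > 0 ? rows[i].repeat(repeatTimes[i]) : "";
      }
    }
    return out;
  }
}
```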
    • getJSONObject

      public final ColumnVector getJSONObject(Scalar path, GetJsonObjectOptions options)
      Apply a JSONPath string to all rows in an input strings column, where each row is a valid JSON string. The output is returned by row as a strings column. Only the operators $ . [] * are implemented. For reference, see https://tools.ietf.org/id/draft-goessner-dispatch-jsonpath-00.html
      Parameters:
      path - The JSONPath string to be applied to each row
      options - The GetJsonObjectOptions to control get_json_object behaviour
      Returns:
      new strings ColumnVector containing the retrieved json object strings
    • getJSONObject

      public final ColumnVector getJSONObject(Scalar path)
      Apply a JSONPath string to all rows in an input strings column, where each row is a valid JSON string. The output is returned by row as a strings column. Only the operators $ . [] * are implemented. For reference, see https://tools.ietf.org/id/draft-goessner-dispatch-jsonpath-00.html
      Parameters:
      path - The JSONPath string to be applied to each row
      Returns:
      new strings ColumnVector containing the retrieved json object strings
    • stringReplace

      public final ColumnVector stringReplace(Scalar target, Scalar replace)
      Returns a new strings column in which the target string within each string is replaced with the specified replacement string. The replacement proceeds from the beginning of the string to the end; for example, replacing "aa" with "b" in the string "aaa" results in "ba" rather than "ab". Specifying an empty string for replace effectively removes the target string wherever it is found. Null string entries return null output string entries. The target Scalar should be a string scalar that is neither empty nor null.
      Parameters:
      target - String to search for within each string.
      replace - Replacement string if target is found.
      Returns:
      A new java column vector containing replaced strings
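      The left-to-right replacement rule matches the behavior of java.lang.String.replace(CharSequence, CharSequence), so it can be modeled on the host as shown below. StringReplaceModel is hypothetical. It only illustrates the documented semantics, including null propagation and the non-empty target requirement.

```java
// Hypothetical host-side model; not part of the cudf API.
final class StringReplaceModel {
  // Left-to-right, non-overlapping replacement of every occurrence of target.
  static String[] replace(String[] rows, String target, String repl) {
    if (target == null || target.isEmpty()) {
      throw new IllegalArgumentException("target must be a non-empty string");
    }
    String[] out = new String[rows.length];
    for (int i = 0; i < rows.length; i++) {
      // Null entries stay null; String.replace scans from the start of the string.
      out[i] = rows[i] == null ? null : rows[i].replace(target, repl);
    }
    return out;
  }
}
```

      Replacing "aa" with "b" turns "aaa" into "ba", and an empty replacement removes the target (for example, "-" removed from "a-b-c" gives "abc").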
    • stringReplace

      public final ColumnVector stringReplace(ColumnView targets, ColumnView repls)
      Returns a new strings column in which target strings within each string are replaced with the corresponding replacement strings. For each string in the column, the list of targets is searched within that string. If a target string is found, it is replaced by the corresponding entry in the repls column. All occurrences found in each string are replaced. The repls argument can optionally contain a single string; in that case, all matching target substrings are replaced by that single string. Example:
      cv = ["hello", "goodbye"]
      targets = ["e","o"]
      repls = ["EE","OO"]
      r1 = cv.stringReplace(targets, repls)
      r1 is now ["hEEllOO", "gOOOOdbyEE"]
      targets = ["e", "o"]
      repls = ["_"]
      r2 = cv.stringReplace(targets, repls)
      r2 is now ["h_ll_", "g__dby_"]
      Parameters:
      targets - Strings to search for in each string.
      repls - Corresponding replacement strings for target strings.
      Returns:
      A new java column vector containing the replaced strings.
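      The following host-side sketch reproduces the multi-target example above for one string at a time. MultiReplaceModel is hypothetical. The description above does not say which target wins when several match at the same position, so the tie-breaking rule used here is an assumption: at each position the targets are tried in list order, and the first match wins.

```java
// Hypothetical host-side model; not part of the cudf API.
final class MultiReplaceModel {
  // Scans left to right; on a match, emits the corresponding replacement and
  // skips past the matched target. A single-element repls is used for all targets.
  static String replace(String s, String[] targets, String[] repls) {
    if (s == null) {
      return null; // null entries stay null
    }
    StringBuilder sb = new StringBuilder();
    int pos = 0;
    while (pos < s.length()) {
      boolean matched = false;
      for (int t = 0; t < targets.length; t++) {
        // Empty targets are skipped so the scan always makes progress.
        if (!targets[t].isEmpty() && s.startsWith(targets[t], pos)) {
          sb.append(repls.length == 1 ? repls[0] : repls[t]);
          pos += targets[t].length();
          matched = true;
          break;
        }
      }
      if (!matched) {
        sb.append(s.charAt(pos));
        pos++;
      }
    }
    return sb.toString();
  }
}
```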
    • replaceRegex

      @Deprecated public final ColumnVector replaceRegex(String pattern, Scalar repl)
      Deprecated.
      For each string, replaces any character sequence matching the given pattern using the replacement string scalar.
      Parameters:
      pattern - The regular expression pattern to search within each string.
      repl - The string scalar to replace for each pattern match.
      Returns:
      A new column vector containing the string results.
    • replaceRegex

      public final ColumnVector replaceRegex(RegexProgram regexProg, Scalar repl)
      For each string, replaces any character sequence matching the given regex program pattern using the replacement string scalar.
      Parameters:
      regexProg - The regex program with pattern to search within each string.
      repl - The string scalar to replace for each pattern match.
      Returns:
      A new column vector containing the string results.
    • replaceRegex

      @Deprecated public final ColumnVector replaceRegex(String pattern, Scalar repl, int maxRepl)
      Deprecated.
      For each string, replaces any character sequence matching the given pattern using the replacement string scalar.
      Parameters:
      pattern - The regular expression pattern to search within each string.
      repl - The string scalar to replace for each pattern match.
      maxRepl - The maximum number of times a replacement should occur within each string.
      Returns:
      A new column vector containing the string results.
    • replaceRegex

      public final ColumnVector replaceRegex(RegexProgram regexProg, Scalar repl, int maxRepl)
      For each string, replaces any character sequence matching the given regex program pattern using the replacement string scalar.
      Parameters:
      regexProg - The regex program with pattern to search within each string.
      repl - The string scalar to replace for each pattern match.
      maxRepl - The maximum number of times a replacement should occur within each string.
      Returns:
      A new column vector containing the string results.
    • replaceMultiRegex

      public final ColumnVector replaceMultiRegex(String[] patterns, ColumnView repls)
      For each string, replaces any character sequence matching any of the regular expression patterns with the corresponding replacement strings.
      Parameters:
      patterns - The regular expression patterns to search within each string.
      repls - Strings column of replacement strings, one for each corresponding pattern.
      Returns:
      A new column vector containing the string results.
    • stringReplaceWithBackrefs

      @Deprecated public final ColumnVector stringReplaceWithBackrefs(String pattern, String replace)
      Deprecated.
      For each string, replaces any character sequence matching the given pattern using the replace template for back-references. Any null string entries return corresponding null output column entries.
      Parameters:
      pattern - The regular expression pattern to search within each string.
      replace - The replacement template for creating the output string.
      Returns:
      A new java column vector containing the string results.
    • stringReplaceWithBackrefs

      public final ColumnVector stringReplaceWithBackrefs(RegexProgram regexProg, String replace)
      For each string, replaces any character sequence matching the given regex program pattern using the replace template for back-references. Any null string entries return corresponding null output column entries.
      Parameters:
      regexProg - The regex program with pattern to search within each string.
      replace - The replacement template for creating the output string.
      Returns:
      A new java column vector containing the string results.
    • zfill

      public final ColumnVector zfill(int width)
      Add '0' as padding to the left of each string. If the string is already width or more characters, no padding is performed. No strings are truncated. Null string entries result in null entries in the output column.
      Parameters:
      width - The minimum number of characters for each string.
      Returns:
      New column of strings.
    • pad

      public final ColumnVector pad(int width)
      Pad each string in the column with spaces " " on the right until it reaches the desired length. If the string is already width or more characters, no padding is performed. No strings are truncated. Null string entries result in null entries in the output column.
      Parameters:
      width - the minimum number of characters for each string.
      Returns:
      the new strings column.
    • pad

      public final ColumnVector pad(int width, PadSide side)
      Pad each string in the column with spaces " " on the given side until it reaches the desired length. If the string is already width or more characters, no padding is performed. No strings are truncated. Null string entries result in null entries in the output column.
      Parameters:
      width - the minimum number of characters for each string.
      side - where to add new characters.
      Returns:
      the new strings column.
    • pad

      public final ColumnVector pad(int width, PadSide side, String fillChar)
      Pad each string in the column with fillChar on the given side until it reaches the desired length. If the string is already width or more characters, no padding is performed. No strings are truncated. Null string entries result in null entries in the output column.
      Parameters:
      width - the minimum number of characters for each string.
      side - where to add new characters.
      fillChar - a single-character string containing the character to pad with.
      Returns:
      the new strings column.
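The padding rule shared by the pad overloads above can be modeled on the CPU with plain Java strings. This is a minimal sketch for illustration only, not the cudf GPU implementation: `PadSketch` and `PadSideSketch` are hypothetical names, the width is counted in Unicode code points (an assumption about how "characters" are counted), and the `BOTH` side is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the pad side; BOTH is intentionally not modeled.
enum PadSideSketch { LEFT, RIGHT }

class PadSketch {
  // Pads one string to at least `width` code points using fillChar.
  // Null stays null; strings already >= width are returned unchanged (never truncated).
  static String pad(String s, int width, PadSideSketch side, char fillChar) {
    if (s == null) return null;
    int len = s.codePointCount(0, s.length());
    if (len >= width) return s;
    String fill = String.valueOf(fillChar).repeat(width - len);
    return side == PadSideSketch.LEFT ? fill + s : s + fill;
  }

  // Applies pad() row by row, mirroring a strings column with nullable entries.
  static List<String> padColumn(List<String> col, int width, PadSideSketch side, char fillChar) {
    List<String> out = new ArrayList<>(col.size());
    for (String s : col) out.add(pad(s, width, side, fillChar));
    return out;
  }
}
```

For example, padding `["ab", null, "abcdef"]` to width 4 on the right gives `["ab  ", null, "abcdef"]`: the null stays null and the long string is left untouched.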
    • startsWith

      public final ColumnVector startsWith(Scalar pattern)
      Checks if each string in a column starts with a specified comparison string, resulting in a parallel column of the boolean results.
      Parameters:
      pattern - scalar containing the string being searched for at the beginning of the column's strings.
      Returns:
      A new java column vector containing the boolean results.
    • endsWith

      public final ColumnVector endsWith(Scalar pattern)
      Checks if each string in a column ends with a specified comparison string, resulting in a parallel column of the boolean results.
      Parameters:
      pattern - scalar containing the string being searched for at the end of the column's strings.
      Returns:
      A new java column vector containing the boolean results.
    • strip

      public final ColumnVector strip()
      Removes whitespace from the beginning and end of each string.
      Returns:
      A new java column vector containing the stripped strings.
    • strip

      public final ColumnVector strip(Scalar toStrip)
      Removes the specified characters from the beginning and end of each string.
      Parameters:
      toStrip - UTF-8 encoded characters to strip from each string.
      Returns:
      A new java column vector containing the stripped strings.
    • lstrip

      public final ColumnVector lstrip()
      Removes whitespace from the beginning of each string.
      Returns:
      A new java column vector containing the stripped strings.
    • lstrip

      public final ColumnVector lstrip(Scalar toStrip)
      Removes the specified characters from the beginning of each string.
      Parameters:
      toStrip - UTF-8 encoded characters to strip from each string.
      Returns:
      A new java column vector containing the stripped strings.
    • rstrip

      public final ColumnVector rstrip()
      Removes whitespace from the end of each string.
      Returns:
      A new java column vector containing the stripped strings.
    • rstrip

      public final ColumnVector rstrip(Scalar toStrip)
      Removes the specified characters from the end of each string.
      Parameters:
      toStrip - UTF-8 encoded characters to strip from each string.
      Returns:
      A new java column vector containing the stripped strings.
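The `toStrip` argument of the strip overloads above reads most naturally as a set of characters rather than a substring: characters from the set are removed from the chosen end until one outside the set is reached, as with Python's `str.strip`. The hypothetical `StripSketch` below models that reading for a single string on the CPU; it is an assumption-based illustration, not the library code.

```java
class StripSketch {
  enum Side { LEFT, RIGHT, BOTH }

  // Removes any code point contained in `toStrip` from the chosen end(s) of s.
  // toStrip is treated as a set of characters, not as a substring. Null stays null.
  static String strip(String s, String toStrip, Side side) {
    if (s == null) return null;
    int start = 0;
    int end = s.length();
    if (side != Side.RIGHT) {
      // Advance past leading characters that belong to the set.
      while (start < end) {
        int cp = s.codePointAt(start);
        if (toStrip.indexOf(cp) < 0) break;
        start += Character.charCount(cp);
      }
    }
    if (side != Side.LEFT) {
      // Back up over trailing characters that belong to the set.
      while (end > start) {
        int cp = s.codePointBefore(end);
        if (toStrip.indexOf(cp) < 0) break;
        end -= Character.charCount(cp);
      }
    }
    return s.substring(start, end);
  }
}
```

With `toStrip = "xy"`, the string `"xxhixy"` becomes `"hi"` for `strip`, `"hixy"` for `lstrip` and `"xxhi"` for `rstrip`.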
    • stringContains

      public final ColumnVector stringContains(Scalar compString)
      Checks if each string in a column contains a specified comparison string, resulting in a parallel column of the boolean results.
      Parameters:
      compString - scalar containing the string being searched for.
      Returns:
      A new java column vector containing the boolean results.
    • stringContains

      public final ColumnVector[] stringContains(ColumnView targets)
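      Checks, for each string in targets, whether each string in this column contains it, producing one BOOL8 column per target.
      Parameters:
      targets - UTF-8 encoded strings to search for in each string of this column.
      Returns:
      An array of BOOL8 columns, one per target string, each with one row per row of this column.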
    • clamp

      public final ColumnVector clamp(Scalar lo, Scalar hi)
      Replaces values less than `lo` in this column with `lo`, and values greater than `hi` with `hi`. If `lo` is invalid (null), it is not considered while evaluating the input (it is essentially treated as the minimum value of that type). If `hi` is invalid (null), it is not considered while evaluating the input (it is essentially treated as the maximum value of that type).
      ```
      Example:
      input: {1, 2, 3, NULL, 5, 6, 7}
      valid lo and hi  (lo: 3,    hi: 5)     output: {3, 3, 3, NULL, 5, 5, 5}
      invalid lo       (lo: NULL, hi: 5)     output: {1, 2, 3, NULL, 5, 5, 5}
      invalid hi       (lo: 3,    hi: NULL)  output: {3, 3, 3, NULL, 5, 6, 7}
      ```
      Parameters:
      lo - Minimum clamp value. All elements less than `lo` will be replaced by `lo`. Ignored if null.
      hi - Maximum clamp value. All elements greater than `hi` will be replaced by `hi`. Ignored if null.
      Returns:
      Returns a new clamped column as per `lo` and `hi` boundaries
    • clamp

      public final ColumnVector clamp(Scalar lo, Scalar loReplace, Scalar hi, Scalar hiReplace)
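      Replaces values less than `lo` in this column with `loReplace`, and values greater than `hi` with `hiReplace`. If `lo` is invalid (null), it is not considered while evaluating the input (it is essentially treated as the minimum value of that type). If `hi` is invalid (null), it is not considered while evaluating the input (it is essentially treated as the maximum value of that type). For example, for input {1, 2, 3, NULL, 5, 6, 7} with lo: 3, loReplace: 0, hi: 5 and hiReplace: 16, the output is {0, 0, 3, NULL, 5, 16, 16}; with a NULL lo it is {1, 2, 3, NULL, 5, 16, 16}, and with a NULL hi it is {0, 0, 3, NULL, 5, 6, 7}.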
      Parameters:
      lo - Minimum clamp value. All elements less than `lo` will be replaced by `loReplace`. Ignored if null.
      loReplace - All elements less than `lo` will be replaced by `loReplace`.
      hi - Maximum clamp value. All elements greater than `hi` will be replaced by `hiReplace`. Ignored if null.
      hiReplace - All elements greater than `hi` will be replaced by `hiReplace`.
      Returns:
      A new clamped column as per the `lo` and `hi` boundaries.
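The clamp rules above, including how a null `lo` or `hi` disables that bound and how null rows pass through, can be checked against a small CPU model. `ClampSketch` is a hypothetical helper over boxed integers, not part of cudf. The two-bound `clamp(lo, hi)` corresponds to calling it with `loReplace = lo` and `hiReplace = hi`.

```java
import java.util.ArrayList;
import java.util.List;

class ClampSketch {
  // lo / hi may be null, meaning "no lower / upper bound". Null input rows stay null.
  static List<Integer> clamp(List<Integer> input, Integer lo, Integer loReplace,
                             Integer hi, Integer hiReplace) {
    List<Integer> out = new ArrayList<>(input.size());
    for (Integer v : input) {
      if (v == null) {
        out.add(null);               // null rows are preserved
      } else if (lo != null && v < lo) {
        out.add(loReplace);          // below the lower bound
      } else if (hi != null && v > hi) {
        out.add(hiReplace);          // above the upper bound
      } else {
        out.add(v);                  // inside [lo, hi], unchanged
      }
    }
    return out;
  }
}
```

Calling `clamp(input, 3, 0, 5, 16)` on `{1, 2, 3, null, 5, 6, 7}` reproduces the `{0, 0, 3, null, 5, 16, 16}` case from the documentation.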
    • matchesRe

      @Deprecated public final ColumnVector matchesRe(String pattern)
      Deprecated. Use matchesRe(RegexProgram) instead.
      Returns a boolean ColumnVector identifying rows which match the given regex pattern but only at the beginning of the string.
      ```
      cv = ["abc", "123", "def456"]
      r = cv.matchesRe("\\d+")
      r is now [false, true, false]
      ```
      Any null string entries return corresponding null output column entries. For supported regex patterns refer to:
    • matchesRe

      public final ColumnVector matchesRe(RegexProgram regexProg)
      Returns a boolean ColumnVector identifying rows which match the given regex program pattern but only at the beginning of the string.
      ```
      cv = ["abc", "123", "def456"]
      p = new RegexProgram("\\d+", CaptureGroups.NON_CAPTURE)
      r = cv.matchesRe(p)
      r is now [false, true, false]
      ```
      Any null string entries return corresponding null output column entries. For supported regex patterns refer to:
    • containsRe

      @Deprecated public final ColumnVector containsRe(String pattern)
      Deprecated. Use containsRe(RegexProgram) instead.
      Returns a boolean ColumnVector identifying rows which match the given regex pattern starting at any location.
      ```
      cv = ["abc", "123", "def456"]
      r = cv.containsRe("\\d+")
      r is now [false, true, true]
      ```
      Any null string entries return corresponding null output column entries. For supported regex patterns refer to:
    • containsRe

      public final ColumnVector containsRe(RegexProgram regexProg)
      Returns a boolean ColumnVector identifying rows which match the given RegexProgram pattern starting at any location.
      ```
      cv = ["abc", "123", "def456"]
      p = new RegexProgram("\\d+", CaptureGroups.NON_CAPTURE)
      r = cv.containsRe(p)
      r is now [false, true, true]
      ```
      Any null string entries return corresponding null output column entries. For supported regex patterns refer to:
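The difference between `matchesRe` (match anchored at the start of the string) and `containsRe` (match anywhere) corresponds to `java.util.regex.Matcher.lookingAt()` versus `Matcher.find()`. The sketch below uses the JDK regex engine purely to illustrate that distinction and the null pass-through; cudf supports its own regex subset, so patterns are not guaranteed to behave identically, and `RegexSemanticsSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RegexSemanticsSketch {
  // matchesRe-style: the pattern must match starting at the first character.
  static List<Boolean> matchesAtStart(List<String> col, String regex) {
    Pattern p = Pattern.compile(regex);
    List<Boolean> out = new ArrayList<>(col.size());
    for (String s : col) {
      out.add(s == null ? null : Boolean.valueOf(p.matcher(s).lookingAt()));
    }
    return out;
  }

  // containsRe-style: the pattern may match anywhere in the string.
  static List<Boolean> containsAnywhere(List<String> col, String regex) {
    Pattern p = Pattern.compile(regex);
    List<Boolean> out = new ArrayList<>(col.size());
    for (String s : col) {
      out.add(s == null ? null : Boolean.valueOf(p.matcher(s).find()));
    }
    return out;
  }
}
```

On `["abc", "123", "def456"]` with `\d+`, the first method yields `[false, true, false]` and the second `[false, true, true]`, matching the examples above.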
    • extractRe

      @Deprecated public final Table extractRe(String pattern) throws CudfException
      Deprecated. Use extractRe(RegexProgram) instead.
      Returns a table with one column for each capture group in the given regular expression. Null entries are added if the string does not match. Any null inputs also result in null output entries. For supported regex patterns refer to:
      Throws:
      CudfException
    • extractRe

      public final Table extractRe(RegexProgram regexProg) throws CudfException
      Returns a table with one column for each capture group in the given regex program. Null entries are added if the string does not match. Any null inputs also result in null output entries. For supported regex patterns refer to:
      Throws:
      CudfException
    • extractAllRecord

      @Deprecated public final ColumnVector extractAllRecord(String pattern, int idx)
      Deprecated. Use extractAllRecord(RegexProgram, int) instead.
      Extracts all strings that match the given regular expression and correspond to the regular expression group index. Any null inputs also result in null output entries. For supported regex patterns refer to:
    • extractAllRecord

      public final ColumnVector extractAllRecord(RegexProgram regexProg, int idx)
      Extracts all strings that match the given regex program pattern and correspond to the regular expression group index. Any null inputs also result in null output entries. For supported regex patterns refer to:
    • like

      public final ColumnVector like(Scalar pattern, Scalar escapeChar)
      Returns a boolean ColumnVector identifying rows which match the given like pattern. The like pattern recognizes only 2 wildcard special characters:
      - `%` any number of any character (including no characters)
      - `_` any single character
      ```
      cv = ["azaa", "ababaabba", "aaxa"]
      r = cv.like("%a_aa%", "\\")
      r is now [true, true, false]
      r = cv.like("a__a", "\\")
      r is now [true, false, true]
      ```
      The escape character allows a literal `%` or `_` to be matched. It is expected to be either 0 or 1 character long; if more than one character is specified, only the first character is used.
      ```
      cv = ["abc_def", "abc1def", "abc_"]
      r = cv.like("abc/_d%", "/")
      r is now [true, false, false]
      ```
      Any null string entries return corresponding null output column entries.
      Parameters:
      pattern - Like pattern to match to each string.
      escapeChar - Character that specifies the escape prefix; default is "\\".
      Returns:
      New ColumnVector of boolean results for each string.
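A dynamic-programming matcher reproduces the `%`, `_` and escape rules described above on the CPU. This is an illustrative sketch with a hypothetical `LikeSketch` class, not the cudf algorithm; it compares UTF-16 `char` values and does not model null rows.

```java
class LikeSketch {
  // Returns true if s matches the LIKE pattern: '%' matches any run of characters
  // (including none), '_' matches exactly one character, and `escape` (if non-null)
  // makes the following pattern character literal.
  static boolean like(String s, String pattern, Character escape) {
    // Tokenize: kinds holds 'L' (literal), '_' (one char) or '%' (any run).
    StringBuilder kinds = new StringBuilder();
    StringBuilder lits = new StringBuilder();
    for (int i = 0; i < pattern.length(); i++) {
      char c = pattern.charAt(i);
      if (escape != null && c == escape && i + 1 < pattern.length()) {
        kinds.append('L');
        lits.append(pattern.charAt(++i));   // escaped char is taken literally
      } else if (c == '%' || c == '_') {
        kinds.append(c);
        lits.append(c);
      } else {
        kinds.append('L');
        lits.append(c);
      }
    }
    int n = s.length();
    int m = kinds.length();
    // dp[i][j]: the first i chars of s match the first j tokens.
    boolean[][] dp = new boolean[n + 1][m + 1];
    dp[0][0] = true;
    for (int j = 1; j <= m; j++) dp[0][j] = dp[0][j - 1] && kinds.charAt(j - 1) == '%';
    for (int i = 1; i <= n; i++) {
      for (int j = 1; j <= m; j++) {
        char k = kinds.charAt(j - 1);
        if (k == '%') {
          dp[i][j] = dp[i][j - 1] || dp[i - 1][j];  // match empty, or absorb one more char
        } else if (k == '_') {
          dp[i][j] = dp[i - 1][j - 1];
        } else {
          dp[i][j] = dp[i - 1][j - 1] && s.charAt(i - 1) == lits.charAt(j - 1);
        }
      }
    }
    return dp[n][m];
  }
}
```

Running it over the three example columns above gives the documented results, for instance `like("abc_def", "abc/_d%", '/')` is true while `like("abc1def", "abc/_d%", '/')` is false.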
    • urlDecode

      public final ColumnVector urlDecode() throws CudfException
      Converts all character sequences starting with '%' into character code-points, interpreting the 2 following characters as hex values to create the code-point. For example, the sequence '%20' is converted into the byte 0x20, which is a single space character. Another example converts '%C3%A9' into 2 sequential bytes (0xc3 and 0xa9 respectively), which form the é character. Overall, 3 characters are converted into one byte whenever a '%' (single percent) character is encountered in the string.

      Any null entries will result in corresponding null entries in the output column.

      Returns:
      a new column instance containing the decoded strings
      Throws:
      CudfException
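The decoding rule above works on bytes, so a multi-byte character such as é is rebuilt from two consecutive `%XX` sequences. A minimal CPU sketch of that rule, using a hypothetical `UrlDecodeSketch` class rather than any cudf API, follows. It deliberately does not turn '+' into a space (unlike `java.net.URLDecoder`), and leaving a '%' that is not followed by two hex digits unchanged is an assumption about an edge case the documentation does not specify.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class UrlDecodeSketch {
  // Replaces each "%XX" (two hex digits) with the byte 0xXX, then reinterprets the bytes as UTF-8.
  static String decode(String s) {
    if (s == null) return null;                  // null in, null out
    byte[] in = s.getBytes(StandardCharsets.UTF_8);
    ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
    for (int i = 0; i < in.length; i++) {
      if (in[i] == '%' && i + 2 < in.length && isHex(in[i + 1]) && isHex(in[i + 2])) {
        out.write(Character.digit(in[i + 1], 16) * 16 + Character.digit(in[i + 2], 16));
        i += 2;                                  // skip the two hex digits just consumed
      } else {
        out.write(in[i]);                        // copy every other byte unchanged
      }
    }
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
  }

  private static boolean isHex(byte b) {
    return Character.digit(b, 16) >= 0;
  }
}
```

For instance, `decode("caf%C3%A9")` produces `"café"` because the bytes 0xC3 0xA9 are the UTF-8 encoding of é.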
    • urlEncode

      public final ColumnVector urlEncode() throws CudfException
      Converts most non-ASCII characters and control characters into UTF-8 hex code-points prefixed with '%'. For example, the space character is converted to the characters '%20', where '20' is the hex value for space in UTF-8. Likewise, multi-byte characters are converted to multiple hex sequences. For example, the é character is converted to the characters '%C3%A9', where 'C3' and 'A9' are the UTF-8 bytes 0xC3 and 0xA9 for this character.

      Any null entries will result in corresponding null entries in the output column.

      Returns:
      a new column instance containing the encoded strings
      Throws:
      CudfException
    • getMapValue

      public final ColumnVector getMapValue(ColumnView keys)
      Given a column of type List<Struct<X, Y>> and a key column of type X, return a column of type Y, where each row in the output column is the Y value corresponding to the X key. If the key is not found, the corresponding output value is null.
      Parameters:
      keys - the column view with keys to lookup in the column
      Returns:
      a column of values or nulls based on the lookup result
    • getMapValue

      public final ColumnVector getMapValue(Scalar key)
      Given a column of type List<Struct<X, Y>> and a key of type X, return a column of type Y, where each row in the output column is the Y value corresponding to the X key. If the key is not found, the corresponding output value is null.
      Parameters:
      key - the scalar key to lookup in the column
      Returns:
      a column of values or nulls based on the lookup result
    • getMapKeyExistence

      public final ColumnVector getMapKeyExistence(Scalar key)
      For a column of type List<Struct<String, String>> and a passed-in String key, return a boolean column with one row per map. Each row is true if the key exists in the corresponding map for that row, false otherwise. It never returns null for a row.
      Parameters:
      key - the String scalar to lookup in the column
      Returns:
      a boolean column based on the lookup result
    • getMapKeyExistence

      public final ColumnVector getMapKeyExistence(ColumnView keys)
      For a column of type List<Struct<_, _>> and a passed-in key column, return a boolean column with one row per map. Each output row is true if the key exists in the corresponding map for that row, false otherwise. It never returns null for a row.
      Parameters:
      keys - the keys to lookup in the column
      Returns:
      a boolean column based on the lookup result
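A map column of type List<Struct<K, V>> can be pictured as one list of key/value entries per row. The hypothetical `MapLookupSketch` below models `getMapValue(Scalar)` and `getMapKeyExistence(Scalar)` on that picture; it is a CPU illustration only. Returning the first matching entry when a key repeats within a row is an assumption, since duplicate keys are not covered above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class MapLookupSketch {
  // Each row of a List<Struct<K, V>> column is modeled as a List<Map.Entry<K, V>>; a null row is null.
  static <K, V> List<V> getMapValue(List<List<Map.Entry<K, V>>> col, K key) {
    List<V> out = new ArrayList<>(col.size());
    for (List<Map.Entry<K, V>> row : col) {
      V found = null;                            // stays null if the row is null or the key is absent
      if (row != null) {
        for (Map.Entry<K, V> e : row) {
          if (Objects.equals(e.getKey(), key)) { found = e.getValue(); break; }
        }
      }
      out.add(found);
    }
    return out;
  }

  // Never produces null: a null row or a missing key both yield false.
  static <K, V> List<Boolean> getMapKeyExistence(List<List<Map.Entry<K, V>>> col, K key) {
    List<Boolean> out = new ArrayList<>(col.size());
    for (List<Map.Entry<K, V>> row : col) {
      boolean exists = false;
      if (row != null) {
        for (Map.Entry<K, V> e : row) {
          if (Objects.equals(e.getKey(), key)) { exists = true; break; }
        }
      }
      out.add(exists);
    }
    return out;
  }
}
```

The contrast between the two methods shows up on a null row: the value lookup returns null for it, while the existence check returns false.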
    • makeStructView

      public static ColumnView makeStructView(long rows, ColumnView... columns)
      Create a new struct column view of existing column views. Note that this will NOT copy the contents of the input columns to make a new vector, but makes a view that must not outlive the child views that it references. The resulting column cannot be null.
      Parameters:
      rows - the number of rows in the struct column. This is needed if no columns are provided.
      columns - the columns to add to the struct in the order they should be added
      Returns:
      the new column view. It is the responsibility of the caller to close this.
    • makeStructView

      public static ColumnView makeStructView(ColumnView... columns)
      Create a new struct column view of existing column views. Note that this will NOT copy the contents of the input columns to make a new vector, but makes a view that must not outlive the child views that it references. The resulting column cannot be null.
      Parameters:
      columns - the columns to add to the struct in the order they should be added
      Returns:
      the new column view. It is the responsibility of the caller to close this.
    • fromDeviceBuffer

      public static ColumnView fromDeviceBuffer(BaseDeviceMemoryBuffer buffer, long startOffset, DType type, int rows)
      Create a new column view from a raw device buffer. Note that this will NOT copy the contents of the buffer but only creates a view. The view MUST NOT outlive the underlying device buffer. The column view will be created without a validity vector, so it is not possible to create a view containing null elements. Additionally, only fixed-width primitive types are supported.
      Parameters:
      buffer - device memory that will back the column view
      startOffset - byte offset into the device buffer where the column data starts
      type - type of data in the column view
      rows - number of data elements in the column view
      Returns:
      new column view instance that must not outlive the backing device buffer
    • listContains

      public final ColumnVector listContains(Scalar key)
      Create a column of bool values indicating whether the specified scalar is an element of each row of a list column. Output `column[i]` is set to null if one or more of the following are true:
      1. The key is null.
      2. The column vector list value is null.
      Parameters:
      key - the scalar to look up
      Returns:
      a Boolean ColumnVector with the result of the lookup
    • listContainsColumn

      public final ColumnVector listContainsColumn(ColumnView key)
      Create a column of bool values indicating whether the list rows of the first column contain the corresponding values in the second column. Output `column[i]` is set to null if one or more of the following are true:
      1. The key value is null.
      2. The column vector list value is null.
      Parameters:
      key - the ColumnVector with look up values
      Returns:
      a Boolean ColumnVector with the result of the lookup
    • listContainsNulls

      public final ColumnVector listContainsNulls()
      Create a column of bool values indicating whether the list rows of the specified column contain null elements. Output `column[i]` is set to null iff the input list row is null.
      Returns:
      a Boolean ColumnVector with the result of the lookup
    • listIndexOf

      public final ColumnVector listIndexOf(Scalar key, ColumnView.FindOptions findOption)
      Create a column of int32 indices, indicating the position of the scalar search key in each list row. All indices are 0-based. If a search key is not found, the index is set to -1. The index is set to null if one of the following is true:
      1. The search key is null.
      2. The list row is null.
      Parameters:
      key - The scalar search key
      findOption - Whether to find the first index of the key, or the last.
      Returns:
      The resultant column of int32 indices
    • listIndexOf

      public final ColumnVector listIndexOf(ColumnView keys, ColumnView.FindOptions findOption)
      Create a column of int32 indices, indicating the position of each row in the search key column in the corresponding row of the lists column. All indices are 0-based. If a search key is not found, the index is set to -1. The index is set to null if one of the following is true:
      1. The search key row is null.
      2. The list row is null.
      Parameters:
      keys - ColumnView of search keys.
      findOption - Whether to find the first index of the key, or the last.
      Returns:
      The resultant column of int32 indices
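The index rules of both `listIndexOf` overloads (0-based positions, -1 when absent, null when the key or the list row is null, first or last occurrence) can be traced with a small CPU model. `ListIndexOfSketch` and its `Find` enum are hypothetical stand-ins for illustration, not `ColumnView.FindOptions` or the cudf implementation; null elements inside a list never match a non-null key here.

```java
import java.util.ArrayList;
import java.util.List;

class ListIndexOfSketch {
  enum Find { FIRST, LAST }

  // 0-based index of key in each list row, -1 if absent, null if the key or the row is null.
  static List<Integer> listIndexOf(List<List<Integer>> col, Integer key, Find option) {
    List<Integer> out = new ArrayList<>(col.size());
    for (List<Integer> row : col) {
      if (key == null || row == null) {
        out.add(null);
        continue;
      }
      // List.indexOf / lastIndexOf already return -1 when the key is absent.
      int idx = option == Find.FIRST ? row.indexOf(key) : row.lastIndexOf(key);
      out.add(idx);
    }
    return out;
  }
}
```

For rows `[[1, 2, 1], [3], null]` and key 1, the FIRST option gives `[0, -1, null]` and the LAST option gives `[2, -1, null]`.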
    • listSortRows

      public final ColumnVector listSortRows(boolean isDescending, boolean isNullSmallest)
      Segmented sort of the elements within a list in each row of a list column. NOTICE: list columns with nested children are NOT supported yet.
      Parameters:
      isDescending - whether to sort each row in descending order (otherwise ascending order)
      isNullSmallest - whether to treat null as the smallest value (otherwise the largest value)
      Returns:
      a List ColumnVector with elements in each list sorted
    • listsHaveOverlap

      public static ColumnVector listsHaveOverlap(ColumnView lhs, ColumnView rhs)
      For each pair of lists from the input lists columns, check if they have any common non-null elements. A null input row in any of the input columns will result in a null output row. During checking for common elements, nulls within each list are considered as different values while floating-point NaN values are considered as equal. The input lists columns must have the same size and same data type.
      Parameters:
      lhs - The input lists column for one side
      rhs - The input lists column for the other side
      Returns:
      A column of type BOOL8 containing the check result
    • listsIntersectDistinct

      public static ColumnVector listsIntersectDistinct(ColumnView lhs, ColumnView rhs)
      Find the intersection, without duplicates, between lists at each row of the given lists columns. A null input row in any of the input lists columns will result in a null output row. During finding list intersection, nulls and floating-point NaN values within each list are considered as equal values. The input lists columns must have the same size and same data type.
      Parameters:
      lhs - The input lists column for one side
      rhs - The input lists column for the other side
      Returns:
      A lists column containing the intersection result
    • listsUnionDistinct

      public static ColumnVector listsUnionDistinct(ColumnView lhs, ColumnView rhs)
      Find the union, without duplicates, between lists at each row of the given lists columns. A null input row in any of the input lists columns will result in a null output row. During finding list union, nulls and floating-point NaN values within each list are considered as equal values. The input lists columns must have the same size and same data type.
      Parameters:
      lhs - The input lists column for one side
      rhs - The input lists column for the other side
      Returns:
      A lists column containing the union result
    • listsDifferenceDistinct

      public static ColumnVector listsDifferenceDistinct(ColumnView lhs, ColumnView rhs)
      Find the difference of lists of the left column against lists of the right column. Specifically, find the elements (without duplicates) from each list of the left column that do not exist in the corresponding list of the right column. A null input row in any of the input lists columns will result in a null output row. During finding, nulls and floating-point NaN values within each list are considered as equal values. The input lists columns must have the same size and same data type.
      Parameters:
      lhs - The input lists column for one side
      rhs - The input lists column for the other side
      Returns:
      A lists column containing the difference result
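The three distinct-set operations above can be modeled row by row with Java sets. `ListSetOpsSketch` is a hypothetical CPU illustration, not the cudf code. `Double.equals` treats NaN as equal to NaN and `LinkedHashSet` treats null as equal to null, which mirrors the "nulls and NaNs are equal" rule; the output order here is first-occurrence order, which may differ from cudf's, and `-0.0` versus `0.0` is not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ListSetOpsSketch {
  enum Op { INTERSECT, UNION, DIFFERENCE }

  // Applies a distinct set operation to each pair of rows; a null row on either side yields null.
  static List<List<Double>> apply(List<List<Double>> lhs, List<List<Double>> rhs, Op op) {
    if (lhs.size() != rhs.size()) {
      throw new IllegalArgumentException("input lists columns must have the same size");
    }
    List<List<Double>> out = new ArrayList<>(lhs.size());
    for (int i = 0; i < lhs.size(); i++) {
      List<Double> l = lhs.get(i);
      List<Double> r = rhs.get(i);
      if (l == null || r == null) {
        out.add(null);
        continue;
      }
      Set<Double> result = new LinkedHashSet<>(l);   // deduplicates the left row
      Set<Double> other = new LinkedHashSet<>(r);
      switch (op) {
        case INTERSECT: result.retainAll(other); break;
        case UNION: result.addAll(other); break;
        case DIFFERENCE: result.removeAll(other); break;
      }
      out.add(new ArrayList<>(result));
    }
    return out;
  }
}
```

With left row `[1.0, NaN, 2.0, 2.0]` and right row `[NaN, 3.0]`, the difference is `[1.0, 2.0]`, the intersection is `[NaN]`, and the union is `[1.0, NaN, 2.0, 3.0]`.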
    • generateListOffsets

      public final ColumnVector generateListOffsets()
      Generate list offsets from the sizes of each list. NOTICE: This API only works for INT32 columns; for any other type the behavior is undefined. Null and negative values are not allowed.
      Returns:
      a column of list offsets whose size is N + 1
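Generating offsets from sizes is an exclusive prefix sum with one extra trailing entry, which is why the output has N + 1 rows. The hypothetical `ListOffsetsSketch` below shows the arithmetic on a plain `int[]`. Throwing on a negative size is this sketch's own choice; the documentation above only says such input is not allowed.

```java
class ListOffsetsSketch {
  // offsets[0] = 0 and offsets[i + 1] = offsets[i] + sizes[i], giving sizes.length + 1 entries.
  static int[] generateListOffsets(int[] sizes) {
    int[] offsets = new int[sizes.length + 1];
    for (int i = 0; i < sizes.length; i++) {
      if (sizes[i] < 0) {
        throw new IllegalArgumentException("negative size at row " + i);
      }
      offsets[i + 1] = offsets[i] + sizes[i];
    }
    return offsets;
  }
}
```

For sizes `{2, 0, 3}` the offsets are `{0, 2, 2, 5}`: row i spans `offsets[i]` up to, but not including, `offsets[i + 1]` in the child column.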
    • getScalarElement

      public final Scalar getScalarElement(int index)
      Get a single item from the column at the specified index as a Scalar. Be careful. This is expensive and may involve running a kernel to copy the data out.
      Parameters:
      index - the index to look at
      Returns:
      the value at that index as a scalar.
      Throws:
      CudfException - if the index is out of bounds.
    • applyBooleanMask

      public final ColumnVector applyBooleanMask(ColumnView booleanMaskView)
      Filters elements in each row of this LIST column using `booleanMaskView` LIST of booleans as a mask.

      Given a list-of-bools column, the function produces a new `LIST` column of the same type as this column, where each element is copied from the row *only* if the corresponding `booleanMaskView` element is non-null and `true`.

      E.g.
      column          = { {0,1,2}, {3,4}, {5,6,7}, {8,9} };
      booleanMaskView = { {0,1,1}, {1,0}, {1,1,1}, {0,0} };
      results         = { {1,2},   {3},   {5,6,7}, {}    };

      This column and `booleanMaskView` must have the same number of rows. The output column has the same number of rows as this column. An element is copied to an output row *only* if the corresponding mask element is `true`. An output row is null only if the corresponding input row is null.

      Parameters:
      booleanMaskView - A nullable list of bools column used to filter elements in this column
      Returns:
      List column of the same type as this column, containing filtered list rows
      Throws:
      CudfException - if `booleanMaskView` is not a "lists of bools" column
      CudfException - if this column and `booleanMaskView` have a different number of rows
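The per-element filtering of `applyBooleanMask` can be traced with nested Java lists. `BooleanMaskSketch` is a hypothetical CPU model, not the cudf implementation. Two behaviors in it are assumptions not covered by the documentation: a null mask row keeps no elements, and a mask row whose length differs from its list row is rejected.

```java
import java.util.ArrayList;
import java.util.List;

class BooleanMaskSketch {
  // Keeps an element only when its mask element is non-null and true.
  // A null input row stays null; a null mask row keeps nothing (assumption).
  static <T> List<List<T>> applyBooleanMask(List<List<T>> column, List<List<Boolean>> mask) {
    if (column.size() != mask.size()) {
      throw new IllegalArgumentException("column and mask must have the same number of rows");
    }
    List<List<T>> out = new ArrayList<>(column.size());
    for (int r = 0; r < column.size(); r++) {
      List<T> row = column.get(r);
      if (row == null) {
        out.add(null);
        continue;
      }
      List<Boolean> m = mask.get(r);
      List<T> kept = new ArrayList<>();
      if (m != null) {
        if (m.size() != row.size()) {
          throw new IllegalArgumentException("row " + r + ": list and mask sizes differ");
        }
        for (int i = 0; i < row.size(); i++) {
          if (Boolean.TRUE.equals(m.get(i))) kept.add(row.get(i));  // null or false drops it
        }
      }
      out.add(kept);
    }
    return out;
  }
}
```

Feeding it the example from the description, `{ {0,1,2}, {3,4}, {5,6,7}, {8,9} }` with mask `{ {0,1,1}, {1,0}, {1,1,1}, {0,0} }`, yields `{ {1,2}, {3}, {5,6,7}, {} }`.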
    • distinctCount

      public int distinctCount(NullPolicy nullPolicy)
      Count how many rows in the column are distinct from one another.
      Parameters:
      nullPolicy - whether nulls should be included.
    • distinctCount

      public int distinctCount()
      Count how many rows in the column are distinct from one another. Nulls are included.
    • title

      protected static long title(long handle)
    • copyToHost

      public HostColumnVector copyToHost(HostMemoryAllocator hostMemoryAllocator)
      Copy the data to the host synchronously.
    • copyToHostAsync

      public HostColumnVector copyToHostAsync(Cuda.Stream stream, HostMemoryAllocator hostMemoryAllocator)
      Copy the data to the host asynchronously. The caller MUST synchronize on the stream before examining the result.
    • copyToHost

      public HostColumnVector copyToHost()
      Copy the data to host memory synchronously.
    • copyToHostAsync

      public HostColumnVector copyToHostAsync(Cuda.Stream stream)
      Copy the data to the host asynchronously. The caller MUST synchronize on the stream before examining the result.
    • getHostBytesRequired

      public long getHostBytesRequired()
      Calculate the total space required to copy the data to the host. This should be padded to the alignment that the CPU requires.
    • hostPaddingSizeInBytes

      public static long hostPaddingSizeInBytes()
      Get the size that the host will align memory allocations to in bytes.
    • hasNonEmptyNulls

      public boolean hasNonEmptyNulls()
      Exact check of whether a column or its descendants have non-empty null rows.
      Returns:
      Whether the column or its descendants have non-empty null rows
    • purgeNonEmptyNulls

      public ColumnVector purgeNonEmptyNulls()
      Copies this column into output while purging any non-empty null rows in the column or its descendants. If this column is not of compound type (LIST/STRING/STRUCT/DICTIONARY), the output will be the same as the input. The purge operation only applies directly to LIST and STRING columns, but it applies indirectly to STRUCT/DICTIONARY columns as well, since these columns may have child columns that are LIST or STRING.
      Examples:
      lists = [data: {{0,1}, {2,3}, {4,5}}, validity: {true, false, true}]
      lists[1] is null, but the list's child column still stores `{2,3}`.
      After purging the contents of the list's null rows, the column's contents will be:
      lists = [data: {{0,1}, {4,5}}, validity: {true, false, true}]
      Returns:
      A new column with equivalent contents to `input`, but with null rows purged
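The purge in the example above is easiest to see on the offsets-plus-child layout of a LIST column: a null row can still own a non-empty offset range, and purging rebuilds the child so that null rows own an empty range. The sketch below models a LIST<INT32> column with hypothetical `IntListColumn` and `PurgeNullsSketch` classes on the CPU; it does not handle nested children, and STRING columns would follow the same idea with character data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a LIST<INT32> column: offsets has rows + 1 entries.
final class IntListColumn {
  final int[] offsets;
  final int[] child;
  final boolean[] valid;

  IntListColumn(int[] offsets, int[] child, boolean[] valid) {
    this.offsets = offsets;
    this.child = child;
    this.valid = valid;
  }
}

class PurgeNullsSketch {
  // Copies only the child elements of valid rows; null rows end up with empty offset ranges.
  static IntListColumn purge(IntListColumn in) {
    int rows = in.valid.length;
    int[] offsets = new int[rows + 1];
    List<Integer> kept = new ArrayList<>();
    for (int r = 0; r < rows; r++) {
      if (in.valid[r]) {
        for (int i = in.offsets[r]; i < in.offsets[r + 1]; i++) kept.add(in.child[i]);
      }
      offsets[r + 1] = kept.size();
    }
    int[] child = new int[kept.size()];
    for (int i = 0; i < child.length; i++) child[i] = kept.get(i);
    return new IntListColumn(offsets, child, in.valid.clone());
  }
}
```

Starting from offsets `{0, 2, 4, 6}`, child `{0, 1, 2, 3, 4, 5}` and validity `{true, false, true}`, the purged column has offsets `{0, 2, 2, 4}` and child `{0, 1, 4, 5}`, matching the documented result.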
    • toHex

      public ColumnVector toHex()
      Convert this integer column to hexadecimal and return a new strings column. Any null entries will result in corresponding null entries in the output column. The output character set is '0'-'9' and 'A'-'F'. The output string width will be a multiple of 2 depending on the size of the integer type. A single leading zero is applied to the first non-zero output byte if it is less than 0x10.
      Example:
      input = [1234, -1, 0, 27, 342718233]
      s = input.toHex()
      s is [ '04D2', 'FFFFFFFF', '00', '1B', '146D7719']
      The example above shows an `INT32` type column where each integer is 4 bytes. Leading zeros are suppressed unless filling out a complete byte as in `1234 -> '04D2'` instead of `000004D2` or `4D2`.
      Returns:
      new string ColumnVector
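For an `INT32` column the formatting rule above matches the JDK's two's-complement `Integer.toHexString`, upper-cased, with one '0' prepended whenever the digit count is odd so every byte is written as two characters. The hypothetical `ToHexSketch` below is a CPU illustration of that rule only; an INT64 column would use `Long.toHexString` the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ToHexSketch {
  // INT32 only: two's-complement hex, upper-case, padded with a single '0'
  // when the digit count is odd so that every output byte is two characters.
  static List<String> toHex(List<Integer> col) {
    List<String> out = new ArrayList<>(col.size());
    for (Integer v : col) {
      if (v == null) {
        out.add(null);                            // null in, null out
        continue;
      }
      String hex = Integer.toHexString(v).toUpperCase(Locale.ROOT);
      out.add(hex.length() % 2 == 0 ? hex : "0" + hex);
    }
    return out;
  }
}
```

Applied to `[1234, -1, 0, 27, 342718233]`, it produces `['04D2', 'FFFFFFFF', '00', '1B', '146D7719']`, the same strings as the example in the description.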