Class ColumnVector
- All Implemented Interfaces:
BinaryOperable, AutoCloseable
-
Nested Class Summary
Nested Classes
static interface ColumnVector.EventHandler
- Interface to handle events for this ColumnVector.
protected static final class ColumnVector.OffHeapState
- Holds the off heap state of the column vector so we can clean it up, even if it is leaked.
Nested classes/interfaces inherited from class ai.rapids.cudf.ColumnView
ColumnView.FindOptions
-
Field Summary
Fields inherited from class ai.rapids.cudf.ColumnView
offHeap, rows, type, UNKNOWN_NULL_COUNT, viewHandle
-
Constructor Summary
Constructors
ColumnVector(long nativePointer)
- Wrap an existing on device cudf::column with the corresponding ColumnVector.
ColumnVector(DType type, long rows, Optional<Long> nullCount, DeviceMemoryBuffer dataBuffer, DeviceMemoryBuffer validityBuffer, DeviceMemoryBuffer offsetBuffer)
- Create a new column vector based off of data already on the device.
ColumnVector(DType type, long rows, Optional<Long> nullCount, DeviceMemoryBuffer dataBuffer, DeviceMemoryBuffer validityBuffer, DeviceMemoryBuffer offsetBuffer, List<DeviceMemoryBuffer> toClose, long[] childHandles)
- Create a new column vector based off of data already on the device with child columns.
-
Method Summary
Modifier and Type, Method, Description
static ColumnVector boolFromBytes(byte... values)
- Create a new vector from the given values.
static ColumnVector build(int rows, long stringBufferSize, Consumer<HostColumnVector.Builder> init)
static ColumnVector build(DType type, int rows, Consumer<HostColumnVector.Builder> init)
- Create a new vector.
ColumnVector castTo(DType type)
- Generic method to cast ColumnVector. When casting from a Date, Timestamp, or Boolean to a numerical type the underlying numerical representation of the data will be used for the cast.
void close()
- Close this Vector and free memory allocated for HostMemoryBuffer and DeviceMemoryBuffer.
static ColumnVector concatenate(ColumnView... columns)
- Create a new vector by concatenating multiple columns together.
ColumnVector copyToColumnVector()
- For a ColumnVector this is really just incrementing the reference count.
static ColumnVector daysFromInts(int... values)
- Create a new vector from the given values.
static ColumnVector decimalFromBigInt(int scale, BigInteger... values)
- Create a new decimal vector from BigIntegers. Compared with the scale of java.math.BigDecimal, the scale here has the opposite meaning.
static ColumnVector decimalFromBoxedInts(int scale, Integer... values)
- Create a new decimal vector from boxed unscaled values (Integer array) and scale.
static ColumnVector decimalFromBoxedLongs(int scale, Long... values)
- Create a new decimal vector from boxed unscaled values (Long array) and scale.
static ColumnVector decimalFromDoubles(DType type, RoundingMode mode, double... values)
- Create a new decimal vector from double floats with specific DecimalType and RoundingMode.
static ColumnVector decimalFromInts(int scale, int... values)
- Create a new decimal vector from unscaled values (int array) and scale.
static ColumnVector decimalFromLongs(int scale, long... values)
- Create a new decimal vector from unscaled values (long array) and scale.
static ColumnVector durationDaysFromBoxedInts(Integer... values)
- Create a new vector from the given values.
static ColumnVector durationDaysFromInts(int... values)
- Create a new vector from the given values.
static ColumnVector durationMicroSecondsFromBoxedLongs(Long... values)
- Create a new vector from the given values.
static ColumnVector durationMicroSecondsFromLongs(long... values)
- Create a new vector from the given values.
static ColumnVector durationMilliSecondsFromBoxedLongs(Long... values)
- Create a new vector from the given values.
static ColumnVector durationMilliSecondsFromLongs(long... values)
- Create a new vector from the given values.
static ColumnVector durationNanoSecondsFromBoxedLongs(Long... values)
- Create a new vector from the given values.
static ColumnVector durationNanoSecondsFromLongs(long... values)
- Create a new vector from the given values.
static ColumnVector durationSecondsFromBoxedLongs(Long... values)
- Create a new vector from the given values.
static ColumnVector durationSecondsFromLongs(long... values)
- Create a new vector from the given values.
static ColumnVector empty(HostColumnVector.DataType colType)
- Creates an empty column according to the data type.
static ColumnVector emptyStructs(HostColumnVector.DataType dataType, long numRows)
- This method is evolving, unstable and currently test only.
static ColumnVector fromArrow(DType type, long numRows, long nullCount, ByteBuffer data, ByteBuffer validity, ByteBuffer offsets)
- Create a ColumnVector from the Apache Arrow byte buffers passed in.
static ColumnVector fromBooleans(boolean... values)
- Create a new vector from the given values.
static ColumnVector fromBoxedBooleans(Boolean... values)
- Create a new vector from the given values.
static ColumnVector fromBoxedBytes(Byte... values)
- Create a new vector from the given values.
static ColumnVector fromBoxedDoubles(Double... values)
- Create a new vector from the given values.
static ColumnVector fromBoxedFloats(Float... values)
- Create a new vector from the given values.
static ColumnVector fromBoxedInts(Integer... values)
- Create a new vector from the given values.
static ColumnVector fromBoxedLongs(Long... values)
- Create a new vector from the given values.
static ColumnVector fromBoxedShorts(Short... values)
- Create a new vector from the given values.
static ColumnVector fromBoxedUnsignedBytes(Byte... values)
- Create a new vector from the given values.
static ColumnVector fromBoxedUnsignedInts(Integer... values)
- Create a new vector from the given values.
static ColumnVector fromBoxedUnsignedLongs(Long... values)
- Create a new vector from the given values.
static ColumnVector fromBoxedUnsignedShorts(Short... values)
- Create a new vector from the given values.
static ColumnVector fromBytes(byte... values)
- Create a new vector from the given values.
static ColumnVector fromDecimals(BigDecimal... values)
- Create a new vector from the given values.
static ColumnVector fromDoubles(double... values)
- Create a new vector from the given values.
static ColumnVector fromFloats(float... values)
- Create a new vector from the given values.
static ColumnVector fromInts(int... values)
- Create a new vector from the given values.
static <T> ColumnVector fromLists(HostColumnVector.DataType dataType, List<T>... lists)
- This method is evolving, unstable and currently test only.
static ColumnVector fromLongs(long... values)
- Create a new vector from the given values.
static ColumnVector fromScalar(Scalar scalar, int rows)
- Create a new vector of length rows, where each row is filled with the Scalar's value.
static ColumnVector fromShorts(short... values)
- Create a new vector from the given values.
static ColumnVector fromStrings(String... values)
- Create a new string vector from the given values.
static ColumnVector fromStructs(HostColumnVector.DataType dataType, HostColumnVector.StructData... lists)
- This method is evolving, unstable and currently test only.
static ColumnVector fromStructs(HostColumnVector.DataType dataType, List<HostColumnVector.StructData> lists)
- This method is evolving, unstable and currently test only.
static ColumnVector fromUnsignedBytes(byte... values)
- Create a new vector from the given values.
static ColumnVector fromUnsignedInts(int... values)
- Create a new vector from the given values.
static ColumnVector fromUnsignedLongs(long... values)
- Create a new vector from the given values.
static ColumnVector fromUnsignedShorts(short... values)
- Create a new vector from the given values.
static ColumnVector fromUTF8Strings(byte[]... values)
- Create a new string vector from the given values.
static ColumnVector fromViewWithContiguousAllocation(long columnViewAddress, DeviceMemoryBuffer buffer)
- Creates a ColumnVector from a native column_view using a contiguous device allocation.
BaseDeviceMemoryBuffer getDeviceBufferFor(BufferType type)
- Get access to the raw device buffer for this column.
ColumnVector.EventHandler getEventHandler()
- Returns the current event handler for this ColumnVector or null if no handler is associated.
long getNullCount()
- Returns the number of nulls in the data.
int getRefCount()
- Returns this column's current refcount.
boolean hasNulls()
- Returns whether the vector has nulls.
boolean hasValidityVector()
- Returns whether the vector has a validity vector allocated.
ColumnVector incRefCount()
- Increment the reference count for this column.
static ColumnVector listConcatenateByRow(boolean ignoreNull, ColumnView... columns)
- Concatenate columns of lists horizontally (row by row), combining a corresponding row from each column into a single list row of a new column.
static ColumnVector listConcatenateByRow(ColumnView... columns)
- Concatenate columns of lists horizontally (row by row), combining a corresponding row from each column into a single list row of a new column.
static ColumnVector makeList(long rows, DType type, ColumnView... columns)
- Create a LIST column from the given columns.
static ColumnVector makeList(ColumnView... columns)
- Create a LIST column from the given columns.
ColumnVector makeListFromOffsets(long rows, ColumnView offsets)
- Create a LIST column from the current column and a given offsets column.
static ColumnVector makeStruct(long rows, ColumnView... columns)
- Create a new struct vector made up of existing columns.
static ColumnVector makeStruct(ColumnView... columns)
- Create a new struct vector made up of existing columns.
static ColumnVector md5Hash(ColumnView... columns)
- Create a new vector containing the MD5 hash of each row in the table.
void noWarnLeakExpected()
- This is a really ugly API, but a column of data may not have a clear lifecycle because of Java and GC.
static ColumnVector sequence(ColumnView start, ColumnView size)
- Create a list column in which each row is a sequence of values starting from a `start` value, incrementing by one, and its cardinality is specified by a `size` value.
static ColumnVector sequence(ColumnView start, ColumnView size, ColumnView step)
- Create a list column in which each row is a sequence of values starting from a `start` value, incrementing by a `step` value, and its cardinality is specified by a `size` value.
static ColumnVector sequence(Scalar initialValue, int rows)
- Create a new vector of length rows, starting at the initialValue and going by 1 each time.
static ColumnVector sequence(Scalar initialValue, Scalar step, int rows)
- Create a new vector of length rows, starting at the initialValue and going by step each time.
ColumnVector.EventHandler setEventHandler(ColumnVector.EventHandler newHandler)
- Set an event handler for this vector.
static ColumnVector sha1Hash(ColumnView... columns)
- Create a new column containing the SHA-1 hash of each row in the table.
static ColumnVector stringConcatenate(ColumnView[] columns)
- Concatenate columns of strings together, combining a corresponding row from each column into a single string row of a new column with no separator string inserted between each combined string and maintaining null values in combined rows.
static ColumnVector stringConcatenate(ColumnView[] columns, ColumnView sepCol)
- Concatenate columns of strings together using a separator specified for each row and returns the result as a string column.
static ColumnVector stringConcatenate(ColumnView[] columns, ColumnView sepCol, Scalar separatorNarep, Scalar colNarep, boolean separateNulls)
- Concatenate columns of strings together using a separator specified for each row and returns the result as a string column.
static ColumnVector stringConcatenate(Scalar separator, Scalar narep, ColumnView[] columns)
- Concatenate columns of strings together, combining a corresponding row from each column into a single string row of a new column.
static ColumnVector stringConcatenate(Scalar separator, Scalar narep, ColumnView[] columns, boolean separateNulls)
- Concatenate columns of strings together, combining a corresponding row from each column into a single string row of a new column.
static ColumnVector timestampDaysFromBoxedInts(Integer... values)
- Create a new vector from the given values.
static ColumnVector timestampMicroSecondsFromBoxedLongs(Long... values)
- Create a new vector from the given values.
static ColumnVector timestampMicroSecondsFromLongs(long... values)
- Create a new vector from the given values.
static ColumnVector timestampMilliSecondsFromBoxedLongs(Long... values)
- Create a new vector from the given values.
static ColumnVector timestampMilliSecondsFromLongs(long... values)
- Create a new vector from the given values.
static ColumnVector timestampNanoSecondsFromBoxedLongs(Long... values)
- Create a new vector from the given values.
static ColumnVector timestampNanoSecondsFromLongs(long... values)
- Create a new vector from the given values.
static ColumnVector timestampSecondsFromBoxedLongs(Long... values)
- Create a new vector from the given values.
static ColumnVector timestampSecondsFromLongs(long... values)
- Create a new vector from the given values.
String toString()
Methods inherited from class ai.rapids.cudf.ColumnView
abs, addCalendricalMonths, addCalendricalMonths, all, all, any, any, applyBooleanMask, approxPercentile, approxPercentile, arccos, arccosh, arcsin, arcsinh, arctan, arctanh, asByteList, asByteList, asBytes, asDoubles, asFloats, asInts, asLongs, asShorts, asStrings, asStrings, asTimestamp, asTimestampDays, asTimestampDays, asTimestampMicroseconds, asTimestampMicroseconds, asTimestampMilliseconds, asTimestampMilliseconds, asTimestampNanoseconds, asTimestampNanoseconds, asTimestampSeconds, asTimestampSeconds, asUnsignedBytes, asUnsignedInts, asUnsignedLongs, asUnsignedShorts, binaryOp, bitCastTo, bitCount, bitInvert, capitalize, cbrt, ceil, clamp, clamp, codePoints, contains, contains, containsRe, containsRe, copyToHost, copyToHost, copyToHostAsync, copyToHostAsync, cos, cosh, countElements, dateTimeCeil, dateTimeFloor, dateTimeRound, day, dayOfYear, daysInMonth, distinctCount, distinctCount, dropListDuplicates, dropListDuplicates, dropListDuplicatesWithKeysValues, endsWith, exp, extractAllRecord, extractAllRecord, extractDateTimeComponent, extractListElement, extractListElement, extractRe, extractRe, findAndReplaceAll, flattenLists, flattenLists, floor, fromDeviceBuffer, generateListOffsets, getByteCount, getCharLengths, getChildColumnView, getChildColumnViews, getData, getDeviceMemorySize, getHostBytesRequired, getJSONObject, getJSONObject, getListOffsetsView, getMapKeyExistence, getMapKeyExistence, getMapValue, getMapValue, getNativeView, getNumChildren, getOffsets, getRowCount, getScalarElement, getType, getValid, hasNonEmptyNulls, hostPaddingSizeInBytes, hour, ifElse, ifElse, ifElse, ifElse, isFixedPoint, isFloat, isInteger, isInteger, isLeapYear, isNan, isNotNan, isNotNull, isNull, isTimestamp, joinStrings, lastDayOfMonth, like, listContains, listContainsColumn, listContainsNulls, listIndexOf, listIndexOf, listReduce, listReduce, listReduce, listsDifferenceDistinct, listsHaveOverlap, listsIntersectDistinct, listSortRows, listsUnionDistinct, log, log10, log2, 
logicalCastTo, lower, lstrip, lstrip, makeStructView, makeStructView, matchesRe, matchesRe, max, max, mean, mean, mergeAndSetValidity, min, min, minute, month, nansToNulls, normalizeNANsAndZeros, not, pad, pad, pad, prefixSum, product, product, purgeNonEmptyNulls, quantile, quarterOfYear, reduce, reduce, repeatStrings, repeatStrings, replaceChildrenWithViews, replaceListChild, replaceMultiRegex, replaceNulls, replaceNulls, replaceNulls, replaceRegex, replaceRegex, replaceRegex, replaceRegex, reverseStringsOrLists, rint, rollingWindow, round, round, round, round, rstrip, rstrip, scan, scan, scan, second, segmentedGather, segmentedGather, segmentedReduce, segmentedReduce, segmentedReduce, sin, sinh, slice, split, splitAsViews, sqrt, standardDeviation, standardDeviation, startsWith, stringConcatenateListElements, stringConcatenateListElements, stringConcatenateListElements, stringContains, stringContains, stringLocate, stringLocate, stringLocate, stringReplace, stringReplace, stringReplaceWithBackrefs, stringReplaceWithBackrefs, stringSplit, stringSplit, stringSplit, stringSplit, stringSplit, stringSplit, stringSplitRecord, stringSplitRecord, stringSplitRecord, stringSplitRecord, stringSplitRecord, stringSplitRecord, strip, strip, substring, substring, substring, subVector, subVector, sum, sum, sumOfSquares, sumOfSquares, tan, tanh, title, toHex, toTitle, transform, unaryOp, upper, urlDecode, urlEncode, variance, variance, weekDay, year, zfill
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface ai.rapids.cudf.BinaryOperable
add, add, and, and, arctan2, arctan2, bitAnd, bitAnd, bitOr, bitOr, bitXor, bitXor, div, div, equalTo, equalTo, equalToNullAware, equalToNullAware, floorDiv, floorDiv, greaterOrEqualTo, greaterOrEqualTo, greaterThan, greaterThan, lessOrEqualTo, lessOrEqualTo, lessThan, lessThan, log, log, maxNullAware, maxNullAware, minNullAware, minNullAware, mod, mod, mul, mul, notEqualTo, notEqualTo, notEqualToNullAware, notEqualToNullAware, or, or, pmod, pmod, pow, pow, shiftLeft, shiftLeft, shiftRight, shiftRight, shiftRightUnsigned, shiftRightUnsigned, sub, sub, trueDiv, trueDiv
-
Constructor Details
-
ColumnVector
public ColumnVector(long nativePointer)
Wrap an existing on device cudf::column with the corresponding ColumnVector. The new ColumnVector takes ownership of the pointer and will free it when the ref count reaches zero.
- Parameters:
nativePointer
- host address of the cudf::column object which will be owned by this instance.
-
ColumnVector
public ColumnVector(DType type, long rows, Optional<Long> nullCount, DeviceMemoryBuffer dataBuffer, DeviceMemoryBuffer validityBuffer, DeviceMemoryBuffer offsetBuffer)
Create a new column vector based off of data already on the device.
- Parameters:
type
- the type of the vector.
rows
- the number of rows in this vector.
nullCount
- the number of nulls in the dataset.
dataBuffer
- the data stored on the device. The column vector takes ownership of the buffer. Do not use the buffer after calling this.
validityBuffer
- an optional validity buffer. Must be provided if nullCount != 0. The column vector takes ownership of the buffer. Do not use the buffer after calling this.
offsetBuffer
- an offsets buffer required for strings and string categories. The column vector takes ownership of the buffer. Do not use the buffer after calling this.
-
ColumnVector
public ColumnVector(DType type, long rows, Optional<Long> nullCount, DeviceMemoryBuffer dataBuffer, DeviceMemoryBuffer validityBuffer, DeviceMemoryBuffer offsetBuffer, List<DeviceMemoryBuffer> toClose, long[] childHandles)
Create a new column vector based off of data already on the device with child columns.
- Parameters:
type
- the type of the vector, typically a nested type.
rows
- the number of rows in this vector.
nullCount
- the number of nulls in the dataset.
dataBuffer
- the data stored on the device. The column vector takes ownership of the buffer. Do not use the buffer after calling this.
validityBuffer
- an optional validity buffer. Must be provided if nullCount != 0. The column vector takes ownership of the buffer. Do not use the buffer after calling this.
offsetBuffer
- an offsets buffer required for strings and string categories. The column vector takes ownership of the buffer. Do not use the buffer after calling this.
toClose
- list of buffers to track and close once done, usually used for the buffers of child columns.
childHandles
- array of longs for child column view handles.
-
-
Method Details
-
copyToColumnVector
For a ColumnVector this is really just incrementing the reference count.
- Overrides:
copyToColumnVector
in class ColumnView
- Returns:
- this
-
fromViewWithContiguousAllocation
public static ColumnVector fromViewWithContiguousAllocation(long columnViewAddress, DeviceMemoryBuffer buffer)
Creates a ColumnVector from a native column_view using a contiguous device allocation.
- Parameters:
columnViewAddress
- address of the native column_view
buffer
- device buffer containing the data referenced by the column view
-
setEventHandler
Set an event handler for this vector. This method can be invoked with null to unset the handler.
- Parameters:
newHandler
- the EventHandler to use from this point forward
- Returns:
- the prior event handler, or null if not set.
-
getEventHandler
Returns the current event handler for this ColumnVector or null if no handler is associated.
-
noWarnLeakExpected
public void noWarnLeakExpected()
This is a really ugly API, but a column of data may not have a clear lifecycle because of Java and GC. This API informs the leak tracking code that a leak is expected for this column, so big scary warnings should not be printed when it happens.
-
close
public void close()
Close this Vector and free memory allocated for HostMemoryBuffer and DeviceMemoryBuffer.
- Specified by:
close
in interface AutoCloseable
- Overrides:
close
in class ColumnView
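The ownership rules stated on this page (memory is freed when the ref count reaches zero, and every incRefCount must be balanced by a close) can be modeled without a GPU. The sketch below is a plain-Java stand-in, not the cudf implementation; the class name RefCountedResource and the method isReleased are hypothetical and exist only for illustration.

```java
/** Hypothetical stand-in for a reference-counted native resource. */
final class RefCountedResource implements AutoCloseable {
    private int refCount = 1;          // a freshly created object holds one reference
    private boolean released = false;  // becomes true once the resource is freed

    /** Adds one reference and returns this, so calls can be chained. */
    synchronized RefCountedResource incRefCount() {
        if (refCount <= 0) {
            throw new IllegalStateException("resource is already closed");
        }
        refCount++;
        return this;
    }

    synchronized int getRefCount() {
        return refCount;
    }

    synchronized boolean isReleased() {
        return released;
    }

    /** Drops one reference; the resource is freed when the count reaches zero. */
    @Override
    public synchronized void close() {
        if (refCount <= 0) {
            throw new IllegalStateException("close called too many times");
        }
        refCount--;
        if (refCount == 0) {
            released = true; // a real implementation would free native memory here
        }
    }
}
```

Because the class implements AutoCloseable, the initial reference can be managed with try-with-resources, while each extra reference taken with incRefCount needs its own close.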
-
toString
- Overrides:
toString
in class ColumnView
-
incRefCount
Increment the reference count for this column. You need to call close on this to decrement the reference count again.
-
getNullCount
public long getNullCount()
Returns the number of nulls in the data. Note that this can be a very expensive operation, because the null count is calculated if it is not already known.
- Overrides:
getNullCount
in class ColumnView
-
getRefCount
public int getRefCount()
Returns this column's current refcount.
-
hasValidityVector
public boolean hasValidityVector()
Returns whether the vector has a validity vector allocated.
-
hasNulls
public boolean hasNulls()
Returns whether the vector has nulls. Note that this can be a very expensive operation, because the null count is calculated if it is not already known.
-
getDeviceBufferFor
Get access to the raw device buffer for this column. This is intended to be used with a lot of caution. The lifetime of the buffer is tied to the lifetime of the column, so do not close the buffer; the column will take care of it. Do not modify the contents of the buffer, or operations on the column may be negatively affected. The data must be on the device for this to work. Strings and string categories do not currently work because their underlying device layout is currently hidden.
- Parameters:
type
- the type of buffer to get access to.
- Returns:
- the underlying buffer or null if no buffer is associated with it for this column. Please note that if the column is empty there may be no buffers at all associated with the column.
-
fromArrow
public static ColumnVector fromArrow(DType type, long numRows, long nullCount, ByteBuffer data, ByteBuffer validity, ByteBuffer offsets)
Create a ColumnVector from the Apache Arrow byte buffers passed in. Any of the buffers not used for that datatype should be set to null. The buffers are expected to be off heap buffers, but if they are not, it will handle copying them to direct byte buffers. This only supports primitive types. Strings, Decimals and nested types such as list and struct are not supported.
- Parameters:
type
- type of the column
numRows
- number of rows in the Arrow column
nullCount
- null count
data
- ByteBuffer of the Arrow data buffer
validity
- ByteBuffer of the Arrow validity buffer
offsets
- ByteBuffer of the Arrow offsets buffer
- Returns:
- new ColumnVector
-
fromScalar
Create a new vector of length rows, where each row is filled with the Scalar's value.
- Parameters:
scalar
- Scalar to use to fill rows
rows
- number of rows in the new ColumnVector
- Returns:
- new ColumnVector
-
makeStruct
Create a new struct vector made up of existing columns. Note that this will copy the contents of the input columns to make a new vector. If you only want to do a quick temporary computation you can use ColumnView.makeStructView.
- Parameters:
columns
- the columns to make the struct from.
- Returns:
- the new ColumnVector
-
makeStruct
Create a new struct vector made up of existing columns. Note that this will copy the contents of the input columns to make a new vector. If you only want to do a quick temporary computation you can use ColumnView.makeStructView.
- Parameters:
rows
- the number of rows in the struct. Used for structs with no children.
columns
- the columns to make the struct from.
- Returns:
- the new ColumnVector
-
makeList
Create a LIST column from the given columns. Each list in the returned column will have the same number of entries in it as columns passed into this method. Be careful about the number of rows passed in as there are limits on the maximum output size supported for column lists.
- Parameters:
columns
- the columns to make up the list column, in the order they will appear in the resulting lists.
- Returns:
- the new LIST ColumnVector
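The row-wise shape described above (one list per row, one entry per input column, in column order) can be illustrated on the host with plain arrays. This is only a model of the documented result, not the GPU implementation; MakeListSketch and its int[] inputs are hypothetical stand-ins for real columns.

```java
import java.util.ArrayList;
import java.util.List;

final class MakeListSketch {
    /**
     * columns[c][r] is the value of input column c at row r.
     * Returns one list per row; each list has columns.length entries.
     */
    static List<List<Integer>> makeList(int[]... columns) {
        if (columns.length == 0) {
            throw new IllegalArgumentException("this sketch needs at least one column");
        }
        int rows = columns[0].length;
        List<List<Integer>> result = new ArrayList<>(rows);
        for (int r = 0; r < rows; r++) {
            List<Integer> row = new ArrayList<>(columns.length);
            for (int[] column : columns) {
                if (column.length != rows) {
                    throw new IllegalArgumentException("all columns must have the same row count");
                }
                row.add(column[r]); // entries keep the order of the input columns
            }
            result.add(row);
        }
        return result;
    }
}
```

For two input columns {1, 2, 3} and {10, 20, 30}, the sketch yields the three lists [1, 10], [2, 20], and [3, 30].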
-
makeList
Create a LIST column from the given columns. Each list in the returned column will have the same number of entries in it as columns passed into this method. Be careful about the number of rows passed in as there are limits on the maximum output size supported for column lists.
- Parameters:
rows
- the number of rows to create, for the special case of an empty list.
type
- the type of the child column, for the special case of an empty list.
columns
- the columns to make up the list column, in the order they will appear in the resulting lists.
- Returns:
- the new LIST ColumnVector
-
makeListFromOffsets
Create a LIST column from the current column and a given offsets column. The output column will contain lists whose elements are copied from the current column and whose sizes are determined by the given offsets. Note that the caller is responsible for making sure the given offsets column is of type INT32 and contains valid indices to create a LIST column. No validity check is performed on these offsets when this function is called. If the given offsets are invalid, the result may be bad memory accesses and/or data corruption.
- Parameters:
rows
- the number of rows to create.
offsets
- the offsets pointing to row indices of the current column to create an output LIST column.
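How offsets carve a flat column into lists can be sketched on the host. The example assumes the usual Arrow-style layout in which the offsets hold rows + 1 entries and row i covers elements offsets[i] up to, but not including, offsets[i + 1]; that layout, the class ListFromOffsetsSketch, and the int[] inputs are assumptions for illustration. Unlike the method documented above, the sketch validates its offsets.

```java
import java.util.ArrayList;
import java.util.List;

final class ListFromOffsetsSketch {
    /** Splits child into `rows` lists using rows + 1 monotonically non-decreasing offsets. */
    static List<List<Integer>> makeListFromOffsets(int[] child, int rows, int[] offsets) {
        if (offsets.length != rows + 1) {
            throw new IllegalArgumentException("expected rows + 1 offsets");
        }
        List<List<Integer>> result = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            int begin = offsets[i];
            int end = offsets[i + 1];
            // The documented method performs no such check; invalid offsets there
            // can cause bad memory accesses, so the sketch fails fast instead.
            if (begin < 0 || end < begin || end > child.length) {
                throw new IllegalArgumentException("invalid offsets at row " + i);
            }
            List<Integer> row = new ArrayList<>(end - begin);
            for (int j = begin; j < end; j++) {
                row.add(child[j]);
            }
            result.add(row);
        }
        return result;
    }
}
```

With child values {1, 2, 3, 4, 5}, three rows, and offsets {0, 2, 2, 5}, the result is [1, 2], an empty list, and [3, 4, 5].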
-
sequence
Create a new vector of length rows, starting at the initialValue and going by step each time. Only numeric types are supported.
- Parameters:
initialValue
- the initial value to start at.
step
- the step to add to each subsequent row.
rows
- the total number of rows
- Returns:
- the new ColumnVector.
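The values such a column holds follow directly from the description: row i contains initialValue + i * step. A minimal host-side sketch, using long values and the hypothetical class SequenceSketch in place of real scalars and columns:

```java
final class SequenceSketch {
    /** Returns `rows` values: initialValue, initialValue + step, initialValue + 2 * step, ... */
    static long[] sequence(long initialValue, long step, int rows) {
        if (rows < 0) {
            throw new IllegalArgumentException("rows must not be negative");
        }
        long[] out = new long[rows];
        long value = initialValue;
        for (int i = 0; i < rows; i++) {
            out[i] = value;
            value += step; // each subsequent row adds the step once more
        }
        return out;
    }
}
```

For example, an initial value of 5, a step of 3, and 4 rows produce 5, 8, 11, 14; a step of 1 gives the behavior of the overload without a step argument.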
-
sequence
Create a new vector of length rows, starting at the initialValue and going by 1 each time. Only numeric types are supported.
- Parameters:
initialValue
- the initial value to start at.
rows
- the total number of rows
- Returns:
- the new ColumnVector.
-
sequence
Create a list column in which each row is a sequence of values starting from a `start` value, incrementing by one, and its cardinality is specified by a `size` value. The `start` and `size` values used to generate each list are taken from the corresponding row of the input start and size columns.
- Parameters:
start
- first values in the result sequences
size
- numbers of values in the result sequences
- Returns:
- the new ColumnVector.
-
sequence
Create a list column in which each row is a sequence of values starting from a `start` value, incrementing by a `step` value, and its cardinality is specified by a `size` value. The `start`, `step`, and `size` values used to generate each list are taken from the corresponding row of the input start, step, and size columns.
- Parameters:
start
- first values in the result sequences
size
- numbers of values in the result sequences
step
- increment values for the result sequences.
- Returns:
- the new ColumnVector.
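The per-row behavior can be modeled on the host: row i of the output is a list of size[i] values beginning at start[i] and advancing by step[i]. The class ListSequenceSketch and its array inputs are hypothetical stand-ins for the start, size, and step columns; null handling is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

final class ListSequenceSketch {
    /** Builds one list per row from the matching entries of start, size, and step. */
    static List<List<Long>> sequence(long[] start, int[] size, long[] step) {
        if (start.length != size.length || start.length != step.length) {
            throw new IllegalArgumentException("all inputs must have the same row count");
        }
        List<List<Long>> result = new ArrayList<>(start.length);
        for (int row = 0; row < start.length; row++) {
            List<Long> values = new ArrayList<>(size[row]);
            long value = start[row];
            for (int k = 0; k < size[row]; k++) {
                values.add(value);
                value += step[row]; // the increment comes from this row of the step input
            }
            result.add(values);
        }
        return result;
    }
}
```

With starts {1, 10}, sizes {3, 2}, and steps {1, 5}, the two output rows are [1, 2, 3] and [10, 15].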
-
concatenate
Create a new vector by concatenating multiple columns together. Note that all columns must have the same type.
-
stringConcatenate
Concatenate columns of strings together, combining a corresponding row from each column into a single string row of a new column with no separator string inserted between each combined string and maintaining null values in combined rows.
- Parameters:
columns
- array of columns containing strings, must be non-empty
- Returns:
- A new java column vector containing the concatenated strings.
-
stringConcatenate
Concatenate columns of strings together, combining a corresponding row from each column into a single string row of a new column. This version includes the separator for null rows if 'narep' is valid.
- Parameters:
separator
- string scalar inserted between each string being merged.
narep
- string scalar indicating null behavior. If set to null and any string in the row is null the resulting string will be null. If not null, null values in any column will be replaced by the specified string.
columns
- array of columns containing strings, must be non-empty
- Returns:
- A new java column vector containing the concatenated strings.
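The narep rule can be modeled for a single row on the host. This is an interpretation of the description above, not the library code: a Java null stands in for a null scalar or a null string value, and StringConcatSketch.concatRow is a hypothetical helper.

```java
final class StringConcatSketch {
    /**
     * Joins one row of values with the separator.
     * If narep is null, any null value makes the whole result null;
     * otherwise each null value is replaced by narep before joining.
     */
    static String concatRow(String separator, String narep, String... row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            String value = row[i];
            if (value == null) {
                if (narep == null) {
                    return null; // a null input with no replacement nulls the row
                }
                value = narep;
            }
            if (i > 0) {
                sb.append(separator); // separator is kept around replaced nulls too
            }
            sb.append(value);
        }
        return sb.toString();
    }
}
```

Under this model, joining "a", null, "c" with separator "-" gives null when narep is null and "a-?-c" when narep is "?".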
-
stringConcatenate
public static ColumnVector stringConcatenate(Scalar separator, Scalar narep, ColumnView[] columns, boolean separateNulls)
Concatenate columns of strings together, combining a corresponding row from each column into a single string row of a new column.
- Parameters:
separator
- string scalar inserted between each string being merged.
narep
- string scalar indicating null behavior. If set to null and any string in the row is null the resulting string will be null. If not null, null values in any column will be replaced by the specified string.
columns
- array of columns containing strings, must be non-empty
separateNulls
- if true, then the separator is included for null rows if `narep` is valid.
- Returns:
- A new java column vector containing the concatenated strings.
-
stringConcatenate
Concatenate columns of strings together using a separator specified for each row and returns the result as a string column. If the row separator for a given row is null, output column for that row is null. Null column values for a given row are skipped.
- Parameters:
columns
- array of columns containing strings
sepCol
- strings column that provides the separator for a given row
- Returns:
- A new java column vector containing the concatenated strings with separator between.
-
stringConcatenate
public static ColumnVector stringConcatenate(ColumnView[] columns, ColumnView sepCol, Scalar separatorNarep, Scalar colNarep, boolean separateNulls)
Concatenate columns of strings together using a separator specified for each row and returns the result as a string column. If the row separator for a given row is null, output column for that row is null unless separatorNarep is provided. The separator is applied between two output row values if separateNulls is true, or only between valid rows if separateNulls is false.
- Parameters:
columns
- array of columns containing strings
sepCol
- strings column that provides the separator for a given row
separatorNarep
- string scalar indicating null behavior when a separator is null. If set to null and the separator is null the resulting string will be null. If not null, this string will be used in place of a null separator.
colNarep
- string that should be used in place of any null strings found in any column.
separateNulls
- if true, then the separator is included for null rows if `colNarep` is valid.
- Returns:
- A new java column vector containing the concatenated strings with separator between.
-
listConcatenateByRow
Concatenate columns of lists horizontally (row by row), combining a corresponding row from each column into a single list row of a new column. NOTICE: Any concatenation involving a null list element will result in a null list.
- Parameters:
columns
- array of columns containing lists, must be non-empty
- Returns:
- A new java column vector containing the concatenated lists.
-
listConcatenateByRow
Concatenate columns of lists horizontally (row by row), combining a corresponding row from each column into a single list row of a new column.
- Parameters:
ignoreNull
- whether to ignore null list elements of the input columns. If true, null lists are left out of the concatenation; otherwise, any concatenation involving a null list element results in a null list.
columns
- array of columns containing lists, must be non-empty
- Returns:
- A new java column vector containing the concatenated lists.
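The effect of ignoreNull on one output row can be sketched on the host. ListConcatByRowSketch.concatRow is a hypothetical helper, and a Java null stands in for a null list. The source does not say what happens when every input list in a row is null; returning null in that case is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class ListConcatByRowSketch {
    /** Concatenates the lists found at one row position across all input columns. */
    static List<Integer> concatRow(boolean ignoreNull, List<List<Integer>> listsAtRow) {
        List<Integer> out = new ArrayList<>();
        boolean sawNonNull = false;
        for (List<Integer> list : listsAtRow) {
            if (list == null) {
                if (!ignoreNull) {
                    return null; // any null list nulls the whole output row
                }
                continue; // ignoreNull: skip the null list
            }
            sawNonNull = true;
            out.addAll(list);
        }
        // Assumption: a row whose inputs are all null stays null even when ignoreNull is true.
        return sawNonNull ? out : null;
    }
}
```

For the inputs [1, 2], null, and [3], the sketch returns null when ignoreNull is false and [1, 2, 3] when it is true.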
-
md5Hash
Create a new vector containing the MD5 hash of each row in the table.
- Parameters:
columns
- array of columns to hash, must have identical number of rows.
- Returns:
- the new ColumnVector of 32 character hex strings representing each row's hash value.
-
sha1Hash
Create a new column containing the SHA-1 hash of each row in the table.
- Parameters:
columns
- columns to hash
- Returns:
- the new ColumnVector of 40 character hex strings representing each row's hash value.
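The hex-string lengths quoted for md5Hash and sha1Hash follow from the digest sizes: MD5 produces 16 bytes and SHA-1 produces 20, and each byte prints as two hex characters. The sketch below checks this with the JDK's `MessageDigest`; it does not use cudf, and how cudf turns a row of several columns into bytes before hashing is not described here. `hexDigest` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashHexLengthSketch {
    // Hypothetical helper: hash the input and render the digest as lowercase hex.
    static String hexDigest(String algorithm, byte[] input) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm).digest(input);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));  // two hex characters per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 and SHA-1 are required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] input = "cudf".getBytes(StandardCharsets.UTF_8);
        System.out.println(hexDigest("MD5", input).length());    // 32
        System.out.println(hexDigest("SHA-1", input).length());  // 40
    }
}
```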
-
castTo
Generic method to cast a ColumnVector. When casting from a Date, Timestamp, or Boolean to a numerical type, the underlying numerical representation of the data is used for the cast.
For Strings: casting strings to or from timestamps isn't supported at the moment. See ColumnView.asTimestamp(DType, String) and ColumnView.asStrings(String) for converting between strings and timestamps when the format is known.
Float values converted to String can differ from the default behavior in Java, e.g.
12.3 => "12.30000019" instead of "12.3"
Double.POSITIVE_INFINITY => "Inf" instead of "Infinity"
Double.NEGATIVE_INFINITY => "-Inf" instead of "-Infinity"
- Overrides:
castTo
in class ColumnView
- Parameters:
type
- type of the resulting ColumnVector
- Returns:
- A new vector allocated on the GPU
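The extra digits in the 12.3 example come from the value itself: 12.3 has no exact binary representation, and the nearest 32-bit float lies slightly above it. The plain-Java sketch below reproduces the quoted digits by widening the float to a double and printing eight decimal places. The eight-place format is chosen here only to match the example; it is not a statement about how the library formats floats, and `widenedDigits` is a hypothetical helper.

```java
import java.util.Locale;

class FloatToStringSketch {
    // Hypothetical helper: show the digits stored in the float, to 8 decimal places.
    static String widenedDigits(float f) {
        return String.format(Locale.ROOT, "%.8f", (double) f);
    }

    public static void main(String[] args) {
        System.out.println(Float.toString(12.3f));                      // 12.3 (shortest round-trip form)
        System.out.println(widenedDigits(12.3f));                       // 12.30000019
        System.out.println(Double.toString(Double.POSITIVE_INFINITY));  // Infinity
    }
}
```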
-
build
Create a new vector.
- Parameters:
type
- the type of vector to build.
rows
- maximum number of rows that the vector can hold.
init
- what will initialize the vector.
- Returns:
- the created vector.
-
build
public static ColumnVector build(int rows, long stringBufferSize, Consumer<HostColumnVector.Builder> init) -
boolFromBytes
Create a new vector from the given values. -
fromLists
This method is evolving, unstable and currently test only. Please use with caution and expect it to change in the future. -
fromStructs
public static ColumnVector fromStructs(HostColumnVector.DataType dataType, List<HostColumnVector.StructData> lists) This method is evolving, unstable and currently test only. Please use with caution and expect it to change in the future. -
fromStructs
public static ColumnVector fromStructs(HostColumnVector.DataType dataType, HostColumnVector.StructData... lists) This method is evolving, unstable and currently test only. Please use with caution and expect it to change in the future. -
emptyStructs
This method is evolving, unstable and currently test only. Please use with caution and expect it to change in the future. -
fromBooleans
Create a new vector from the given values. -
fromBytes
Create a new vector from the given values. -
fromUnsignedBytes
Create a new vector from the given values. Java does not have an unsigned byte type, so the values will be treated as if the bits represent an unsigned value.
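To make the "bits represent an unsigned value" wording concrete, the plain-Java sketch below stores values from 0 to 255 in a `byte[]` and reads one back both ways. It does not call cudf; `pack` is a hypothetical helper for preparing such an array.

```java
class UnsignedByteSketch {
    // Hypothetical helper: store values in the range 0..255 as raw byte bit patterns.
    static byte[] pack(int... unsignedValues) {
        byte[] out = new byte[unsignedValues.length];
        for (int i = 0; i < unsignedValues.length; i++) {
            int v = unsignedValues[i];
            if (v < 0 || v > 255) {
                throw new IllegalArgumentException("out of unsigned byte range: " + v);
            }
            out[i] = (byte) v;  // the narrowing cast keeps the low 8 bits unchanged
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] packed = pack(200, 255);
        System.out.println(packed[0]);                      // signed view: -56
        System.out.println(Byte.toUnsignedInt(packed[0]));  // unsigned view: 200
    }
}
```

The same idea applies to the short and int variants, using `Short.toUnsignedInt` and `Integer.toUnsignedLong` to read values back.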
-
fromShorts
Create a new vector from the given values. -
fromUnsignedShorts
Create a new vector from the given values. Java does not have an unsigned short type, so the values will be treated as if the bits represent an unsigned value.
-
fromInts
Create a new vector from the given values. -
fromUnsignedInts
Create a new vector from the given values. Java does not have an unsigned int type, so the values will be treated as if the bits represent an unsigned value.
-
fromLongs
Create a new vector from the given values. -
fromUnsignedLongs
Create a new vector from the given values. Java does not have an unsigned long type, so the values will be treated as if the bits represent an unsigned value.
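For 64-bit values there is no wider primitive to widen into, so the JDK's unsigned helpers on `Long` are one way to prepare and inspect such bit patterns. This is a plain-Java sketch that does not call cudf; `parseAll` is a hypothetical helper.

```java
class UnsignedLongSketch {
    // Hypothetical helper: parse unsigned decimal strings into raw 64-bit patterns.
    static long[] parseAll(String... unsignedDecimals) {
        long[] out = new long[unsignedDecimals.length];
        for (int i = 0; i < unsignedDecimals.length; i++) {
            out[i] = Long.parseUnsignedLong(unsignedDecimals[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] bits = parseAll("18446744073709551615", "1");
        System.out.println(bits[0]);                                     // signed view: -1
        System.out.println(Long.toUnsignedString(bits[0]));              // 18446744073709551615
        System.out.println(Long.compareUnsigned(bits[0], bits[1]) > 0);  // true
    }
}
```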
-
fromFloats
Create a new vector from the given values. -
fromDoubles
Create a new vector from the given values. -
daysFromInts
Create a new vector from the given values. -
durationSecondsFromLongs
Create a new vector from the given values. -
timestampSecondsFromLongs
Create a new vector from the given values. -
durationDaysFromInts
Create a new vector from the given values. -
durationMilliSecondsFromLongs
Create a new vector from the given values. -
timestampMilliSecondsFromLongs
Create a new vector from the given values. -
durationMicroSecondsFromLongs
Create a new vector from the given values. -
timestampMicroSecondsFromLongs
Create a new vector from the given values. -
durationNanoSecondsFromLongs
Create a new vector from the given values. -
timestampNanoSecondsFromLongs
Create a new vector from the given values. -
decimalFromInts
Create a new decimal vector from unscaled values (int array) and scale. The created vector is of type DType.DECIMAL32, whose max precision is 9. The scale here has the opposite sign to the scale of java.math.BigDecimal. -
decimalFromBoxedInts
Create a new decimal vector from boxed unscaled values (Integer array) and scale. The created vector is of type DType.DECIMAL32, whose max precision is 9. The scale here has the opposite sign to the scale of java.math.BigDecimal. -
decimalFromLongs
Create a new decimal vector from unscaled values (long array) and scale. The created vector is of type DType.DECIMAL64, whose max precision is 18. The scale here has the opposite sign to the scale of java.math.BigDecimal. -
decimalFromBoxedLongs
Create a new decimal vector from boxed unscaled values (Long array) and scale. The created vector is of type DType.DECIMAL64, whose max precision is 18. The scale here has the opposite sign to the scale of java.math.BigDecimal. -
decimalFromDoubles
Create a new decimal vector from doubles with a specific DecimalType and RoundingMode. All doubles are rescaled if necessary, according to the scale of the input DecimalType and the RoundingMode. If overflow occurs while extracting the integral part, an IllegalArgumentException is thrown. This API is inefficient because the double-to-decimal conversion is slow, so it is mainly intended for testing. The scale here has the opposite sign to the scale of java.math.BigDecimal. -
decimalFromBigInt
Create a new decimal vector from BigIntegers. The scale here has the opposite sign to the scale of java.math.BigDecimal. -
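One way to read the scale note on the decimalFrom… factories above is as a sign flip: BigDecimal computes `unscaledValue × 10^(-scale)`, so a vector scale of -2 would correspond to a BigDecimal scale of 2. The plain-Java sketch below assumes that reading; `toBigDecimal` is a hypothetical helper, not a library method.

```java
import java.math.BigDecimal;

class DecimalScaleSketch {
    // Assumption: the vector scale is the negation of BigDecimal's scale.
    static BigDecimal toBigDecimal(long unscaledValue, int vectorScale) {
        return BigDecimal.valueOf(unscaledValue, -vectorScale);
    }

    public static void main(String[] args) {
        System.out.println(toBigDecimal(12345, -2));                // 123.45
        System.out.println(toBigDecimal(12, 3).toPlainString());    // 12000
    }
}
```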
fromStrings
Create a new string vector from the given values. This API supports inline nulls. It is intended only for testing, because translating between Java strings and UTF-8 strings is slow and memory intensive. -
fromUTF8Strings
Create a new string vector from the given values. This API supports inline nulls. -
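The translation cost mentioned for fromStrings comes from re-encoding each Java string as UTF-8 bytes. The plain-Java sketch below does that encoding once up front and keeps nulls in place. `encode` is a hypothetical helper, and the exact parameter type accepted by fromUTF8Strings is not shown in this section, so the `byte[][]` shape is an assumption.

```java
import java.nio.charset.StandardCharsets;

class Utf8EncodeSketch {
    // Hypothetical helper: encode each string as UTF-8, preserving null entries.
    static byte[][] encode(String... values) {
        byte[][] out = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] == null ? null : values[i].getBytes(StandardCharsets.UTF_8);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[][] encoded = encode("h\u00e9llo", null, "");
        System.out.println("h\u00e9llo".length());  // 5 UTF-16 code units
        System.out.println(encoded[0].length);      // 6 UTF-8 bytes (the accented e takes two)
        System.out.println(encoded[1] == null);     // true: the inline null is kept
    }
}
```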
fromDecimals
Create a new vector from the given values. This API supports inline nulls, but is much slower than building from a primitive array of unscaled values. Notice: 1. All input BigDecimals should share the same scale. 2. The scale will be zero if all input values are null. -
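Since the inputs to fromDecimals are expected to share one scale, values that arrive with mixed scales need normalizing first. The plain-Java sketch below raises every non-null value to the largest scale present, which never requires rounding. `toCommonScale` is a hypothetical helper, not part of the library.

```java
import java.math.BigDecimal;

class CommonScaleSketch {
    // Hypothetical helper: rescale all non-null values to the largest scale present.
    static BigDecimal[] toCommonScale(BigDecimal... values) {
        boolean seen = false;
        int maxScale = 0;
        for (BigDecimal v : values) {
            if (v != null) {
                maxScale = seen ? Math.max(maxScale, v.scale()) : v.scale();
                seen = true;
            }
        }
        BigDecimal[] out = new BigDecimal[values.length];
        for (int i = 0; i < values.length; i++) {
            // Increasing the scale only appends zeros, so setScale cannot throw here.
            out[i] = values[i] == null ? null : values[i].setScale(maxScale);
        }
        return out;
    }

    public static void main(String[] args) {
        BigDecimal[] out = toCommonScale(new BigDecimal("1.5"), null, new BigDecimal("2.25"));
        System.out.println(out[0]);  // 1.50
        System.out.println(out[1]);  // null
        System.out.println(out[2]);  // 2.25
    }
}
```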
fromBoxedBooleans
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
fromBoxedBytes
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
fromBoxedUnsignedBytes
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. Java does not have an unsigned byte type, so the values will be treated as if the bits represent an unsigned value.
-
fromBoxedShorts
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
fromBoxedUnsignedShorts
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. Java does not have an unsigned short type, so the values will be treated as if the bits represent an unsigned value.
-
fromBoxedInts
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
fromBoxedUnsignedInts
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. Java does not have an unsigned int type, so the values will be treated as if the bits represent an unsigned value.
-
fromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
fromBoxedUnsignedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. Java does not have an unsigned long type, so the values will be treated as if the bits represent an unsigned value.
-
fromBoxedFloats
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
fromBoxedDoubles
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
timestampDaysFromBoxedInts
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
durationDaysFromBoxedInts
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
durationSecondsFromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
timestampSecondsFromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
durationMilliSecondsFromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
timestampMilliSecondsFromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
durationMicroSecondsFromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
timestampMicroSecondsFromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
durationNanoSecondsFromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
timestampNanoSecondsFromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
empty
Creates an empty column according to the data type. It creates all the nested columns by iterating over the children of the input type object 'colType'. Performance is poor, so use it carefully. This implementation may move to native code once there is a way to pass the nested data type to the native layer.
- Parameters:
colType
- the data type of the empty column
- Returns:
- an empty ColumnVector with its children. Each child contains zero elements. Users should close the ColumnVector to avoid a memory leak.
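Because the class implements AutoCloseable, try-with-resources is one way to make sure a returned vector is closed. The sketch below shows the pattern with a stand-in class so it runs without a GPU; `FakeVector` and `useAndClose` are hypothetical and are not part of the library.

```java
class CloseSketch {
    // Stand-in for a native-backed vector. This is NOT the cudf class.
    static final class FakeVector implements AutoCloseable {
        private boolean closed;

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true;  // a real vector would release its off-heap memory here
        }
    }

    // Uses the vector inside try-with-resources and returns it for inspection.
    static FakeVector useAndClose() {
        FakeVector vector = new FakeVector();
        try (FakeVector scoped = vector) {
            // Work with 'scoped' here; close() runs on exit, even if an exception is thrown.
        }
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(useAndClose().isClosed());  // true
    }
}
```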
-