Class ColumnVector
All Implemented Interfaces:
BinaryOperable, AutoCloseable
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static interface | ColumnVector.EventHandler | Interface to handle events for this ColumnVector.
protected static final class | ColumnVector.OffHeapState | Holds the off-heap state of the column vector so it can be cleaned up, even if it is leaked.
Nested classes/interfaces inherited from class ai.rapids.cudf.ColumnView
ColumnView.FindOptions
Field Summary
Fields inherited from class ai.rapids.cudf.ColumnView
offHeap, rows, type, UNKNOWN_NULL_COUNT, viewHandle
Constructor Summary
Constructors
Constructor | Description
ColumnVector(long nativePointer) | Wrap an existing on-device cudf::column with the corresponding ColumnVector.
ColumnVector(DType type, long rows, Optional<Long> nullCount, DeviceMemoryBuffer dataBuffer, DeviceMemoryBuffer validityBuffer, DeviceMemoryBuffer offsetBuffer) | Create a new column vector based on data already on the device.
ColumnVector(DType type, long rows, Optional<Long> nullCount, DeviceMemoryBuffer dataBuffer, DeviceMemoryBuffer validityBuffer, DeviceMemoryBuffer offsetBuffer, List<DeviceMemoryBuffer> toClose, long[] childHandles) | Create a new column vector based on data already on the device, with child columns.
Method Summary
Modifier and Type | Method | Description
static ColumnVector | boolFromBytes(byte... values) | Create a new vector from the given values.
static ColumnVector | build(int rows, long stringBufferSize, Consumer<HostColumnVector.Builder> init) |
static ColumnVector | build(DType type, int rows, Consumer<HostColumnVector.Builder> init) | Create a new vector.
ColumnVector | castTo(DType type) | Generic method to cast a ColumnVector. When casting from a Date, Timestamp, or Boolean to a numerical type, the underlying numerical representation of the data is used for the cast.
void | close() | Close this vector and free memory allocated for HostMemoryBuffer and DeviceMemoryBuffer.
static ColumnVector | concatenate(ColumnView... columns) | Create a new vector by concatenating multiple columns together.
ColumnVector | copyToColumnVector() | For a ColumnVector this just increments the reference count.
static ColumnVector | daysFromInts(int... values) | Create a new vector from the given values.
static ColumnVector | decimalFromBigInt(int scale, BigInteger... values) | Create a new decimal vector from BigIntegers. Compared with the scale of java.math.BigDecimal, the scale here has the opposite meaning.
static ColumnVector | decimalFromBoxedInts(int scale, Integer... values) | Create a new decimal vector from boxed unscaled values (Integer array) and scale.
static ColumnVector | decimalFromBoxedLongs(int scale, Long... values) | Create a new decimal vector from boxed unscaled values (Long array) and scale.
static ColumnVector | decimalFromDoubles(DType type, RoundingMode mode, double... values) | Create a new decimal vector from double floats with a specific DecimalType and RoundingMode.
static ColumnVector | decimalFromInts(int scale, int... values) | Create a new decimal vector from unscaled values (int array) and scale.
static ColumnVector | decimalFromLongs(int scale, long... values) | Create a new decimal vector from unscaled values (long array) and scale.
static ColumnVector | durationDaysFromBoxedInts(Integer... values) | Create a new vector from the given values.
static ColumnVector | durationDaysFromInts(int... values) | Create a new vector from the given values.
static ColumnVector | durationMicroSecondsFromBoxedLongs(Long... values) | Create a new vector from the given values.
static ColumnVector | durationMicroSecondsFromLongs(long... values) | Create a new vector from the given values.
static ColumnVector | durationMilliSecondsFromBoxedLongs(Long... values) | Create a new vector from the given values.
static ColumnVector | durationMilliSecondsFromLongs(long... values) | Create a new vector from the given values.
static ColumnVector | durationNanoSecondsFromBoxedLongs(Long... values) | Create a new vector from the given values.
static ColumnVector | durationNanoSecondsFromLongs(long... values) | Create a new vector from the given values.
static ColumnVector | durationSecondsFromBoxedLongs(Long... values) | Create a new vector from the given values.
static ColumnVector | durationSecondsFromLongs(long... values) | Create a new vector from the given values.
static ColumnVector | empty(HostColumnVector.DataType colType) | Creates an empty column according to the data type.
static ColumnVector | emptyStructs(HostColumnVector.DataType dataType, long numRows) | This method is evolving, unstable and currently test only.
static ColumnVector | fromArrow(DType type, long numRows, long nullCount, ByteBuffer data, ByteBuffer validity, ByteBuffer offsets) | Create a ColumnVector from the Apache Arrow byte buffers passed in.
static ColumnVector | fromBooleans(boolean... values) | Create a new vector from the given values.
static ColumnVector | fromBoxedBooleans(Boolean... values) | Create a new vector from the given values.
static ColumnVector | fromBoxedBytes(Byte... values) | Create a new vector from the given values.
static ColumnVector | fromBoxedDoubles(Double... values) | Create a new vector from the given values.
static ColumnVector | fromBoxedFloats(Float... values) | Create a new vector from the given values.
static ColumnVector | fromBoxedInts(Integer... values) | Create a new vector from the given values.
static ColumnVector | fromBoxedLongs(Long... values) | Create a new vector from the given values.
static ColumnVector | fromBoxedShorts(Short... values) | Create a new vector from the given values.
static ColumnVector | fromBoxedUnsignedBytes(Byte... values) | Create a new vector from the given values.
static ColumnVector | fromBoxedUnsignedInts(Integer... values) | Create a new vector from the given values.
static ColumnVector | fromBoxedUnsignedLongs(Long... values) | Create a new vector from the given values.
static ColumnVector | fromBoxedUnsignedShorts(Short... values) | Create a new vector from the given values.
static ColumnVector | fromBytes(byte... values) | Create a new vector from the given values.
static ColumnVector | fromDecimals(BigDecimal... values) | Create a new vector from the given values.
static ColumnVector | fromDoubles(double... values) | Create a new vector from the given values.
static ColumnVector | fromFloats(float... values) | Create a new vector from the given values.
static ColumnVector | fromInts(int... values) | Create a new vector from the given values.
static <T> ColumnVector | fromLists(HostColumnVector.DataType dataType, List<T>... lists) | This method is evolving, unstable and currently test only.
static ColumnVector | fromLongs(long... values) | Create a new vector from the given values.
static ColumnVector | fromScalar(Scalar scalar, int rows) | Create a new vector of length rows, where each row is filled with the Scalar's value.
static ColumnVector | fromShorts(short... values) | Create a new vector from the given values.
static ColumnVector | fromStrings(String... values) | Create a new string vector from the given values.
static ColumnVector | fromStructs(HostColumnVector.DataType dataType, HostColumnVector.StructData... lists) | This method is evolving, unstable and currently test only.
static ColumnVector | fromStructs(HostColumnVector.DataType dataType, List<HostColumnVector.StructData> lists) | This method is evolving, unstable and currently test only.
static ColumnVector | fromUnsignedBytes(byte... values) | Create a new vector from the given values.
static ColumnVector | fromUnsignedInts(int... values) | Create a new vector from the given values.
static ColumnVector | fromUnsignedLongs(long... values) | Create a new vector from the given values.
static ColumnVector | fromUnsignedShorts(short... values) | Create a new vector from the given values.
static ColumnVector | fromUTF8Strings(byte[]... values) | Create a new string vector from the given values.
static ColumnVector | fromViewWithContiguousAllocation(long columnViewAddress, DeviceMemoryBuffer buffer) | Creates a ColumnVector from a native column_view using a contiguous device allocation.
BaseDeviceMemoryBuffer | getDeviceBufferFor(BufferType type) | Get access to the raw device buffer for this column.
ColumnVector.EventHandler | getEventHandler() | Returns the current event handler for this ColumnVector, or null if no handler is associated.
long | getNullCount() | Returns the number of nulls in the data.
int | getRefCount() | Returns this column's current refcount.
boolean | hasNulls() | Returns whether the vector has nulls.
boolean | hasValidityVector() | Returns whether the vector has a validity vector allocated.
ColumnVector | incRefCount() | Increment the reference count for this column.
static ColumnVector | listConcatenateByRow(boolean ignoreNull, ColumnView... columns) | Concatenate columns of lists horizontally (row by row), combining a corresponding row from each column into a single list row of a new column.
static ColumnVector | listConcatenateByRow(ColumnView... columns) | Concatenate columns of lists horizontally (row by row), combining a corresponding row from each column into a single list row of a new column.
static ColumnVector | makeList(long rows, DType type, ColumnView... columns) | Create a LIST column from the given columns.
static ColumnVector | makeList(ColumnView... columns) | Create a LIST column from the given columns.
ColumnVector | makeListFromOffsets(long rows, ColumnView offsets) | Create a LIST column from the current column and a given offsets column.
static ColumnVector | makeStruct(long rows, ColumnView... columns) | Create a new struct vector made up of existing columns.
static ColumnVector | makeStruct(ColumnView... columns) | Create a new struct vector made up of existing columns.
static ColumnVector | md5Hash(ColumnView... columns) | Create a new vector containing the MD5 hash of each row in the table.
void | noWarnLeakExpected() | This is a really ugly API, but a column of data may not have a clear lifecycle because of Java and GC.
static ColumnVector | sequence(ColumnView start, ColumnView size) | Create a list column in which each row is a sequence of values starting from a `start` value and incrementing by one, with its cardinality specified by a `size` value.
static ColumnVector | sequence(ColumnView start, ColumnView size, ColumnView step) | Create a list column in which each row is a sequence of values starting from a `start` value and incrementing by a `step` value, with its cardinality specified by a `size` value.
static ColumnVector | sequence(Scalar initialValue, int rows) | Create a new vector of length rows, starting at initialValue and increasing by 1 each time.
static ColumnVector | sequence(Scalar initialValue, Scalar step, int rows) | Create a new vector of length rows, starting at initialValue and increasing by step each time.
ColumnVector.EventHandler | setEventHandler(ColumnVector.EventHandler newHandler) | Set an event handler for this vector.
static ColumnVector | sha1Hash(ColumnView... columns) | Create a new column containing the SHA-1 hash of each row in the table.
static ColumnVector | stringConcatenate(ColumnView[] columns) | Concatenate columns of strings together, combining a corresponding row from each column into a single string row of a new column, with no separator string inserted between the combined strings and with null values preserved in combined rows.
static ColumnVector | stringConcatenate(ColumnView[] columns, ColumnView sepCol) | Concatenate columns of strings together using a separator specified for each row and return the result as a string column.
static ColumnVector | stringConcatenate(ColumnView[] columns, ColumnView sepCol, Scalar separatorNarep, Scalar colNarep, boolean separateNulls) | Concatenate columns of strings together using a separator specified for each row and return the result as a string column.
static ColumnVector | stringConcatenate(Scalar separator, Scalar narep, ColumnView[] columns) | Concatenate columns of strings together, combining a corresponding row from each column into a single string row of a new column.
static ColumnVector | stringConcatenate(Scalar separator, Scalar narep, ColumnView[] columns, boolean separateNulls) | Concatenate columns of strings together, combining a corresponding row from each column into a single string row of a new column.
static ColumnVector | timestampDaysFromBoxedInts(Integer... values) | Create a new vector from the given values.
static ColumnVector | timestampMicroSecondsFromBoxedLongs(Long... values) | Create a new vector from the given values.
static ColumnVector | timestampMicroSecondsFromLongs(long... values) | Create a new vector from the given values.
static ColumnVector | timestampMilliSecondsFromBoxedLongs(Long... values) | Create a new vector from the given values.
static ColumnVector | timestampMilliSecondsFromLongs(long... values) | Create a new vector from the given values.
static ColumnVector | timestampNanoSecondsFromBoxedLongs(Long... values) | Create a new vector from the given values.
static ColumnVector | timestampNanoSecondsFromLongs(long... values) | Create a new vector from the given values.
static ColumnVector | timestampSecondsFromBoxedLongs(Long... values) | Create a new vector from the given values.
static ColumnVector | timestampSecondsFromLongs(long... values) | Create a new vector from the given values.
String | toString() |
Methods inherited from class ai.rapids.cudf.ColumnView
abs, addCalendricalMonths, addCalendricalMonths, all, all, any, any, applyBooleanMask, approxPercentile, approxPercentile, arccos, arccosh, arcsin, arcsinh, arctan, arctanh, asByteList, asByteList, asBytes, asDoubles, asFloats, asInts, asLongs, asShorts, asStrings, asStrings, asTimestamp, asTimestampDays, asTimestampDays, asTimestampMicroseconds, asTimestampMicroseconds, asTimestampMilliseconds, asTimestampMilliseconds, asTimestampNanoseconds, asTimestampNanoseconds, asTimestampSeconds, asTimestampSeconds, asUnsignedBytes, asUnsignedInts, asUnsignedLongs, asUnsignedShorts, binaryOp, bitCastTo, bitCount, bitInvert, capitalize, cbrt, ceil, clamp, clamp, codePoints, contains, contains, containsRe, containsRe, copyToHost, copyToHost, copyToHostAsync, copyToHostAsync, cos, cosh, countElements, dateTimeCeil, dateTimeFloor, dateTimeRound, day, dayOfYear, daysInMonth, distinctCount, distinctCount, dropListDuplicates, dropListDuplicates, dropListDuplicatesWithKeysValues, endsWith, exp, extractAllRecord, extractAllRecord, extractDateTimeComponent, extractListElement, extractListElement, extractRe, extractRe, findAndReplaceAll, flattenLists, flattenLists, floor, fromDeviceBuffer, generateListOffsets, getByteCount, getCharLengths, getChildColumnView, getChildColumnViews, getData, getDeviceMemorySize, getHostBytesRequired, getJSONObject, getJSONObject, getListOffsetsView, getMapKeyExistence, getMapKeyExistence, getMapValue, getMapValue, getNativeView, getNumChildren, getOffsets, getRowCount, getScalarElement, getType, getValid, hasNonEmptyNulls, hostPaddingSizeInBytes, hour, ifElse, ifElse, ifElse, ifElse, isFixedPoint, isFloat, isInteger, isInteger, isLeapYear, isNan, isNotNan, isNotNull, isNull, isTimestamp, joinStrings, lastDayOfMonth, like, listContains, listContainsColumn, listContainsNulls, listIndexOf, listIndexOf, listReduce, listReduce, listReduce, listsDifferenceDistinct, listsHaveOverlap, listsIntersectDistinct, listSortRows, listsUnionDistinct, log, log10, log2, 
logicalCastTo, lower, lstrip, lstrip, makeStructView, makeStructView, matchesRe, matchesRe, max, max, mean, mean, mergeAndSetValidity, min, min, minute, month, nansToNulls, normalizeNANsAndZeros, not, pad, pad, pad, prefixSum, product, product, purgeNonEmptyNulls, quantile, quarterOfYear, reduce, reduce, repeatStrings, repeatStrings, replaceChildrenWithViews, replaceListChild, replaceMultiRegex, replaceNulls, replaceNulls, replaceNulls, replaceRegex, replaceRegex, replaceRegex, replaceRegex, reverseStringsOrLists, rint, rollingWindow, round, round, round, round, rstrip, rstrip, scan, scan, scan, second, segmentedGather, segmentedGather, segmentedReduce, segmentedReduce, segmentedReduce, sin, sinh, slice, split, splitAsViews, sqrt, standardDeviation, standardDeviation, startsWith, stringConcatenateListElements, stringConcatenateListElements, stringConcatenateListElements, stringContains, stringContains, stringLocate, stringLocate, stringLocate, stringReplace, stringReplace, stringReplaceWithBackrefs, stringReplaceWithBackrefs, stringSplit, stringSplit, stringSplit, stringSplit, stringSplit, stringSplit, stringSplitRecord, stringSplitRecord, stringSplitRecord, stringSplitRecord, stringSplitRecord, stringSplitRecord, strip, strip, substring, substring, substring, subVector, subVector, sum, sum, sumOfSquares, sumOfSquares, tan, tanh, title, toHex, toTitle, transform, unaryOp, upper, urlDecode, urlEncode, variance, variance, weekDay, year, zfill
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface ai.rapids.cudf.BinaryOperable
add, add, and, and, arctan2, arctan2, bitAnd, bitAnd, bitOr, bitOr, bitXor, bitXor, div, div, equalTo, equalTo, equalToNullAware, equalToNullAware, floorDiv, floorDiv, greaterOrEqualTo, greaterOrEqualTo, greaterThan, greaterThan, lessOrEqualTo, lessOrEqualTo, lessThan, lessThan, log, log, maxNullAware, maxNullAware, minNullAware, minNullAware, mod, mod, mul, mul, notEqualTo, notEqualTo, notEqualToNullAware, notEqualToNullAware, or, or, pmod, pmod, pow, pow, shiftLeft, shiftLeft, shiftRight, shiftRight, shiftRightUnsigned, shiftRightUnsigned, sub, sub, trueDiv, trueDiv
-
Constructor Details
-
ColumnVector
public ColumnVector(long nativePointer)
Wrap an existing on-device cudf::column with the corresponding ColumnVector. The new ColumnVector takes ownership of the pointer and will free it when the ref count reaches zero.
Parameters:
nativePointer - host address of the cudf::column object which will be owned by this instance.
-
ColumnVector
public ColumnVector(DType type, long rows, Optional<Long> nullCount, DeviceMemoryBuffer dataBuffer, DeviceMemoryBuffer validityBuffer, DeviceMemoryBuffer offsetBuffer)
Create a new column vector based on data already on the device.
Parameters:
type - the type of the vector.
rows - the number of rows in this vector.
nullCount - the number of nulls in the dataset.
dataBuffer - the data stored on the device. The column vector takes ownership of the buffer. Do not use the buffer after calling this.
validityBuffer - an optional validity buffer. Must be provided if nullCount != 0. The column vector takes ownership of the buffer. Do not use the buffer after calling this.
offsetBuffer - a device buffer required for strings and string categories. The column vector takes ownership of the buffer. Do not use the buffer after calling this.
-
ColumnVector
public ColumnVector(DType type, long rows, Optional<Long> nullCount, DeviceMemoryBuffer dataBuffer, DeviceMemoryBuffer validityBuffer, DeviceMemoryBuffer offsetBuffer, List<DeviceMemoryBuffer> toClose, long[] childHandles)
Create a new column vector based on data already on the device, with child columns.
Parameters:
type - the type of the vector, typically a nested type.
rows - the number of rows in this vector.
nullCount - the number of nulls in the dataset.
dataBuffer - the data stored on the device. The column vector takes ownership of the buffer. Do not use the buffer after calling this.
validityBuffer - an optional validity buffer. Must be provided if nullCount != 0. The column vector takes ownership of the buffer. Do not use the buffer after calling this.
offsetBuffer - a device buffer required for strings and string categories. The column vector takes ownership of the buffer. Do not use the buffer after calling this.
toClose - list of buffers to track and close once done, usually in the case of children.
childHandles - array of longs for child column view handles.
-
-
Method Details
-
copyToColumnVector
public ColumnVector copyToColumnVector()
For a ColumnVector this just increments the reference count.
Overrides:
copyToColumnVector in class ColumnView
Returns:
this
-
fromViewWithContiguousAllocation
public static ColumnVector fromViewWithContiguousAllocation(long columnViewAddress, DeviceMemoryBuffer buffer)
Creates a ColumnVector from a native column_view using a contiguous device allocation.
Parameters:
columnViewAddress - address of the native column_view
buffer - device buffer containing the data referenced by the column view
-
setEventHandler
public ColumnVector.EventHandler setEventHandler(ColumnVector.EventHandler newHandler)
Set an event handler for this vector. This method can be invoked with null to unset the handler.
Parameters:
newHandler - the EventHandler to use from this point forward
Returns:
the prior event handler, or null if not set.
-
getEventHandler
public ColumnVector.EventHandler getEventHandler()
Returns the current event handler for this ColumnVector, or null if no handler is associated.
-
noWarnLeakExpected
public void noWarnLeakExpected()
This is a really ugly API, but a column of data may not have a clear lifecycle because of Java and GC. This API informs the leak-tracking code that this is expected for this column, so the big scary warnings should not be printed when it happens.
-
close
public void close()
Close this vector: decrement its reference count and, once the count reaches zero, free the memory allocated for its HostMemoryBuffer and DeviceMemoryBuffer.
Specified by:
close in interface AutoCloseable
Overrides:
close in class ColumnView
-
toString
public String toString()
Overrides:
toString in class ColumnView
-
incRefCount
public ColumnVector incRefCount()
Increment the reference count for this column. You need to call close on this to decrement the reference count again.
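The reference-counting contract described for incRefCount() and close() can be modeled on the host as below. This is a minimal sketch, not cudf code: `RefCountedResource` is a hypothetical class, the initial count of one is an assumption, and the `freed` flag merely stands in for releasing device memory.

```java
// Hypothetical host-only model of the incRefCount()/close() contract.
// It allocates no GPU memory and is not the cudf implementation.
final class RefCountedResource implements AutoCloseable {
  private int refCount = 1;      // assumption: a new vector starts with one reference
  private boolean freed = false; // stands in for releasing device memory

  // Mirrors incRefCount(): every extra reference needs its own close().
  RefCountedResource incRefCount() {
    if (refCount <= 0) {
      throw new IllegalStateException("already closed");
    }
    refCount++;
    return this;
  }

  int getRefCount() {
    return refCount;
  }

  boolean isFreed() {
    return freed;
  }

  // Mirrors close(): decrement, and release only when the count reaches zero.
  @Override
  public void close() {
    if (refCount <= 0) {
      throw new IllegalStateException("closed too many times");
    }
    refCount--;
    if (refCount == 0) {
      freed = true;
    }
  }
}
```

Because the class is AutoCloseable, each reference can be paired with its own try-with-resources block so that close() runs exactly once per reference.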
getNullCount
public long getNullCount()
Returns the number of nulls in the data. Note that this can be a very expensive operation: if the null count is not already known, it is calculated.
Overrides:
getNullCount in class ColumnView
-
getRefCount
public int getRefCount()
Returns this column's current refcount.
-
hasValidityVector
public boolean hasValidityVector()
Returns whether the vector has a validity vector allocated.
-
hasNulls
public boolean hasNulls()
Returns whether the vector has nulls. Note that this can be a very expensive operation: if the null count is not already known, it is calculated.
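The null-count notes for getNullCount() and hasNulls() can be illustrated with a small host-side model. It assumes an Arrow-style validity bitmask (bit i, least significant bit first, set when row i is valid); the `ValidityModel` class and its `UNKNOWN` marker are hypothetical and do not reflect how cudf stores or computes the count.

```java
// Hypothetical host-side model: a lazily computed, cached null count derived
// from an Arrow-style validity bitmask (bit i set = row i valid, LSB first).
final class ValidityModel {
  private static final long UNKNOWN = -1; // hypothetical "not yet computed" marker
  private final byte[] validity;          // null means no validity vector: all rows valid
  private final int rows;
  private long nullCount = UNKNOWN;

  ValidityModel(byte[] validity, int rows) {
    this.validity = validity;
    this.rows = rows;
  }

  boolean hasValidityVector() {
    return validity != null;
  }

  boolean isValid(int row) {
    return validity == null || ((validity[row >>> 3] >>> (row & 7)) & 1) == 1;
  }

  // The potentially expensive path: scan every row once, then reuse the result.
  long getNullCount() {
    if (nullCount == UNKNOWN) {
      long count = 0;
      for (int i = 0; i < rows; i++) {
        if (!isValid(i)) {
          count++;
        }
      }
      nullCount = count;
    }
    return nullCount;
  }

  boolean hasNulls() {
    return getNullCount() > 0;
  }
}
```

Note that having a validity vector does not imply having nulls: a mask with every bit set gives a null count of zero.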
getDeviceBufferFor
public BaseDeviceMemoryBuffer getDeviceBufferFor(BufferType type)
Get access to the raw device buffer for this column. This is intended to be used with a lot of caution. The lifetime of the buffer is tied to the lifetime of the column: do not close the buffer, as the column will take care of it. Do not modify the contents of the buffer, or it may negatively affect the column. The data must be on the device for this to work. Strings and string categories do not currently work because their underlying device layout is hidden.
Parameters:
type - the type of buffer to get access to.
Returns:
the underlying buffer, or null if no buffer of that type is associated with this column. Note that if the column is empty, there may be no buffers associated with it at all.
-
fromArrow
public static ColumnVector fromArrow(DType type, long numRows, long nullCount, ByteBuffer data, ByteBuffer validity, ByteBuffer offsets)
Create a ColumnVector from the Apache Arrow byte buffers passed in. Any of the buffers not used for that data type should be set to null. The buffers are expected to be off-heap (direct) buffers; if they are not, they are copied into direct byte buffers. This only supports primitive types. Strings, decimals, and nested types such as list and struct are not supported.
Parameters:
type - type of the column
numRows - number of rows in the Arrow column
nullCount - null count
data - ByteBuffer of the Arrow data buffer
validity - ByteBuffer of the Arrow validity buffer
offsets - ByteBuffer of the Arrow offsets buffer
Returns:
new ColumnVector
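As a sketch of what fromArrow expects, the snippet below prepares direct, little-endian Arrow buffers for an INT32 column holding [1, null, 3]. The helper class `ArrowInt32Buffers` is hypothetical; the layout follows the Apache Arrow columnar format, and the actual cudf call appears only as a comment because it needs the native library and a GPU.

```java
// Hypothetical helper: builds Arrow-layout buffers for an INT32 column.
// Arrow stores fixed-width values little-endian and uses a validity bitmap
// where bit i (least significant bit first) is 1 when row i is valid.
final class ArrowInt32Buffers {
  static java.nio.ByteBuffer data(int[] values) {
    // Direct buffers avoid the extra copy fromArrow makes for heap buffers.
    java.nio.ByteBuffer buf = java.nio.ByteBuffer
        .allocateDirect(values.length * Integer.BYTES)
        .order(java.nio.ByteOrder.LITTLE_ENDIAN);
    for (int v : values) {
      buf.putInt(v);
    }
    buf.flip(); // position 0, limit = bytes written
    return buf;
  }

  static java.nio.ByteBuffer validity(boolean[] valid) {
    // allocateDirect zero-fills, so every row starts out marked null.
    java.nio.ByteBuffer buf = java.nio.ByteBuffer.allocateDirect((valid.length + 7) / 8);
    for (int i = 0; i < valid.length; i++) {
      if (valid[i]) {
        int idx = i / 8;
        buf.put(idx, (byte) (buf.get(idx) | (1 << (i % 8))));
      }
    }
    return buf;
  }
}

// Sketch of the call (requires cudf's native library and a GPU; not run here):
// ColumnVector cv = ColumnVector.fromArrow(DType.INT32, 3, 1,
//     ArrowInt32Buffers.data(new int[] {1, 0, 3}),
//     ArrowInt32Buffers.validity(new boolean[] {true, false, true}),
//     null); // offsets are unused for INT32, so null is passed
```

The value stored in a null slot (0 here) is arbitrary; only the validity bit marks the row as null.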
-
fromScalar
public static ColumnVector fromScalar(Scalar scalar, int rows)
Create a new vector of length rows, where each row is filled with the Scalar's value.
Parameters:
scalar - Scalar to use to fill rows
rows - number of rows in the new ColumnVector
Returns:
new ColumnVector
-
makeStruct
public static ColumnVector makeStruct(ColumnView... columns)
Create a new struct vector made up of existing columns. Note that this copies the contents of the input columns to make a new vector. If you only need a quick temporary computation, you can use ColumnView.makeStructView instead.
Parameters:
columns - the columns to make the struct from.
Returns:
the new ColumnVector
-
makeStruct
public static ColumnVector makeStruct(long rows, ColumnView... columns)
Create a new struct vector made up of existing columns. Note that this copies the contents of the input columns to make a new vector. If you only need a quick temporary computation, you can use ColumnView.makeStructView instead.
Parameters:
rows - the number of rows in the struct. Used for structs with no children.
columns - the columns to make the struct from.
Returns:
the new ColumnVector
-
makeList
public static ColumnVector makeList(ColumnView... columns)
Create a LIST column from the given columns. Each list in the returned column has as many entries as there are columns passed into this method. Be careful about the number of rows passed in, as there are limits on the maximum output size supported for list columns.
Parameters:
columns - the columns to make up the list column, in the order they will appear in the resulting lists.
Returns:
the new LIST ColumnVector
-
makeList
public static ColumnVector makeList(long rows, DType type, ColumnView... columns)
Create a LIST column from the given columns. Each list in the returned column has as many entries as there are columns passed into this method. Be careful about the number of rows passed in, as there are limits on the maximum output size supported for list columns.
Parameters:
rows - the number of rows to create, for the special case of an empty list.
type - the type of the child column, for the special case of an empty list.
columns - the columns to make up the list column, in the order they will appear in the resulting lists.
Returns:
the new LIST ColumnVector
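The row-wise shape produced by makeList can be modeled on the host with plain arrays: row i of the result collects the i-th value of every input column, in column order. This hypothetical `MakeListModel` does not model nulls or the empty-list overload, and its row-count check is the sketch's own assumption rather than documented behavior.

```java
// Hypothetical host-side model of makeList(ColumnView... columns):
// row r of the result is [columns[0][r], columns[1][r], ...], so every list
// has exactly columns.length entries. Not cudf code; nulls are not modeled.
final class MakeListModel {
  static int[][] makeList(int[]... columns) {
    if (columns.length == 0) {
      throw new IllegalArgumentException("need at least one column");
    }
    int rows = columns[0].length;
    int[][] out = new int[rows][columns.length];
    for (int c = 0; c < columns.length; c++) {
      if (columns[c].length != rows) {
        throw new IllegalArgumentException("all columns must have the same row count");
      }
      for (int r = 0; r < rows; r++) {
        out[r][c] = columns[c][r];
      }
    }
    return out;
  }
}
```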
-
makeListFromOffsets
public ColumnVector makeListFromOffsets(long rows, ColumnView offsets)
Create a LIST column from the current column and a given offsets column. The output column contains lists whose elements are copied from the current column and whose sizes are determined by the given offsets. Note that the caller is responsible for making sure the given offsets column is of type INT32 and contains valid indices to create a LIST column. The offsets are not validated when this function is called; if they are invalid, the result may be bad memory accesses and/or data corruption.
Parameters:
rows - the number of rows to create.
offsets - the offsets pointing to row indices of the current column, used to create the output LIST column.
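A host-side model of makeListFromOffsets makes the offsets semantics concrete. It assumes the usual LIST layout, where `rows` lists need `rows + 1` offsets and list i spans [offsets[i], offsets[i + 1]) of the current column. The class name is hypothetical, and unlike the library call, which performs no validation, this sketch rejects invalid offsets.

```java
// Hypothetical host-side model of makeListFromOffsets(rows, offsets).
// Assumes the standard LIST layout: rows + 1 offsets, list i covering
// elements[offsets[i] .. offsets[i + 1]). This sketch validates the offsets;
// the documented library call does not.
final class ListFromOffsetsModel {
  static int[][] makeListFromOffsets(int[] elements, long rows, int[] offsets) {
    if (offsets.length != rows + 1) {
      throw new IllegalArgumentException("expected rows + 1 offsets");
    }
    int[][] out = new int[(int) rows][];
    for (int i = 0; i < rows; i++) {
      int begin = offsets[i];
      int end = offsets[i + 1];
      if (begin < 0 || begin > end || end > elements.length) {
        throw new IllegalArgumentException("invalid offsets at row " + i);
      }
      out[i] = java.util.Arrays.copyOfRange(elements, begin, end);
    }
    return out;
  }
}
```

Equal adjacent offsets (2, 2 in the usage below) produce an empty list for that row.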
-
sequence
public static ColumnVector sequence(Scalar initialValue, Scalar step, int rows)
Create a new vector of length rows, starting at initialValue and increasing by step each time. Only numeric types are supported.
Parameters:
initialValue - the initial value to start at.
step - the step to add to each subsequent row.
rows - the total number of rows
Returns:
the new ColumnVector.
-
sequence
public static ColumnVector sequence(Scalar initialValue, int rows)
Create a new vector of length rows, starting at initialValue and increasing by 1 each time. Only numeric types are supported.
Parameters:
initialValue - the initial value to start at.
rows - the total number of rows
Returns:
the new ColumnVector.
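The two scalar sequence overloads reduce to row i = initialValue + i * step, with a step of 1 for the two-argument form. A minimal host-side sketch, with `long` values standing in for numeric Scalars and a hypothetical `SequenceModel` class:

```java
// Hypothetical host-side model of sequence(initialValue, step, rows) and
// sequence(initialValue, rows). Long values stand in for numeric Scalars.
final class SequenceModel {
  static long[] sequence(long initialValue, long step, int rows) {
    long[] out = new long[rows];
    for (int i = 0; i < rows; i++) {
      out[i] = initialValue + i * step; // row i = initialValue + i * step
    }
    return out;
  }

  // The two-argument form increments by one.
  static long[] sequence(long initialValue, int rows) {
    return sequence(initialValue, 1, rows);
  }
}
```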
-
sequence
public static ColumnVector sequence(ColumnView start, ColumnView size)
Create a list column in which each row is a sequence of values starting from a `start` value and incrementing by one, with its cardinality specified by a `size` value. The `start` and `size` values used to generate each list are taken from the corresponding row of the input start and size columns.
Parameters:
start - first values in the result sequences
size - numbers of values in the result sequences
Returns:
the new ColumnVector.
-
sequence
public static ColumnVector sequence(ColumnView start, ColumnView size, ColumnView step)
Create a list column in which each row is a sequence of values starting from a `start` value and incrementing by a `step` value, with its cardinality specified by a `size` value. The `start`, `step`, and `size` values used to generate each list are taken from the corresponding row of the input start, step, and size columns.
Parameters:
start - first values in the result sequences
size - numbers of values in the result sequences
step - increment values for the result sequences.
Returns:
the new ColumnVector.
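The column-based sequence overloads apply the same rule row by row, producing one list per input row. This hypothetical `ListSequenceModel` uses plain arrays in place of ColumnViews, and a null `step` array stands in for the two-argument overload.

```java
// Hypothetical host-side model of sequence(start, size[, step]) on columns:
// row r of the result is [start[r], start[r] + step[r], ...] with size[r]
// elements. A null step array models the overload that increments by one.
final class ListSequenceModel {
  static long[][] sequence(long[] start, int[] size, long[] step) {
    long[][] out = new long[start.length][];
    for (int r = 0; r < start.length; r++) {
      long inc = (step == null) ? 1 : step[r];
      out[r] = new long[size[r]];
      for (int k = 0; k < size[r]; k++) {
        out[r][k] = start[r] + k * inc;
      }
    }
    return out;
  }
}
```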
-
concatenate
public static ColumnVector concatenate(ColumnView... columns)
Create a new vector by concatenating multiple columns together. Note that all columns must have the same type.
-
stringConcatenate
public static ColumnVector stringConcatenate(ColumnView[] columns)
Concatenate columns of strings together, combining a corresponding row from each column into a single string row of a new column, with no separator string inserted between the combined strings and with null values preserved in combined rows.
Parameters:
columns - array of columns containing strings, must be non-empty
Returns:
A new java column vector containing the concatenated strings.
-
stringConcatenate
public static ColumnVector stringConcatenate(Scalar separator, Scalar narep, ColumnView[] columns)
Concatenate columns of strings together, combining a corresponding row from each column into a single string row of a new column. This version includes the separator for null rows if `narep` is valid.
Parameters:
separator - string scalar inserted between each string being merged.
narep - string scalar indicating null behavior. If set to null and any string in the row is null, the resulting string will be null. If not null, null values in any column will be replaced by the specified string.
columns - array of columns containing strings, must be non-empty
Returns:
A new java column vector containing the concatenated strings.
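The null handling of stringConcatenate(separator, narep, columns) can be modeled for a single row as below. `StringConcatModel` is a hypothetical host-side sketch: a Java null stands in for a null string or an invalid Scalar, and the separateNulls variants are not modeled.

```java
// Hypothetical host-side model of stringConcatenate(separator, narep, columns)
// for one row: values are joined with the separator; if narep is null and any
// value is null, the row is null; otherwise nulls are replaced by narep.
final class StringConcatModel {
  static String concatRow(String separator, String narep, String... row) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < row.length; i++) {
      String value = row[i];
      if (value == null) {
        if (narep == null) {
          return null; // no replacement available: the whole row becomes null
        }
        value = narep;
      }
      if (i > 0) {
        sb.append(separator); // separator is kept even around replaced nulls
      }
      sb.append(value);
    }
    return sb.toString();
  }
}
```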
-
stringConcatenate
public static ColumnVector stringConcatenate(Scalar separator, Scalar narep, ColumnView[] columns, boolean separateNulls)
Concatenate columns of strings together, combining a corresponding row from each column into a single string row of a new column.
Parameters:
separator - string scalar inserted between each string being merged.
narep - string scalar indicating null behavior. If set to null and any string in the row is null, the resulting string will be null. If not null, null values in any column will be replaced by the specified string.
columns - array of columns containing strings, must be non-empty
separateNulls - if true, then the separator is included for null rows if `narep` is valid.
Returns:
A new java column vector containing the concatenated strings.
-
stringConcatenate
public static ColumnVector stringConcatenate(ColumnView[] columns, ColumnView sepCol)
Concatenate columns of strings together using a separator specified for each row and return the result as a string column. If the separator for a given row is null, the output for that row is null. Null column values for a given row are skipped.
Parameters:
columns - array of columns containing strings
sepCol - strings column that provides the separator for a given row
Returns:
A new java column vector containing the concatenated strings with the separator between them.
-
stringConcatenate
public static ColumnVector stringConcatenate(ColumnView[] columns, ColumnView sepCol, Scalar separatorNarep, Scalar colNarep, boolean separateNulls)
Concatenate columns of strings together using a separator specified for each row and return the result as a string column. If the separator for a given row is null, the output for that row is null unless separatorNarep is provided. The separator is applied between two output row values if separateNulls is true, or only between valid rows if separateNulls is false.
Parameters:
columns - array of columns containing strings
sepCol - strings column that provides the separator for a given row
separatorNarep - string scalar indicating null behavior when a separator is null. If set to null and the separator is null, the resulting string will be null. If not null, this string will be used in place of a null separator.
colNarep - string that should be used in place of any null strings found in any column.
separateNulls - if true, then the separator is included for null rows if `colNarep` is valid.
Returns:
A new java column vector containing the concatenated strings with the separator between them.
-
listConcatenateByRow
public static ColumnVector listConcatenateByRow(ColumnView... columns)
Concatenate columns of lists horizontally (row by row), combining a corresponding row from each column into a single list row of a new column. NOTICE: Any concatenation involving a null list element will result in a null list.
Parameters:
columns - array of columns containing lists, must be non-empty
Returns:
A new java column vector containing the concatenated lists.
-
listConcatenateByRow
public static ColumnVector listConcatenateByRow(boolean ignoreNull, ColumnView... columns)
Concatenate columns of lists horizontally (row by row), combining a corresponding row from each column into a single list row of a new column.
Parameters:
ignoreNull - whether to ignore null list elements of the input columns. If true, null lists are ignored during concatenation; otherwise, any concatenation involving a null list element results in a null list.
columns - array of columns containing lists, must be non-empty
Returns:
A new java column vector containing the concatenated lists.
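The ignoreNull switch of listConcatenateByRow can be modeled for one output row. `ListConcatModel` is hypothetical and represents a null list row as a Java null array. The description above does not cover rows in which every input list is null; this sketch simply returns an empty list there, which may not match the library.

```java
// Hypothetical host-side model of listConcatenateByRow for one output row.
// ignoreNull == false: any null input list makes the output null.
// ignoreNull == true: null input lists are skipped.
final class ListConcatModel {
  static int[] concatRow(boolean ignoreNull, int[]... lists) {
    int total = 0;
    for (int[] list : lists) {
      if (list == null) {
        if (!ignoreNull) {
          return null;
        }
        continue;
      }
      total += list.length;
    }
    int[] out = new int[total];
    int pos = 0;
    for (int[] list : lists) {
      if (list != null) {
        System.arraycopy(list, 0, out, pos, list.length);
        pos += list.length;
      }
    }
    return out;
  }
}
```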
-
md5Hash
public static ColumnVector md5Hash(ColumnView... columns)
Create a new vector containing the MD5 hash of each row in the table.
Parameters:
columns - array of columns to hash, must have identical number of rows.
Returns:
the new ColumnVector of 32-character hex strings representing each row's hash value.
-
sha1Hash
Create a new column containing the SHA-1 hash of each row across the given columns.- Parameters:
columns - columns to hash- Returns:
- the new ColumnVector of 40-character hex strings representing each row's hash value.
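The hash outputs above are fixed-width hex strings because MD5 digests are 16 bytes and SHA-1 digests are 20 bytes, and each byte renders as two hex characters. The sketch below uses only `java.security.MessageDigest` to show that format; it is not cudf code, and `HexDigestDemo` is a hypothetical name. How cudf serializes a multi-column row before hashing is not described here, so the sketch simply hashes the UTF-8 bytes of one string, and its lowercase hex formatting is its own choice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Plain-Java illustration of the hex output width; not cudf's row hashing.
class HexDigestDemo {
  static String hexDigest(String algorithm, String value) {
    try {
      byte[] digest = MessageDigest.getInstance(algorithm)
          .digest(value.getBytes(StandardCharsets.UTF_8));
      StringBuilder sb = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        sb.append(String.format("%02x", b & 0xff)); // two lowercase hex chars per byte
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // MD5 and SHA-1 are required on every Java platform, so this is unexpected.
      throw new IllegalStateException("missing digest algorithm " + algorithm, e);
    }
  }

  public static void main(String[] args) {
    String md5 = hexDigest("MD5", "abc");
    String sha1 = hexDigest("SHA-1", "abc");
    System.out.println(md5 + " (" + md5.length() + " chars)");   // 32 chars
    System.out.println(sha1 + " (" + sha1.length() + " chars)"); // 40 chars
  }
}
```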
-
castTo
Generic method to cast a ColumnVector. When casting from a Date, Timestamp, or Boolean to a numerical type, the underlying numerical representation of the data will be used for the cast.
For strings: casting strings to or from timestamps is not currently supported. Use ColumnView.asTimestamp(DType, String) and ColumnView.asStrings(String) to convert between strings and timestamps when the format is known.
Float values converted to String may differ from Java's default formatting, e.g.:
12.3 => "12.30000019" instead of "12.3"
Double.POSITIVE_INFINITY => "Inf" instead of "Infinity"
Double.NEGATIVE_INFINITY => "-Inf" instead of "-Infinity"- Overrides:
castTo in class ColumnView- Parameters:
type - type of the resulting ColumnVector- Returns:
- A new vector allocated on the GPU
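The float example in the castTo notes comes from float rounding: 12.3 has no exact binary representation, and the nearest float is slightly larger. The plain-Java sketch below makes no cudf calls; `FloatToStringDemo` is a hypothetical name. It contrasts Java's default formatting, which prints just enough digits to distinguish the value from neighboring floats, with printing more digits of the stored value.

```java
import java.math.BigDecimal;
import java.util.Locale;

// Plain-Java illustration of why 12.3f can print as "12.30000019"; not cudf code.
class FloatToStringDemo {
  static String javaDefault(float f) {
    return Float.toString(f); // shortest form that identifies the float
  }

  static String eightDecimals(float f) {
    // Widening float -> double is exact, so this rounds the true stored value.
    return String.format(Locale.ROOT, "%.8f", (double) f);
  }

  static String exactValue(float f) {
    return new BigDecimal(f).toPlainString(); // exact binary value in decimal
  }

  public static void main(String[] args) {
    float f = 12.3f;
    System.out.println(javaDefault(f));   // 12.3
    System.out.println(eightDecimals(f)); // 12.30000019
    System.out.println(exactValue(f));
    System.out.println(Double.toString(Double.POSITIVE_INFINITY)); // Infinity
    System.out.println(Double.toString(Double.NEGATIVE_INFINITY)); // -Infinity
  }
}
```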
-
build
Create a new vector.- Parameters:
type - the type of vector to build.
rows - maximum number of rows that the vector can hold.
init - what will initialize the vector.- Returns:
- the created vector.
-
build
public static ColumnVector build(int rows, long stringBufferSize, Consumer<HostColumnVector.Builder> init) -
boolFromBytes
Create a new vector from the given values. -
fromLists
This method is evolving, unstable, and currently for testing only. Use it with caution and expect it to change in the future. -
fromStructs
public static ColumnVector fromStructs(HostColumnVector.DataType dataType, List<HostColumnVector.StructData> lists)
This method is evolving, unstable, and currently for testing only. Use it with caution and expect it to change in the future. -
fromStructs
public static ColumnVector fromStructs(HostColumnVector.DataType dataType, HostColumnVector.StructData... lists)
This method is evolving, unstable, and currently for testing only. Use it with caution and expect it to change in the future. -
emptyStructs
This method is evolving, unstable, and currently for testing only. Use it with caution and expect it to change in the future. -
fromBooleans
Create a new vector from the given values. -
fromBytes
Create a new vector from the given values. -
fromUnsignedBytes
Create a new vector from the given values. Java does not have an unsigned byte type, so the values will be treated as if the bits represent an unsigned value.
-
fromShorts
Create a new vector from the given values. -
fromUnsignedShorts
Create a new vector from the given values. Java does not have an unsigned short type, so the values will be treated as if the bits represent an unsigned value.
-
fromInts
Create a new vector from the given values. -
fromUnsignedInts
Create a new vector from the given values. Java does not have an unsigned int type, so the values will be treated as if the bits represent an unsigned value.
-
fromLongs
Create a new vector from the given values. -
fromUnsignedLongs
Create a new vector from the given values. Java does not have an unsigned long type, so the values will be treated as if the bits represent an unsigned value.
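For the unsigned factory methods above, the sketch below shows in plain Java how the same bit patterns read as unsigned values, using the standard `Byte.toUnsignedInt`, `Short.toUnsignedInt`, `Integer.toUnsignedLong`, and `Long.toUnsignedString` helpers. It does not call cudf; `UnsignedBitsDemo` and its `asUnsigned` overloads are hypothetical names.

```java
// Plain-Java view of signed primitives reinterpreted as unsigned; not cudf code.
class UnsignedBitsDemo {
  static int asUnsigned(byte v) { return Byte.toUnsignedInt(v); }
  static int asUnsigned(short v) { return Short.toUnsignedInt(v); }
  static long asUnsigned(int v) { return Integer.toUnsignedLong(v); }
  static String asUnsigned(long v) { return Long.toUnsignedString(v); }

  public static void main(String[] args) {
    byte b = (byte) 0xFF;          // -1 as a signed byte
    short s = (short) 0xFFFF;      // -1 as a signed short
    int i = 0xFFFFFFFF;            // -1 as a signed int
    long l = 0xFFFFFFFFFFFFFFFFL;  // -1 as a signed long

    System.out.println(asUnsigned(b)); // 255
    System.out.println(asUnsigned(s)); // 65535
    System.out.println(asUnsigned(i)); // 4294967295
    System.out.println(asUnsigned(l)); // 18446744073709551615
  }
}
```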
-
fromFloats
Create a new vector from the given values. -
fromDoubles
Create a new vector from the given values. -
daysFromInts
Create a new vector from the given values. -
durationSecondsFromLongs
Create a new vector from the given values. -
timestampSecondsFromLongs
Create a new vector from the given values. -
durationDaysFromInts
Create a new vector from the given values. -
durationMilliSecondsFromLongs
Create a new vector from the given values. -
timestampMilliSecondsFromLongs
Create a new vector from the given values. -
durationMicroSecondsFromLongs
Create a new vector from the given values. -
timestampMicroSecondsFromLongs
Create a new vector from the given values. -
durationNanoSecondsFromLongs
Create a new vector from the given values. -
timestampNanoSecondsFromLongs
Create a new vector from the given values. -
decimalFromInts
Create a new decimal vector from unscaled values (int array) and scale. The created vector is of type DType.DECIMAL32, whose max precision is 9. Unlike the scale of java.math.BigDecimal, the scale here has the opposite sign: each value equals unscaled * 10^scale. -
decimalFromBoxedInts
Create a new decimal vector from boxed unscaled values (Integer array) and scale. The created vector is of type DType.DECIMAL32, whose max precision is 9. Unlike the scale of java.math.BigDecimal, the scale here has the opposite sign: each value equals unscaled * 10^scale. -
decimalFromLongs
Create a new decimal vector from unscaled values (long array) and scale. The created vector is of type DType.DECIMAL64, whose max precision is 18. Unlike the scale of java.math.BigDecimal, the scale here has the opposite sign: each value equals unscaled * 10^scale. -
decimalFromBoxedLongs
Create a new decimal vector from boxed unscaled values (Long array) and scale. The created vector is of type DType.DECIMAL64, whose max precision is 18. Unlike the scale of java.math.BigDecimal, the scale here has the opposite sign: each value equals unscaled * 10^scale. -
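The scale convention in the decimal factory methods can be checked with a small worked example. In the sketch below, which uses plain Java and hypothetical helper names (`DecimalScaleDemo`, `toBigDecimal`, `fitsPrecision`), a scale of -2 with unscaled value 12345 denotes 123.45, which `BigDecimal` represents with scale +2. The precision check mirrors the DECIMAL32 (9 digits) and DECIMAL64 (18 digits) limits stated above.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Worked example of the opposite-sign scale convention; plain Java, not cudf code.
class DecimalScaleDemo {
  // Convert an unscaled value plus a scale in the convention above to a BigDecimal.
  static BigDecimal toBigDecimal(long unscaled, int scale) {
    return BigDecimal.valueOf(unscaled, -scale); // BigDecimal's scale has the opposite sign
  }

  // True if the unscaled value has at most maxPrecision decimal digits
  // (9 for DECIMAL32, 18 for DECIMAL64).
  static boolean fitsPrecision(long unscaled, int maxPrecision) {
    BigInteger limit = BigInteger.TEN.pow(maxPrecision);
    return BigInteger.valueOf(unscaled).abs().compareTo(limit) < 0;
  }

  public static void main(String[] args) {
    System.out.println(toBigDecimal(12345, -2).toPlainString()); // 123.45
    System.out.println(toBigDecimal(12345, 2).toPlainString());  // 1234500
    System.out.println(fitsPrecision(999_999_999L, 9));          // true
    System.out.println(fitsPrecision(1_000_000_000L, 9));        // false
  }
}
```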
decimalFromDoubles
Create a new decimal vector from doubles with a specific decimal DType and RoundingMode. All doubles will be rescaled if necessary, according to the scale of the input decimal type and the RoundingMode. If an overflow occurs while extracting the integral part of any value, an IllegalArgumentException will be thrown. This API is inefficient because double-to-decimal conversion is slow, so it is mainly intended for testing. Unlike the scale of java.math.BigDecimal, the scale here has the opposite sign: each value equals unscaled * 10^scale. -
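The rescaling and overflow behavior described for decimalFromDoubles can be modeled in plain Java with `BigDecimal.setScale`. This is a hypothetical model, not cudf's conversion (`DoubleToDecimalModel` and `rescale` are illustrative names): it assumes each double is first converted with `BigDecimal.valueOf` and treats an unscaled result with more than `maxPrecision` digits as overflow.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;
import java.util.Arrays;

// Hypothetical model of rescaling doubles to a decimal type; not cudf's conversion.
class DoubleToDecimalModel {
  // scale uses the convention above (value = unscaled * 10^scale).
  static long[] rescale(int scale, int maxPrecision, RoundingMode mode, double... values) {
    BigInteger limit = BigInteger.TEN.pow(maxPrecision);
    long[] unscaled = new long[values.length];
    for (int i = 0; i < values.length; i++) {
      // Assumption: convert via BigDecimal.valueOf, then round to BigDecimal scale -scale.
      BigInteger u = BigDecimal.valueOf(values[i]).setScale(-scale, mode).unscaledValue();
      if (u.abs().compareTo(limit) >= 0) {
        throw new IllegalArgumentException(
            "value " + values[i] + " overflows precision " + maxPrecision);
      }
      unscaled[i] = u.longValue();
    }
    return unscaled;
  }

  public static void main(String[] args) {
    // Two decimal places (scale -2), DECIMAL32-sized precision.
    System.out.println(Arrays.toString(rescale(-2, 9, RoundingMode.HALF_UP, 3.14159, -1.5))); // [314, -150]
    // Integral values (scale 0) with banker's rounding.
    System.out.println(Arrays.toString(rescale(0, 9, RoundingMode.HALF_EVEN, 2.5, 3.5)));     // [2, 4]
  }
}
```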
decimalFromBigInt
Create a new decimal vector from BigIntegers. Unlike the scale of java.math.BigDecimal, the scale here has the opposite sign: each value equals unscaled * 10^scale. -
fromStrings
Create a new string vector from the given values. This API supports inline nulls. It is intended mainly for testing, because translating between Java strings and UTF-8 strings is slow and memory-intensive. -
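To illustrate the translation cost mentioned for fromStrings, the sketch below packs Java strings into one contiguous UTF-8 buffer with an offsets array and per-row validity flags, an Arrow-style layout similar to the one cudf string columns use. It is plain Java, not cudf code, and `Utf8PackingSketch` is a hypothetical name; each string is encoded and copied once, and non-ASCII characters take more than one byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of packing Java strings into UTF-8 bytes + offsets + validity; not cudf code.
class Utf8PackingSketch {
  final byte[] chars;     // all strings' UTF-8 bytes, back to back
  final int[] offsets;    // row i spans chars[offsets[i], offsets[i + 1])
  final boolean[] valid;  // false marks an inline null

  Utf8PackingSketch(String... values) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    offsets = new int[values.length + 1];
    valid = new boolean[values.length];
    for (int i = 0; i < values.length; i++) {
      if (values[i] != null) {
        byte[] utf8 = values[i].getBytes(StandardCharsets.UTF_8); // encode + copy per string
        buf.write(utf8, 0, utf8.length);
        valid[i] = true;
      }
      offsets[i + 1] = buf.size(); // null rows contribute zero bytes
    }
    chars = buf.toByteArray();
  }

  public static void main(String[] args) {
    // "\u00e9" (e with acute accent) needs two bytes in UTF-8.
    Utf8PackingSketch p = new Utf8PackingSketch("a", "\u00e9", null, "xyz");
    System.out.println(Arrays.toString(p.offsets)); // [0, 1, 3, 3, 6]
    System.out.println(Arrays.toString(p.valid));   // [true, true, false, true]
  }
}
```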
fromUTF8Strings
Create a new string vector from the given values. This API supports inline nulls. -
fromDecimals
Create a new vector from the given values. This API supports inline nulls, but is much slower than building from a primitive array of unscaled values. Notes: 1. All input BigDecimals should share the same scale. 2. The scale will be zero if all input values are null. -
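The two notes for fromDecimals can be modeled as follows. This plain-Java sketch is hypothetical (`FromDecimalsModel` is not a cudf class): it returns the shared scale using the opposite-sign convention of the decimal methods above, returns 0 when every value is null, and rejects mixed scales with an `IllegalArgumentException`, which is an assumption of the sketch rather than documented behavior.

```java
import java.math.BigDecimal;

// Hypothetical model of the shared-scale rules for fromDecimals; not cudf code.
class FromDecimalsModel {
  // Returns the common scale in the value = unscaled * 10^scale convention.
  static int commonScale(BigDecimal... values) {
    Integer bdScale = null;
    for (BigDecimal v : values) {
      if (v == null) {
        continue; // inline nulls do not affect the scale
      }
      if (bdScale == null) {
        bdScale = v.scale();
      } else if (bdScale != v.scale()) {
        // Assumption of this sketch: mixed scales are rejected.
        throw new IllegalArgumentException("all BigDecimals must share the same scale");
      }
    }
    return bdScale == null ? 0 : -bdScale; // zero when every value is null
  }

  public static void main(String[] args) {
    System.out.println(commonScale(new BigDecimal("1.25"), null, new BigDecimal("-3.50"))); // -2
    System.out.println(commonScale(null, null)); // 0
  }
}
```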
fromBoxedBooleans
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
fromBoxedBytes
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
fromBoxedUnsignedBytes
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. Java does not have an unsigned byte type, so the values will be treated as if the bits represent an unsigned value.
-
fromBoxedShorts
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
fromBoxedUnsignedShorts
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. Java does not have an unsigned short type, so the values will be treated as if the bits represent an unsigned value.
-
fromBoxedInts
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
fromBoxedUnsignedInts
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. Java does not have an unsigned int type, so the values will be treated as if the bits represent an unsigned value.
-
fromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
fromBoxedUnsignedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. Java does not have an unsigned long type, so the values will be treated as if the bits represent an unsigned value.
-
fromBoxedFloats
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
fromBoxedDoubles
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
timestampDaysFromBoxedInts
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
durationDaysFromBoxedInts
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
durationSecondsFromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
timestampSecondsFromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
durationMilliSecondsFromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
timestampMilliSecondsFromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
durationMicroSecondsFromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
timestampMicroSecondsFromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
durationNanoSecondsFromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
timestampNanoSecondsFromBoxedLongs
Create a new vector from the given values. This API supports inline nulls, but is much slower than using a regular array and should really only be used for tests. -
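The boxed factory methods above accept inline nulls. The plain-Java sketch below shows one way such input can be split into a primitive data array and per-row validity flags; it is hypothetical, not cudf's implementation (`BoxedSplitSketch` is an illustrative name), and the per-element unboxing and null checks it performs are a plausible part of why these paths are slower than passing primitive arrays.

```java
import java.util.Arrays;

// Sketch: split boxed values with inline nulls into data + validity; not cudf code.
class BoxedSplitSketch {
  final int[] data;       // primitive values; null rows hold a placeholder 0
  final boolean[] valid;  // false marks an inline null
  final int nullCount;

  BoxedSplitSketch(Integer... values) {
    data = new int[values.length];
    valid = new boolean[values.length];
    int nulls = 0;
    for (int i = 0; i < values.length; i++) {
      if (values[i] == null) {
        nulls++;                // data[i] stays 0; valid[i] stays false
      } else {
        data[i] = values[i];    // unboxing happens per element
        valid[i] = true;
      }
    }
    nullCount = nulls;
  }

  public static void main(String[] args) {
    BoxedSplitSketch s = new BoxedSplitSketch(1, null, 3);
    System.out.println(Arrays.toString(s.data));  // [1, 0, 3]
    System.out.println(Arrays.toString(s.valid)); // [true, false, true]
    System.out.println(s.nullCount);              // 1
  }
}
```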
empty
Create an empty column of the given data type. All nested child columns are created by recursively iterating over the children of the input type 'colType'. This is slow, so use it sparingly. The implementation may move to native code once there is a way to pass nested data types to native code.- Parameters:
colType - the data type of the empty column- Returns:
- an empty ColumnVector with its children. Each child contains zero elements. Callers should close the returned ColumnVector to avoid a memory leak.
-