Class ColumnView
- All Implemented Interfaces: BinaryOperable, AutoCloseable
- Direct Known Subclasses: ColumnVector
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic enum
Enum to choose behaviour of listIndexOf functions: 1. -
Field Summary
FieldsModifier and TypeFieldDescriptionprotected final long
protected final ColumnVector.OffHeapState
protected final long
protected final DType
static final long
protected long
-
Constructor Summary
ConstructorsModifierConstructorDescriptionprotected
Intended to be called from ColumnVector when it is being constructed.
ColumnView
(DType type, long rows, Optional<Long> nullCount, BaseDeviceMemoryBuffer dataBuffer, BaseDeviceMemoryBuffer validityBuffer) Create a new column view based on data already on the device.
ColumnView
(DType type, long rows, Optional<Long> nullCount, BaseDeviceMemoryBuffer dataBuffer, BaseDeviceMemoryBuffer validityBuffer, BaseDeviceMemoryBuffer offsetBuffer) Create a new column view based on data already on the device.
ColumnView
(DType type, long rows, Optional<Long> nullCount, BaseDeviceMemoryBuffer validityBuffer, BaseDeviceMemoryBuffer offsetBuffer, ColumnView[] children) Create a new column view based on data already on the device.
-
Method Summary
Modifier and TypeMethodDescriptionfinal ColumnVector
abs()
Calculate the abs, output is the same type as input.final ColumnVector
addCalendricalMonths
(ColumnView months) Add the specified number of months to the timestamp.final ColumnVector
addCalendricalMonths
(Scalar months) Add the specified number of months to the timestamp.all()
Returns a boolean scalar that is true if all of the elements in the column are true or non-zero, otherwise false.
Deprecated. The only output type supported is BOOL8.
any()
Returns a boolean scalar that is true if any of the elements in the column are true or non-zero, otherwise false.
Returns a scalar that is true or 1, depending on the specified type, if any of the elements in the column are true or non-zero, otherwise false or 0.
final ColumnVector
applyBooleanMask
(ColumnView booleanMaskView) Filters elements in each row of this LIST column using `booleanMaskView` LIST of booleans as a mask.final ColumnVector
approxPercentile
(double[] percentiles) Calculate various percentiles of this ColumnVector, which must contain centroids produced by a t-digest aggregation.final ColumnVector
approxPercentile
(ColumnVector percentiles) Calculate various percentiles of this ColumnVector, which must contain centroids produced by a t-digest aggregation.final ColumnVector
arccos()
Calculate the arccos, output is the same type as input.final ColumnVector
arccosh()
Calculate the hyperbolic arccos, output is the same type as input.final ColumnVector
arcsin()
Calculate the arcsin, output is the same type as input.final ColumnVector
arcsinh()
Calculate the hyperbolic arcsin, output is the same type as input.final ColumnVector
arctan()
Calculate the arctan, output is the same type as input.final ColumnVector
arctanh()
Calculate the hyperbolic arctan, output is the same type as input.final ColumnVector
Cast to list of bytes. This method converts the rows provided by the ColumnVector and casts each row to a list of bytes with endianness reversed.
final ColumnVector
asByteList
(boolean config) Cast to list of bytes. This method converts the rows provided by the ColumnVector and casts each row to a list of bytes.
final ColumnVector
asBytes()
Cast to Byte - ColumnVector. This method takes the value provided by the ColumnVector and casts to byte. When casting from a Date, Timestamp, or Boolean to a byte type, the underlying numerical representation of the data will be used for the cast.
final ColumnVector
Cast to Double - ColumnVector. This method takes the value provided by the ColumnVector and casts to double. When casting from a Date, Timestamp, or Boolean to a double type, the underlying numerical representation of the data will be used for the cast.
final ColumnVector
asFloats()
Cast to Float - ColumnVector. This method takes the value provided by the ColumnVector and casts to float. When casting from a Date, Timestamp, or Boolean to a float type, the underlying numerical representation of the data will be used for the cast.
final ColumnVector
asInts()
Cast to Int - ColumnVector. This method takes the value provided by the ColumnVector and casts to int. When casting from a Date, Timestamp, or Boolean to an int type, the underlying numerical representation of the data will be used for the cast.
final ColumnVector
asLongs()
Cast to Long - ColumnVector. This method takes the value provided by the ColumnVector and casts to long. When casting from a Date, Timestamp, or Boolean to a long type, the underlying numerical representation of the data will be used for the cast.
final ColumnVector
asShorts()
Cast to Short - ColumnVector. This method takes the value provided by the ColumnVector and casts to short. When casting from a Date, Timestamp, or Boolean to a short type, the underlying numerical representation of the data will be used for the cast.
final ColumnVector
Cast to Strings.final ColumnVector
Method to parse and convert a timestamp column vector to string column vector.final ColumnVector
asTimestamp
(DType timestampType, String format) Parse a string to a timestamp.final ColumnVector
Cast to TIMESTAMP_DAYS - ColumnVector. This method takes the value provided by the ColumnVector and casts to TIMESTAMP_DAYS.
final ColumnVector
asTimestampDays
(String format) Cast to TIMESTAMP_DAYS - ColumnVector. This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_DAYS.
final ColumnVector
Cast to TIMESTAMP_MICROSECONDS - ColumnVector. This method takes the value provided by the ColumnVector and casts to TIMESTAMP_MICROSECONDS.
final ColumnVector
asTimestampMicroseconds
(String format) Cast to TIMESTAMP_MICROSECONDS - ColumnVector. This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_MICROSECONDS.
final ColumnVector
Cast to TIMESTAMP_MILLISECONDS - ColumnVector. This method takes the value provided by the ColumnVector and casts to TIMESTAMP_MILLISECONDS.
final ColumnVector
asTimestampMilliseconds
(String format) Cast to TIMESTAMP_MILLISECONDS - ColumnVector. This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_MILLISECONDS.
final ColumnVector
Cast to TIMESTAMP_NANOSECONDS - ColumnVector. This method takes the value provided by the ColumnVector and casts to TIMESTAMP_NANOSECONDS.
final ColumnVector
asTimestampNanoseconds
(String format) Cast to TIMESTAMP_NANOSECONDS - ColumnVector. This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_NANOSECONDS.
final ColumnVector
Cast to TIMESTAMP_SECONDS - ColumnVector. This method takes the value provided by the ColumnVector and casts to TIMESTAMP_SECONDS.
final ColumnVector
asTimestampSeconds
(String format) Cast to TIMESTAMP_SECONDS - ColumnVector. This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_SECONDS.
final ColumnVector
Cast to unsigned Byte - ColumnVector. This method takes the value provided by the ColumnVector and casts to byte. When casting from a Date, Timestamp, or Boolean to a byte type, the underlying numerical representation of the data will be used for the cast.
final ColumnVector
Cast to unsigned Int - ColumnVector. This method takes the value provided by the ColumnVector and casts to int. When casting from a Date, Timestamp, or Boolean to an int type, the underlying numerical representation of the data will be used for the cast.
final ColumnVector
Cast to unsigned Long - ColumnVector. This method takes the value provided by the ColumnVector and casts to long. When casting from a Date, Timestamp, or Boolean to a long type, the underlying numerical representation of the data will be used for the cast.
final ColumnVector
Cast to unsigned Short - ColumnVector. This method takes the value provided by the ColumnVector and casts to short. When casting from a Date, Timestamp, or Boolean to a short type, the underlying numerical representation of the data will be used for the cast.
final ColumnVector
binaryOp
(BinaryOp op, BinaryOperable rhs, DType outType) Multiple different binary operations.
Zero-copy cast between types with the same underlying length.
final ColumnVector
bitCount()
Count the number of set bits for each integer value.
final ColumnVector
Invert the bits, output is the same type as input.final ColumnVector
capitalize
(Scalar delimiters) Returns a column of capitalized strings.
Generic method to cast ColumnVector. When casting from a Date, Timestamp, or Boolean to a numerical type, the underlying numerical representation of the data will be used for the cast.
final ColumnVector
cbrt()
Calculate the cube root, output is the same type as input.final ColumnVector
ceil()
Calculate the ceil, output is the same type as input.final ColumnVector
Replaces values less than `lo` in `input` with `lo`, and values greater than `hi` with `hi`.
final ColumnVector
Replaces values less than `lo` in `input` with `lo_replace`, and values greater than `hi` with `hi_replace`.
void
close()
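The two clamp rows above differ only in what replaces an out-of-range value. As a rough illustration of that rule, the plain-Java sketch below mimics it on a `long[]`. It does not call cudf and ignores nulls; the class `ClampSketch` and its method signatures are hypothetical, chosen only for this example.

```java
import java.util.Arrays;

class ClampSketch {
    // Two-bound form: values below lo become lo, values above hi become hi.
    public static long[] clamp(long[] input, long lo, long hi) {
        return clamp(input, lo, lo, hi, hi);
    }

    // Replacement form: values below lo become loReplace,
    // values above hi become hiReplace, everything else is kept.
    public static long[] clamp(long[] input, long lo, long loReplace,
                               long hi, long hiReplace) {
        long[] out = new long[input.length];
        for (int i = 0; i < input.length; i++) {
            long v = input[i];
            if (v < lo) {
                out[i] = loReplace;
            } else if (v > hi) {
                out[i] = hiReplace;
            } else {
                out[i] = v;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] col = {-5, 0, 3, 7, 12};
        System.out.println(Arrays.toString(clamp(col, 0, 10)));         // [0, 0, 3, 7, 10]
        System.out.println(Arrays.toString(clamp(col, 0, -1, 10, 99))); // [-1, 0, 3, 7, 99]
    }
}
```

Values equal to `lo` or `hi` are left unchanged in this sketch, since the descriptions speak only of values "less than" and "greater than" the bounds.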
final ColumnVector
Get the code point values (integers) for each character of each string.final ColumnVector
contains
(ColumnView searchSpace) Returns a new column of DType.BOOL8 elements having the same size as this column. Each row value is true if the corresponding entry in this column is contained in the given searchSpace column and false if it is not.
boolean
Find if the `needle` is present in this column. Example:
Single Column: idx 0 1 2 3 4
col = { 10, 20, 20, 30, 50 }
Scalar: value = { 20 }
result = true
final ColumnVector
containsRe
(RegexProgram regexProg) Returns a boolean ColumnVector identifying rows which match the given RegexProgram pattern starting at any location.final ColumnVector
containsRe
(String pattern) Deprecated.
Creates a ColumnVector from a column view handle.
Copy the data to host memory synchronously.
copyToHost
(HostMemoryAllocator hostMemoryAllocator) Copy the data to the host synchronously.
copyToHostAsync
(Cuda.Stream stream) Copy the data to the host asynchronously.
copyToHostAsync
(Cuda.Stream stream, HostMemoryAllocator hostMemoryAllocator) Copy the data to the host asynchronously.
final ColumnVector
cos()
Calculate the cos, output is the same type as input.final ColumnVector
cosh()
Calculate the hyperbolic cos, output is the same type as input.final ColumnVector
Get the number of elements for each list.final ColumnVector
Round the timestamp up to the given frequency keeping the type the same.final ColumnVector
Round the timestamp down to the given frequency keeping the type the same.final ColumnVector
Round the timestamp (half up) to the given frequency keeping the type the same.final ColumnVector
day()
Get day from a timestamp.final ColumnVector
Get the day of the year from a timestamp.final ColumnVector
Extract the number of days in the month.
int
Count how many rows in the column are distinct from one another.
int
distinctCount
(NullPolicy nullPolicy) Count how many rows in the column are distinct from one another.final ColumnVector
Create a new LIST column by copying elements from the current LIST column, ignoring duplicates, producing a LIST column in which each list contains only unique elements.
final ColumnVector
dropListDuplicates
(DuplicateKeepOption keepOption) Create a new LIST column by copying elements from the current LIST column, ignoring duplicates, producing a LIST column in which each list contains only unique elements.
final ColumnVector
Given a LIST column in which each element is a struct containing a <key, value> pair.final ColumnVector
Checks if each string in a column ends with a specified comparison string, resulting in a parallel column of the boolean results.final ColumnVector
exp()
Calculate the exp, output is the same type as input.final ColumnVector
extractAllRecord
(RegexProgram regexProg, int idx) Extracts all strings that match the given regex program pattern and corresponds to the regular expression group index.final ColumnVector
extractAllRecord
(String pattern, int idx) Deprecated.final ColumnVector
extractDateTimeComponent
(DateTimeComponent component) Extract a particular date time component from a timestamp.final ColumnVector
extractListElement
(int index) For each list in this column pull out the entry at the given index.final ColumnVector
extractListElement
(ColumnView indices) For each list in this column pull out the entry at the corresponding index specified in the index column.final Table
extractRe
(RegexProgram regexProg) For each captured group specified in the given regex program return a column in the table.final Table
Deprecated.final ColumnVector
findAndReplaceAll
(ColumnView oldValues, ColumnView newValues) Returns a vector with all values "oldValues[i]" replaced with "newValues[i]".
Flatten each list of lists into a single list.
flattenLists
(boolean ignoreNull) Flatten each list of lists into a single list.
final ColumnVector
floor()
Calculate the floor, output is the same type as input.static ColumnView
fromDeviceBuffer
(BaseDeviceMemoryBuffer buffer, long startOffset, DType type, int rows) Create a new column view from a raw device buffer.final ColumnVector
Generate list offsets from sizes of each list.final ColumnVector
Retrieve the number of bytes for each string.final ColumnVector
Retrieve the number of characters in each string.final ColumnView
getChildColumnView
(int childIndex) Returns the child column view at a given index.final ColumnView[]
Returns the child column views for this view. Please note that it is the responsibility of the caller to close these views.
final BaseDeviceMemoryBuffer
getData()
Gets the data buffer for the current column view (viewHandle).long
Returns the amount of device memory used.long
Calculate the total space required to copy the data to the host.final ColumnVector
getJSONObject
(Scalar path) Apply a JSONPath string to all rows in an input strings column.final ColumnVector
getJSONObject
(Scalar path, GetJsonObjectOptions options) Apply a JSONPath string to all rows in an input strings column.
Get a ColumnView that is the offsets for this list.
final ColumnVector
getMapKeyExistence
(ColumnView keys) For a column of type List<Struct<_, _>> and a passed in key column, return a boolean column for all keys in the map.final ColumnVector
getMapKeyExistence
(Scalar key) For a column of type List<Struct<String, String>> and a passed in String key, return a boolean column for all keys in the structs. It is true if the key exists in the corresponding map for that row, false otherwise.
final ColumnVector
getMapValue
(ColumnView keys) Given a column of type List<Struct<X, Y>> and a key column of type X, return a column of type Y, where each row in the output column is the Y value corresponding to the X key.final ColumnVector
getMapValue
(Scalar key) Given a column of type List<Struct<X, Y>> and a key of type X, return a column of type Y, where each row in the output column is the Y value corresponding to the X key.final long
USE WITH CAUTION: This method exposes the address of the native cudf::column_view.long
Returns the number of nulls in the data.final int
final BaseDeviceMemoryBuffer
final long
Returns the number of rows in this vector.final Scalar
getScalarElement
(int index) Get a single item from the column at the specified index as a Scalar.final DType
getType()
Get the type of this data.final BaseDeviceMemoryBuffer
getValid()
boolean
Exact check if a column or its descendants have non-empty null rows.
static long
Get the size that the host will align memory allocations to in bytes.final ColumnVector
hour()
Get hour from a timestamp with time resolution.final ColumnVector
ifElse
(ColumnView trueValues, ColumnView falseValues) For a BOOL8 vector, computes a vector whose rows are selected from two other vectors based on the boolean value of this vector in the corresponding row.final ColumnVector
ifElse
(ColumnView trueValues, Scalar falseValue) For a BOOL8 vector, computes a vector whose rows are selected from two other inputs based on the boolean value of this vector in the corresponding row.final ColumnVector
ifElse
(Scalar trueValue, ColumnView falseValues) For a BOOL8 vector, computes a vector whose rows are selected from two other inputs based on the boolean value of this vector in the corresponding row.final ColumnVector
For a BOOL8 vector, computes a vector whose rows are selected from two other inputs based on the boolean value of this vector in the corresponding row.final ColumnVector
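The ifElse overloads above all describe the same row-wise selection, with either a column or a scalar on each side. A minimal plain-Java sketch of that selection rule follows. It works on ordinary arrays rather than device columns, ignores nulls, and the class `IfElseSketch` and its signatures are hypothetical stand-ins, not the cudf API.

```java
import java.util.Arrays;

class IfElseSketch {
    // Column/column form: pick trueValues[i] where mask[i] is true,
    // otherwise falseValues[i].
    public static long[] ifElse(boolean[] mask, long[] trueValues, long[] falseValues) {
        if (mask.length != trueValues.length || mask.length != falseValues.length) {
            throw new IllegalArgumentException("all inputs must have the same row count");
        }
        long[] out = new long[mask.length];
        for (int i = 0; i < mask.length; i++) {
            out[i] = mask[i] ? trueValues[i] : falseValues[i];
        }
        return out;
    }

    // Column/scalar form: the scalar is treated as a column of repeated values.
    public static long[] ifElse(boolean[] mask, long[] trueValues, long falseValue) {
        long[] falseValues = new long[mask.length];
        Arrays.fill(falseValues, falseValue);
        return ifElse(mask, trueValues, falseValues);
    }

    public static void main(String[] args) {
        boolean[] mask = {true, false, true};
        long[] t = {1, 2, 3};
        long[] f = {10, 20, 30};
        System.out.println(Arrays.toString(ifElse(mask, t, f)));  // [1, 20, 3]
        System.out.println(Arrays.toString(ifElse(mask, t, 0L))); // [1, 0, 3]
    }
}
```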
isFixedPoint
(DType decimalType) Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is a fixed-point, and FALSE if it is not a fixed-point.
final ColumnVector
isFloat()
Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is a float, and FALSE if it is not a float.
final ColumnVector
Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is an integer, and FALSE if it is not an integer.
final ColumnVector
Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is an integer, and FALSE if it is not an integer.
final ColumnVector
Check to see if the year for this timestamp is a leap year or not.final ColumnVector
isNan()
Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is NaN, and FALSE if null or a valid floating point value.
final ColumnVector
isNotNan()
Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is null or a valid floating point value, FALSE otherwise.
final ColumnVector
Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is not null, and FALSE for any null entry (as per the validity mask).
final ColumnVector
isNull()
Returns a Boolean vector with the same number of rows as this instance, that has FALSE for any entry that is not null, and TRUE for any null entry (as per the validity mask).
final ColumnVector
isTimestamp
(String format) Verifies that a string column can be parsed to timestamps using the provided format pattern.final ColumnVector
joinStrings
(Scalar separator, Scalar narep) Concatenates all strings in the column into one new string delimited by an optional separator string.final ColumnVector
Get the date that is the last day of the month for this timestamp.final ColumnVector
Returns a boolean ColumnVector identifying rows which match the given like pattern.final ColumnVector
listContains
(Scalar key) Create a column of bool values indicating whether the specified scalar is an element of each row of a list column.final ColumnVector
Create a column of bool values indicating whether the list rows of the first column contain the corresponding values in the second column.final ColumnVector
Create a column of bool values indicating whether the list rows of the specified column contain null elements.final ColumnVector
listIndexOf
(ColumnView keys, ColumnView.FindOptions findOption) Create a column of int32 indices, indicating the position of each row in the search key column in the corresponding row of the lists column.final ColumnVector
listIndexOf
(Scalar key, ColumnView.FindOptions findOption) Create a column of int32 indices, indicating the position of the scalar search key in each list row.
listReduce
(SegmentedReductionAggregation aggregation) Do a reduction on the values in a list.
listReduce
(SegmentedReductionAggregation aggregation, DType outType) Do a reduction on the values in a list.
listReduce
(SegmentedReductionAggregation aggregation, NullPolicy nullPolicy, DType outType) Do a reduction on the values in a list.
static ColumnVector
listsDifferenceDistinct
(ColumnView lhs, ColumnView rhs) Find the difference of lists of the left column against lists of the right column.static ColumnVector
listsHaveOverlap
(ColumnView lhs, ColumnView rhs) For each pair of lists from the input lists columns, check if they have any common non-null elements.static ColumnVector
listsIntersectDistinct
(ColumnView lhs, ColumnView rhs) Find the intersection without duplicate between lists at each row of the given lists columns.final ColumnVector
listSortRows
(boolean isDescending, boolean isNullSmallest) Segmented sort of the elements within a list in each row of a list column.static ColumnVector
listsUnionDistinct
(ColumnView lhs, ColumnView rhs) Find the union without duplicate between lists at each row of the given lists columns.final ColumnVector
log()
Calculate the log, output is the same type as input.final ColumnVector
log10()
Calculate the log with base 10, output is the same type as input.final ColumnVector
log2()
Calculate the log with base 2, output is the same type as input.
logicalCastTo
(DType type) Deprecated. This has changed to bit_cast in C++, so use that name instead.
final ColumnVector
lower()
Convert a string to lower case.final ColumnVector
lstrip()
Removes whitespace from the beginning of a string.final ColumnVector
Removes the specified characters from the beginning of each string.static ColumnView
makeStructView
(long rows, ColumnView... columns) Create a new struct column view of existing column views.static ColumnView
makeStructView
(ColumnView... columns) Create a new struct column view of existing column views.final ColumnVector
matchesRe
(RegexProgram regexProg) Returns a boolean ColumnVector identifying rows which match the given regex program pattern but only at the beginning of the string.final ColumnVector
Deprecated.
max()
Returns the maximum of all values in the column, returning a scalar of the same type as this column.
Deprecated. The max reduction no longer internally allows for setting the output type. As a workaround, this API will cast the input type to the output type for you, but this may not work in all cases.
mean()
Returns the arithmetic mean of all values in the column, returning a FLOAT64 scalar unless the column type is FLOAT32, in which case a FLOAT32 scalar is returned.
Returns the arithmetic mean of all values in the column, returning a scalar of the specified type.
final ColumnVector
mergeAndSetValidity
(BinaryOp mergeOp, ColumnView... columns) Create a deep copy of the column while replacing the null mask.
min()
Returns the minimum of all values in the column, returning a scalar of the same type as this column.
Deprecated. The min reduction no longer internally allows for setting the output type. As a workaround, this API will cast the input type to the output type for you, but this may not work in all cases.
final ColumnVector
minute()
Get minute from a timestamp with time resolution.final ColumnVector
month()
Get month from a timestamp.final ColumnVector
Returns a new ColumnVector with NaNs converted to nulls, preserving the existing null values.final ColumnVector
Create a new vector of "normalized" values, where: 1.final ColumnVector
not()
Returns a vector of the logical `not` of each value in the input column (this).
final ColumnVector
pad
(int width) Pad the Strings column until it reaches the desired length with spaces " " on the right.final ColumnVector
Pad the Strings column until it reaches the desired length with spaces " ".final ColumnVector
Pad the Strings column until it reaches the desired length.final ColumnVector
Compute the prefix sum (aka cumulative sum) of the values in this column.
product()
Returns the product of all values in the column, returning a scalar of the same type as this column.
Returns the product of all values in the column, returning a scalar of the specified type.
Copies this column into output while purging any non-empty null rows in the column or its descendants.
final ColumnVector
quantile
(QuantileMethod method, double[] quantiles) Calculate various quantiles of this ColumnVector.final ColumnVector
Get the quarter of the year from a timestamp.
reduce
(ReductionAggregation aggregation) Computes the reduction of the values in all rows of a column.
reduce
(ReductionAggregation aggregation, DType outType) Computes the reduction of the values in all rows of a column.
final ColumnVector
repeatStrings
(int repeatTimes) Given a strings column, each string in it is repeated a number of times specified by the repeatTimes parameter.
final ColumnVector
repeatStrings
(ColumnView repeatTimes) Given a strings column, an output strings column is generated by repeating each of the input strings a number of times given by the corresponding row in a repeatTimes numeric column.
replaceChildrenWithViews
(int[] indices, ColumnView[] views) This method takes in a nested type and replaces its children with the given views. Note: Make sure the number of rows in the leaf node is the same as in the child replacing it, otherwise the list can point to elements outside of the column values.
replaceListChild
(ColumnView child) This method takes in a list and returns a new list with the leaf node replaced with the given view.
final ColumnVector
replaceMultiRegex
(String[] patterns, ColumnView repls) For each string, replaces any character sequence matching any of the regular expression patterns with the corresponding replacement strings.final ColumnVector
replaceNulls
(ColumnView replacements) Returns a ColumnVector with any null values replaced with the corresponding row in the specified replacement column.final ColumnVector
replaceNulls
(ReplacePolicy policy) final ColumnVector
replaceNulls
(Scalar scalar) Returns a ColumnVector with any null values replaced with a scalar.final ColumnVector
replaceRegex
(RegexProgram regexProg, Scalar repl) For each string, replaces any character sequence matching the given regex program pattern using the replacement string scalar.final ColumnVector
replaceRegex
(RegexProgram regexProg, Scalar repl, int maxRepl) For each string, replaces any character sequence matching the given regex program pattern using the replacement string scalar.final ColumnVector
replaceRegex
(String pattern, Scalar repl) Deprecated.final ColumnVector
replaceRegex
(String pattern, Scalar repl, int maxRepl) Deprecated.final ColumnVector
Copy the current column to a new column, each string or list of the output column will have reverse order of characters or elements.final ColumnVector
rint()
Rounds a floating-point argument to the closest integer value, but returns it as a float.final ColumnVector
rollingWindow
(RollingAggregation op, WindowOptions options) This function aggregates values in a window around each element i of the input column.
round()
Rounds all the values in a column with these default values: decimalPlaces = 0, rounding method = RoundMode.HALF_UP.
round
(int decimalPlaces) Rounds all the values in a column to the specified number of decimal places with HALF_UP (default) as the rounding method.
Rounds all the values in a column to the specified number of decimal places.
Rounds all the values in a column with decimal places = 0.
final ColumnVector
rstrip()
Removes whitespace from the end of a string.final ColumnVector
Removes the specified characters from the end of each string.final ColumnVector
scan
(ScanAggregation aggregation) Computes an inclusive scan for a column that excludes nulls.final ColumnVector
scan
(ScanAggregation aggregation, ScanType scanType) Computes a scan for a column that excludes nulls.final ColumnVector
scan
(ScanAggregation aggregation, ScanType scanType, NullPolicy nullPolicy) Computes a scan for a column.final ColumnVector
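The scan rows above distinguish an inclusive scan from other scan types. As a hedged illustration, the plain-Java sketch below shows a sum scan in its inclusive form and in the usual exclusive counterpart. It assumes a sum aggregation and no nulls, does not call cudf, and the class `ScanSketch` and its method names are hypothetical.

```java
import java.util.Arrays;

class ScanSketch {
    // Inclusive sum scan: out[i] is the sum of in[0..i], including in[i].
    public static long[] inclusiveSum(long[] in) {
        long[] out = new long[in.length];
        long running = 0;
        for (int i = 0; i < in.length; i++) {
            running += in[i];
            out[i] = running;
        }
        return out;
    }

    // Exclusive sum scan: out[i] is the sum of in[0..i-1], so out[0] is 0.
    public static long[] exclusiveSum(long[] in) {
        long[] out = new long[in.length];
        long running = 0;
        for (int i = 0; i < in.length; i++) {
            out[i] = running;
            running += in[i];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] col = {1, 2, 3, 4};
        System.out.println(Arrays.toString(inclusiveSum(col))); // [1, 3, 6, 10]
        System.out.println(Arrays.toString(exclusiveSum(col))); // [0, 1, 3, 6]
    }
}
```

The inclusive form with a sum aggregation is the same computation as the prefix sum (cumulative sum) listed earlier in this summary.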
second()
Get second from a timestamp with time resolution.
segmentedGather
(ColumnView gatherMap) Segmented gather of the elements within a list element in each row of a list column.
segmentedGather
(ColumnView gatherMap, OutOfBoundsPolicy policy) Segmented gather of the elements within a list element in each row of a list column.
segmentedReduce
(ColumnView offsets, SegmentedReductionAggregation aggregation) Do a segmented reduce where the offsets column indicates which groups in this to combine.
segmentedReduce
(ColumnView offsets, SegmentedReductionAggregation aggregation, DType outType) Do a segmented reduce where the offsets column indicates which groups in this to combine.
segmentedReduce
(ColumnView offsets, SegmentedReductionAggregation aggregation, NullPolicy nullPolicy, DType outType) Do a segmented reduce where the offsets column indicates which groups in this to combine.
final ColumnVector
sin()
Calculate the sin, output is the same type as input.final ColumnVector
sinh()
Calculate the hyperbolic sin, output is the same type as input.final ColumnVector[]
slice
(int... indices) Slices a column (including null values) into a set of columns according to a set of indices.final ColumnVector[]
split
(int... indices) Splits a column (including null values) into a set of columns according to a set of indices.
splitAsViews
(int... indices) Splits a ColumnView (including null values) into a set of ColumnViews according to a set of indices.
final ColumnVector
sqrt()
Calculate the sqrt, output is the same type as input.
Returns the sample standard deviation of all values in the column, returning a FLOAT64 scalar unless the column type is FLOAT32, in which case a FLOAT32 scalar is returned.
standardDeviation
(DType outType) Returns the sample standard deviation of all values in the column, returning a scalar of the specified type.
final ColumnVector
startsWith
(Scalar pattern) Checks if each string in a column starts with a specified comparison string, resulting in a parallel column of the boolean results.final ColumnVector
Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result.final ColumnVector
stringConcatenateListElements
(ColumnView sepCol, Scalar separatorNarep, Scalar stringNarep, boolean separateNulls, boolean emptyStringOutputIfEmptyList) Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result.final ColumnVector
stringConcatenateListElements
(Scalar separator, Scalar narep, boolean separateNulls, boolean emptyStringOutputIfEmptyList) Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result.final ColumnVector[]
stringContains
(ColumnView targets) final ColumnVector
stringContains
(Scalar compString) Checks if each string in a column contains a specified comparison string, resulting in a parallel column of the boolean results.final ColumnVector
stringLocate
(Scalar substring) Locates the starting index of the first instance of the given string in each row of a column.final ColumnVector
stringLocate
(Scalar substring, int start) Locates the starting index of the first instance of the given string in each row of a column.final ColumnVector
stringLocate
(Scalar substring, int start, int end) Locates the starting index of the first instance of the given string in each row of a column.final ColumnVector
stringReplace
(ColumnView targets, ColumnView repls) Returns a new strings column where target strings with each string are replaced with corresponding replacement strings.final ColumnVector
stringReplace
(Scalar target, Scalar replace) Returns a new strings column where target string within each string is replaced with the specified replacement string.final ColumnVector
stringReplaceWithBackrefs
(RegexProgram regexProg, String replace) For each string, replaces any character sequence matching the given regex program pattern using the replace template for back-references.final ColumnVector
stringReplaceWithBackrefs
(String pattern, String replace) Deprecated.final Table
stringSplit
(RegexProgram regexProg) Returns a list of columns by splitting each string using the specified regex program pattern.final Table
stringSplit
(RegexProgram regexProg, int limit) Returns a list of columns by splitting each string using the specified regex program pattern.final Table
stringSplit
(String delimiter) Returns a list of columns by splitting each string using the specified string literal delimiter.final Table
stringSplit
(String pattern, boolean splitByRegex) Deprecated.final Table
stringSplit
(String delimiter, int limit) Returns a list of columns by splitting each string using the specified string literal delimiter.final Table
stringSplit
(String pattern, int limit, boolean splitByRegex) Deprecated.final ColumnVector
stringSplitRecord
(RegexProgram regexProg) Returns a column of lists of strings in which each list is made by splitting the corresponding input string using the specified regex program pattern.final ColumnVector
stringSplitRecord
(RegexProgram regexProg, int limit) Returns a column of lists of strings in which each list is made by splitting the corresponding input string using the specified regex program pattern.final ColumnVector
stringSplitRecord
(String delimiter) Returns a column of lists of strings in which each list is made by splitting the corresponding input string using the specified string literal delimiter.final ColumnVector
stringSplitRecord
(String pattern, boolean splitByRegex) Deprecated.final ColumnVector
stringSplitRecord
(String delimiter, int limit) Returns a column of lists of strings in which each list is made by splitting the corresponding input string using the specified string literal delimiter.final ColumnVector
stringSplitRecord
(String pattern, int limit, boolean splitByRegex) Deprecated.final ColumnVector
strip()
Removes whitespace from the beginning and end of a string.final ColumnVector
Removes the specified characters from the beginning and end of each string.final ColumnVector
substring
(int start) Returns a new strings column that contains substrings of the strings in the provided column.final ColumnVector
substring
(int start, int end) Returns a new strings column that contains substrings of the strings in the provided column.final ColumnVector
substring
(ColumnView start, ColumnView end) Returns a new strings column that contains substrings of the strings in the provided column which uses unique ranges for each string.final ColumnVector
subVector
(int start) Return a subVector from start inclusive to the end of the vector.final ColumnVector
subVector
(int start, int end) Return a subVector.sum()
Computes the sum of all values in the column, returning a scalar of the same type as this column.Computes the sum of all values in the column, returning a scalar of the specified type.Returns the sum of squares of all values in the column, returning a scalar of the same type as this column.sumOfSquares
(DType outType) Returns the sum of squares of all values in the column, returning a scalar of the specified type.final ColumnVector
tan()
Calculate the tan, output is the same type as input.final ColumnVector
tanh()
Calculate the hyperbolic tan, output is the same type as input.protected static long
title
(long handle) toHex()
Convert this integer column to a hexadecimal column and return a new strings column. Any null entries will result in corresponding null entries in the output column.toString()
final ColumnVector
toTitle()
Returns a column of strings where, for each string row in the input, the first character after spaces is modified to upper-case, while all the remaining characters in a word are modified to lower-case.final ColumnVector
Transform a vector using a custom function.final ColumnVector
Multiple different unary operations.final ColumnVector
upper()
Convert a string to upper case.final ColumnVector
Converts all character sequences starting with '%' into character code-points interpreting the 2 following characters as hex values to create the code-point.final ColumnVector
Converts mostly non-ascii characters and control characters into UTF-8 hex code-points prefixed with '%'.variance()
Returns the variance of all values in the column, returning a FLOAT64 scalar unless the column type is FLOAT32 then a FLOAT32 scalar is returned.Returns the variance of all values in the column, returning a scalar of the specified type.final ColumnVector
weekDay()
Get the day of the week from a timestamp.final ColumnVector
year()
Get year from a timestamp.final ColumnVector
zfill
(int width) Add '0' as padding to the left of each string.Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface ai.rapids.cudf.BinaryOperable
add, add, and, and, arctan2, arctan2, bitAnd, bitAnd, bitOr, bitOr, bitXor, bitXor, div, div, equalTo, equalTo, equalToNullAware, equalToNullAware, floorDiv, floorDiv, greaterOrEqualTo, greaterOrEqualTo, greaterThan, greaterThan, lessOrEqualTo, lessOrEqualTo, lessThan, lessThan, log, log, maxNullAware, maxNullAware, minNullAware, minNullAware, mod, mod, mul, mul, notEqualTo, notEqualTo, notEqualToNullAware, notEqualToNullAware, or, or, pmod, pmod, pow, pow, shiftLeft, shiftLeft, shiftRight, shiftRight, shiftRightUnsigned, shiftRightUnsigned, sub, sub, trueDiv, trueDiv
-
Field Details
-
UNKNOWN_NULL_COUNT
public static final long UNKNOWN_NULL_COUNT- See Also:
-
viewHandle
protected long viewHandle -
type
-
rows
protected final long rows -
nullCount
protected final long nullCount -
offHeap
-
-
Constructor Details
-
ColumnView
Intended to be called from ColumnVector when it is being constructed. Because state creates a cudf::column_view instance and will close it in all cases, we don't want to have to double close it. This asserts that if the offHeapState is of nested-type it doesn't contain non-empty nulls- Parameters:
state
- the state this view is based off of.- Throws:
AssertionError
- if offHeapState points to a nested-type view with non-empty nulls
-
ColumnView
public ColumnView(DType type, long rows, Optional<Long> nullCount, BaseDeviceMemoryBuffer validityBuffer, BaseDeviceMemoryBuffer offsetBuffer, ColumnView[] children) Create a new column view based off of data already on the device. Ref count on the buffers is not incremented and none of the underlying buffers are owned by this view. The returned ColumnView is only valid as long as the underlying buffers remain valid. If the buffers are closed before this ColumnView is closed, it will result in undefined behavior. If ownership is needed, call copyToColumnVector()
- Parameters:
type
- the type of the vectorrows
- the number of rows in this vector.nullCount
- the number of nulls in the dataset.validityBuffer
- an optional validity buffer. Must be provided if nullCount != 0. The ownership doesn't change on this bufferoffsetBuffer
- a device buffer required for nested types including strings and string categories. The ownership doesn't change on this bufferchildren
- an array of ColumnView children
-
ColumnView
public ColumnView(DType type, long rows, Optional<Long> nullCount, BaseDeviceMemoryBuffer dataBuffer, BaseDeviceMemoryBuffer validityBuffer) Create a new column view based off of data already on the device. Ref count on the buffers is not incremented and none of the underlying buffers are owned by this view. The returned ColumnView is only valid as long as the underlying buffers remain valid. If the buffers are closed before this ColumnView is closed, it will result in undefined behavior. If ownership is needed, call copyToColumnVector()
- Parameters:
type
- the type of the vectorrows
- the number of rows in this vector.nullCount
- the number of nulls in the dataset.dataBuffer
- a device buffer required for nested types including strings and string categories. The ownership doesn't change on this buffervalidityBuffer
- an optional validity buffer. Must be provided if nullCount != 0. The ownership doesn't change on this buffer
-
ColumnView
public ColumnView(DType type, long rows, Optional<Long> nullCount, BaseDeviceMemoryBuffer dataBuffer, BaseDeviceMemoryBuffer validityBuffer, BaseDeviceMemoryBuffer offsetBuffer) Create a new column view based off of data already on the device. Ref count on the buffers is not incremented and none of the underlying buffers are owned by this view. The returned ColumnView is only valid as long as the underlying buffers remain valid. If the buffers are closed before this ColumnView is closed, it will result in undefined behavior. If ownership is needed, call copyToColumnVector()
- Parameters:
type
- the type of the vectorrows
- the number of rows in this vector.nullCount
- the number of nulls in the dataset.dataBuffer
- a device buffer required for nested types including strings and string categories. The ownership doesn't change on this buffervalidityBuffer
- an optional validity buffer. Must be provided if nullCount != 0. The ownership doesn't change on this bufferoffsetBuffer
- The offsetbuffer for columns that need an offset buffer
-
-
Method Details
-
copyToColumnVector
Creates a ColumnVector from a column view handle- Returns:
- a new ColumnVector
-
getNativeView
public final long getNativeView() USE WITH CAUTION: This method exposes the address of the native cudf::column_view. This allows writing custom kernels or other CUDA operations on the data. DO NOT close this column vector until you are completely done using the native column_view. DO NOT modify the column in any way. This should be treated as a read-only data structure. This API is unstable as the underlying C/C++ API is still not stabilized. If the underlying data structure is renamed this API may be replaced. The underlying data structure can change from release to release (it is not stable yet), so be sure that your native code is compiled against the exact same version of libcudf as this is released for. -
getType
Description copied from interface:BinaryOperable
Get the type of this data.- Specified by:
getType
in interfaceBinaryOperable
-
getChildColumnViews
Returns the child column views for this view. Please note that it is the responsibility of the caller to close these views.- Returns:
- an array of child column views
-
getChildColumnView
Returns the child column view at a given index. Please note that it is the responsibility of the caller to close this view.- Parameters:
childIndex
- the index of the child- Returns:
- a column view
-
getListOffsetsView
Get a ColumnView that is the offsets for this list. Please note that it is the responsibility of the caller to close this view, and the parent column must outlive this view. -
getData
Gets the data buffer for the current column view (viewHandle). If the type is LIST or STRUCT, it returns null.- Returns:
- null if the type is LIST or STRUCT or the data buffer is empty; otherwise the device data buffer
-
getOffsets
-
getValid
-
getNullCount
public long getNullCount() Returns the number of nulls in the data. Note that this can be a very expensive operation because, if the null count is not known, it will be calculated. -
getRowCount
public final long getRowCount() Returns the number of rows in this vector. -
getNumChildren
public final int getNumChildren() -
getDeviceMemorySize
public long getDeviceMemorySize() Returns the amount of device memory used. -
close
public void close()- Specified by:
close
in interfaceAutoCloseable
-
toString
-
nansToNulls
Returns a new ColumnVector with NaNs converted to nulls, preserving the existing null values. -
getCharLengths
Retrieve the number of characters in each string. Null strings will have value of null.- Returns:
- ColumnVector holding length of string at index 'i' in the original vector
-
getByteCount
Retrieve the number of bytes for each string. Null strings will have value of null.- Returns:
- ColumnVector, where each element at i = byte count of string at index 'i' in the original vector
-
codePoints
Get the code point values (integers) for each character of each string.- Returns:
- ColumnVector, with code point integer values for each character as INT32
-
countElements
Get the number of elements for each list. Null lists will have a value of null.- Returns:
- the number of elements in each list as an INT32 value.
-
isNotNull
Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is not null, and FALSE for any null entry (as per the validity mask)- Returns:
- - Boolean vector
-
isNull
Returns a Boolean vector with the same number of rows as this instance, that has FALSE for any entry that is not null, and TRUE for any null entry (as per the validity mask)- Returns:
- - Boolean vector
-
isFixedPoint
Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is a fixed-point value, and FALSE if it is not a fixed-point value. A null will be returned for null entries. The sign and the exponent are optional. The decimal point may only appear once. The integer component must fit within the size limits of the underlying fixed-point storage type. The value of the integer component is based on the scale of the target decimalType. Example: vec = ["A", "nan", "Inf", "-Inf", "Infinity", "infinity", "2.1474", "112.383", "-2.14748", "NULL", "null", null, "1.2", "1.2e-4", "0.00012"] vec.isFixedPoint() = [false, false, false, false, false, false, true, true, true, false, false, null, true, true, true]- Parameters:
decimalType
- the data type that should be used for bounds checking. Note that only Decimal types (fixed-point) are allowed.- Returns:
- Boolean vector
-
isInteger
Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is an integer, and FALSE if it is not an integer. A null will be returned for null entries. NOTE: Integer doesn't mean a 32-bit integer. It means a number that is not a fraction, i.e. if this method returns true for a value, it could still result in an overflow or underflow if you convert it to a Java integral type.- Returns:
- Boolean vector
-
isInteger
Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is an integer, and FALSE if it is not an integer. A null will be returned for null entries.- Parameters:
intType
- the data type that should be used for bounds checking. Note that only cudf integer types are allowed including signed/unsigned int8 through int64- Returns:
- Boolean vector
-
isFloat
Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is a float, and FALSE if it is not a float. A null will be returned for null entries. NOTE: Float doesn't mean a 32-bit float. It means a number that is a fraction or can be written as a fraction, i.e. this method will return true for integers as well as floats. Also note that if this method returns true for a value, it could still result in an overflow or underflow if you convert it to a Java float or double.- Returns:
- - Boolean vector
-
isNan
Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is NaN, and FALSE if null or a valid floating point value- Returns:
- - Boolean vector
-
isNotNan
Returns a Boolean vector with the same number of rows as this instance, that has TRUE for any entry that is null or a valid floating point value, FALSE otherwise- Returns:
- - Boolean vector
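The null handling of isNan and isNotNan described above is easy to misread, so here is a minimal plain-Java model of it. It is a sketch only: `NanCheckModel` and its methods are hypothetical names, a `Double[]` with `null` entries stands in for a nullable column, and nothing here calls cudf.

```java
import java.util.Arrays;

// Plain-Java model of the documented semantics; this is not the cudf implementation.
class NanCheckModel {
    // Models isNan(): TRUE only for NaN, FALSE for null or any other value.
    static boolean[] isNan(Double[] column) {
        boolean[] out = new boolean[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = column[i] != null && column[i].isNaN();
        }
        return out;
    }

    // Models isNotNan(): TRUE for null or any non-NaN value, FALSE for NaN.
    static boolean[] isNotNan(Double[] column) {
        boolean[] out = new boolean[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = column[i] == null || !column[i].isNaN();
        }
        return out;
    }

    public static void main(String[] args) {
        Double[] column = {1.5, Double.NaN, null, 0.0};
        System.out.println(Arrays.toString(isNan(column)));    // [false, true, false, false]
        System.out.println(Arrays.toString(isNotNan(column))); // [true, false, true, true]
    }
}
```

Note that in this reading a null row yields FALSE from isNan and TRUE from isNotNan, rather than a null result.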
-
findAndReplaceAll
Returns a vector with all values "oldValues[i]" replaced with "newValues[i]". Warning: Currently this function doesn't work for Strings or StringCategories. NaNs can't be replaced in the original vector, but regular values can be replaced with NaNs. Nulls can't be replaced in the original vector, but regular values can be replaced with nulls. Mixing of types isn't allowed; the resulting vector will be the same type as the original, e.g. you can't replace an integer vector with values from a long vector. Usage: this = {1, 4, 5, 1, 5} oldValues = {1, 5, 7} newValues = {2, 6, 9} result = this.findAndReplaceAll(oldValues, newValues); result = {2, 4, 6, 2, 6} (1 and 5 replaced with 2 and 6 but 7 wasn't found so no change)- Parameters:
oldValues
- - A vector containing values that should be replacednewValues
- - A vector containing new values- Returns:
- - A new vector containing the old values replaced with new values
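The usage example for findAndReplaceAll can be reproduced with a small plain-Java model. This is a sketch under assumptions: `FindAndReplaceModel` is a hypothetical name, `int[]` arrays stand in for non-null integer columns, and oldValues is assumed to contain no duplicates. It does not call cudf.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the documented behavior; this is not the cudf implementation.
class FindAndReplaceModel {
    // Replaces every value equal to oldValues[i] with newValues[i]; other values are kept.
    static int[] findAndReplaceAll(int[] values, int[] oldValues, int[] newValues) {
        if (oldValues.length != newValues.length) {
            throw new IllegalArgumentException("oldValues and newValues must have the same length");
        }
        Map<Integer, Integer> replacements = new HashMap<>();
        for (int i = 0; i < oldValues.length; i++) {
            replacements.put(oldValues[i], newValues[i]); // assumes no duplicate old values
        }
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = replacements.getOrDefault(values[i], values[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] result = findAndReplaceAll(
            new int[]{1, 4, 5, 1, 5}, new int[]{1, 5, 7}, new int[]{2, 6, 9});
        System.out.println(Arrays.toString(result)); // [2, 4, 6, 2, 6]
    }
}
```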
-
replaceNulls
Returns a ColumnVector with any null values replaced with a scalar. The types of the input ColumnVector and Scalar must match, else an error is thrown.- Parameters:
scalar
- - Scalar value to use as replacement- Returns:
- - ColumnVector with nulls replaced by scalar
-
replaceNulls
Returns a ColumnVector with any null values replaced with the corresponding row in the specified replacement column. This column and the replacement column must have the same type and number of rows.- Parameters:
replacements
- column of replacement values- Returns:
- column with nulls replaced by corresponding row of replacements column
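The two replaceNulls variants described so far (scalar replacement and row-wise replacement from another column) can be illustrated with a plain-Java model. This is a sketch only: `ReplaceNullsModel` is a hypothetical name and an `Integer[]` with `null` entries stands in for a nullable column.

```java
import java.util.Arrays;

// Plain-Java model of the documented behavior; this is not the cudf implementation.
class ReplaceNullsModel {
    // Every null row is replaced with the same scalar value.
    static Integer[] replaceNulls(Integer[] column, int scalar) {
        Integer[] out = new Integer[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = column[i] != null ? column[i] : scalar;
        }
        return out;
    }

    // Every null row is replaced with the same row of the replacements column.
    static Integer[] replaceNulls(Integer[] column, Integer[] replacements) {
        if (column.length != replacements.length) {
            throw new IllegalArgumentException("columns must have the same number of rows");
        }
        Integer[] out = new Integer[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = column[i] != null ? column[i] : replacements[i];
        }
        return out;
    }

    public static void main(String[] args) {
        Integer[] column = {1, null, 3, null};
        System.out.println(Arrays.toString(replaceNulls(column, 0)));
        // [1, 0, 3, 0]
        System.out.println(Arrays.toString(replaceNulls(column, new Integer[]{9, 8, 7, null})));
        // [1, 8, 3, null]
    }
}
```

In this model a row stays null when the replacement row is itself null; the source does not state that case explicitly, so treat it as an assumption.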
-
replaceNulls
-
ifElse
For a BOOL8 vector, computes a vector whose rows are selected from two other vectors based on the boolean value of this vector in the corresponding row. If the boolean value in a row is true, the corresponding row is selected from trueValues otherwise the corresponding row from falseValues is selected. Note that trueValues and falseValues vectors must be the same length as this vector, and trueValues and falseValues must have the same data type.- Parameters:
trueValues
- the values to select if a row in this column is truefalseValues
- the values to select if a row in this column is not true- Returns:
- the computed vector
-
ifElse
For a BOOL8 vector, computes a vector whose rows are selected from two other inputs based on the boolean value of this vector in the corresponding row. If the boolean value in a row is true, the corresponding row is selected from trueValues otherwise the value from falseValue is selected. Note that trueValues must be the same length as this vector, and trueValues and falseValue must have the same data type. Note that the trueValues vector and falseValue scalar must have the same data type.- Parameters:
trueValues
- the values to select if a row in this column is truefalseValue
- the value to select if a row in this column is not true- Returns:
- the computed vector
-
ifElse
For a BOOL8 vector, computes a vector whose rows are selected from two other inputs based on the boolean value of this vector in the corresponding row. If the boolean value in a row is true, the value from trueValue is selected otherwise the corresponding row from falseValues is selected. Note that falseValues must be the same length as this vector, and trueValue and falseValues must have the same data type. Note that the trueValue scalar and falseValues vector must have the same data type.- Parameters:
trueValue
- the value to select if a row in this column is truefalseValues
- the values to select if a row in this column is not true- Returns:
- the computed vector
-
ifElse
For a BOOL8 vector, computes a vector whose rows are selected from two other inputs based on the boolean value of this vector in the corresponding row. If the boolean value in a row is true, the value from trueValue is selected otherwise the value from falseValue is selected. Note that the trueValue and falseValue scalars must have the same data type.- Parameters:
trueValue
- the value to select if a row in this column is truefalseValue
- the value to select if a row in this column is not true- Returns:
- the computed vector
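The four ifElse overloads differ only in whether each side is a vector or a scalar. A plain-Java model of that selection rule follows. It is a sketch under assumptions: `IfElseModel` is a hypothetical name, a `boolean[]` stands in for a BOOL8 column without nulls, and `int[]` stands in for the value columns.

```java
import java.util.Arrays;

// Plain-Java model of the documented row selection; this is not the cudf implementation.
class IfElseModel {
    // Vector/vector: both inputs must have the same length as the mask.
    static int[] ifElse(boolean[] mask, int[] trueValues, int[] falseValues) {
        if (trueValues.length != mask.length || falseValues.length != mask.length) {
            throw new IllegalArgumentException("inputs must be the same length as the mask");
        }
        int[] out = new int[mask.length];
        for (int i = 0; i < mask.length; i++) {
            out[i] = mask[i] ? trueValues[i] : falseValues[i];
        }
        return out;
    }

    // Scalar variants are modeled by broadcasting the scalar to every row.
    static int[] ifElse(boolean[] mask, int[] trueValues, int falseValue) {
        return ifElse(mask, trueValues, broadcast(falseValue, mask.length));
    }

    static int[] ifElse(boolean[] mask, int trueValue, int[] falseValues) {
        return ifElse(mask, broadcast(trueValue, mask.length), falseValues);
    }

    static int[] ifElse(boolean[] mask, int trueValue, int falseValue) {
        return ifElse(mask, broadcast(trueValue, mask.length), broadcast(falseValue, mask.length));
    }

    private static int[] broadcast(int value, int rows) {
        int[] out = new int[rows];
        Arrays.fill(out, value);
        return out;
    }

    public static void main(String[] args) {
        boolean[] mask = {true, false, true};
        System.out.println(Arrays.toString(ifElse(mask, new int[]{1, 2, 3}, new int[]{10, 20, 30})));
        // [1, 20, 3]
        System.out.println(Arrays.toString(ifElse(mask, new int[]{1, 2, 3}, -1)));
        // [1, -1, 3]
        System.out.println(Arrays.toString(ifElse(mask, 7, 0)));
        // [7, 0, 7]
    }
}
```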
-
slice
Slices a column (including null values) into a set of columns according to a set of indices. The caller owns the output ColumnVectors and is responsible for closing them. The "slice" function divides part of the input column into multiple intervals of rows using the indices values and stores the intervals in the output columns. Pairs of values are taken consecutively from the indices array, and each pair is left-closed and right-open. Each pair (a, b) is required to comply with the following conditions: a, b belongs to Range[0, input column size] a <= b, where the position of a is less or equal to the position of b. Exceptional cases for the indices array are: When the values in the pair are equal, the function returns an empty column. When the values in the pair are 'strictly decreasing', the outcome is undefined. When any of the values in the pair don't belong to the range [0, input column size), the outcome is undefined. When the indices array is empty, an empty vector of columns is returned.- Parameters:
indices
-- Returns:
- A new ColumnVector array with slices from the original ColumnVector
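The pairing rule for slice (consecutive pairs, left-closed and right-open) can be sketched in plain Java. This is a model under assumptions: `SliceModel` is a hypothetical name, an `int[]` stands in for the column, and the cases the documentation calls undefined are rejected with an exception purely as a choice of this sketch.

```java
import java.util.Arrays;

// Plain-Java model of the documented slicing rules; this is not the cudf implementation.
class SliceModel {
    // Each consecutive pair (a, b) in indices selects rows [a, b).
    static int[][] slice(int[] input, int... indices) {
        if (indices.length % 2 != 0) {
            throw new IllegalArgumentException("indices must come in (start, end) pairs");
        }
        int[][] out = new int[indices.length / 2][];
        for (int p = 0; p < out.length; p++) {
            int a = indices[2 * p];
            int b = indices[2 * p + 1];
            // Documented as undefined behavior; this model chooses to reject these inputs.
            if (a < 0 || b > input.length || a > b) {
                throw new IllegalArgumentException("invalid pair [" + a + ", " + b + ")");
            }
            out[p] = Arrays.copyOfRange(input, a, b); // a == b yields an empty slice
        }
        return out;
    }

    public static void main(String[] args) {
        int[] input = {10, 12, 14, 16, 18, 20, 22, 24, 26, 28};
        System.out.println(Arrays.deepToString(slice(input, 1, 3, 5, 9, 2, 2)));
        // [[12, 14], [20, 22, 24, 26], []]
    }
}
```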
-
subVector
Return a subVector from start inclusive to the end of the vector.- Parameters:
start
- the index to start at.
-
subVector
Return a subVector.- Parameters:
start
- the index to start at (inclusive).end
- the index to end at (exclusive).
-
split
Splits a column (including null values) into a set of columns according to a set of indices. The caller owns the ColumnVectors and is responsible for closing them. The "split" function divides the input column into multiple intervals of rows using the splits indices values and stores the intervals in the output columns. Pairs of values are taken consecutively from the indices array, and each pair is left-closed and right-open. The indices array ('splits') is required to be a monotonic non-decreasing set. The indices in the array are required to comply with the following conditions: a, b belongs to Range[0, input column size] a <= b, where the position of a is less or equal to the position of b. The split function will take a pair of indices from the indices array ('splits') in a consecutive manner. For the first pair, the function will take the value 0 and the first element of the indices array. For the last pair, the function will take the last element of the indices array and the size of the input column. Exceptional cases for the indices array are: When the values in the pair are equal, the function returns an empty column. When the values in the pair are 'strictly decreasing', the outcome is undefined. When any of the values in the pair don't belong to the range [0, input column size), the outcome is undefined. When the indices array is empty, an empty vector of columns is returned. The output columns may have different sizes. The number of columns must be equal to the number of indices in the array plus one. Example: input: {10, 12, 14, 16, 18, 20, 22, 24, 26, 28} splits: {2, 5, 9} output: {{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}} Note that this is very similar to the output from a PartitionedTable.- Parameters:
indices
- the indexes to split with- Returns:
- A new ColumnVector array with slices from the original ColumnVector
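The split example above (splits {2, 5, 9} producing four pieces) can be reproduced with a plain-Java model. This is a sketch under assumptions: `SplitModel` is a hypothetical name, an `int[]` stands in for the column, invalid splits are rejected rather than left undefined, and the empty-splits case is deliberately not modeled.

```java
import java.util.Arrays;

// Plain-Java model of the documented split rules; this is not the cudf implementation.
class SplitModel {
    // Pieces are [0, s0), [s0, s1), ..., [sLast, input.length).
    static int[][] split(int[] input, int... splits) {
        if (splits.length == 0) {
            throw new IllegalArgumentException("the empty-splits case is not modeled here");
        }
        int[][] out = new int[splits.length + 1][];
        int start = 0;
        for (int i = 0; i < splits.length; i++) {
            int end = splits[i];
            // Documented as undefined behavior; this model chooses to reject these inputs.
            if (end < start || end > input.length) {
                throw new IllegalArgumentException("splits must be non-decreasing and in range");
            }
            out[i] = Arrays.copyOfRange(input, start, end);
            start = end;
        }
        out[splits.length] = Arrays.copyOfRange(input, start, input.length);
        return out;
    }

    public static void main(String[] args) {
        int[] input = {10, 12, 14, 16, 18, 20, 22, 24, 26, 28};
        System.out.println(Arrays.deepToString(split(input, 2, 5, 9)));
        // [[10, 12], [14, 16, 18], [20, 22, 24, 26], [28]]
    }
}
```

The number of output pieces is always the number of splits plus one, which matches the rule stated in the description.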
-
splitAsViews
Splits a ColumnView (including null values) into a set of ColumnViews according to a set of indices. No data is moved or copied. IMPORTANT NOTE: Nothing is copied out from the vector and the slices will only be relevant for the lifecycle of the underlying ColumnVector. The "split" function divides the input column into multiple intervals of rows using the splits indices values and stores the intervals in the output columns. Pairs of values are taken consecutively from the indices array, and each pair is left-closed and right-open. The indices array ('splits') is required to be a monotonic non-decreasing set. The indices in the array are required to comply with the following conditions: a, b belongs to Range[0, input column size] a <= b, where the position of 'a' is less or equal to the position of 'b'. The split function will take a pair of indices from the indices array ('splits') in a consecutive manner. For the first pair, the function will take the value 0 and the first element of the indices array. For the last pair, the function will take the last element of the indices array and the size of the input column. Exceptional cases for the indices array are: When the values in the pair are equal, the function returns an empty column. When the values in the pair are 'strictly decreasing', the outcome is undefined. When any of the values in the pair don't belong to the range [0, input column size), the outcome is undefined. When the indices array is empty, an empty array of ColumnViews is returned. The output columns may have different sizes. The number of columns must be equal to the number of indices in the array plus one. Example: input: {10, 12, 14, 16, 18, 20, 22, 24, 26, 28} splits: {2, 5, 9} output: {{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}} Note that this is very similar to the output from a PartitionedTable.- Parameters:
indices
- the indices to split with- Returns:
- A new ColumnView array with slices from the original ColumnView
-
normalizeNANsAndZeros
Create a new vector of "normalized" values, where: 1. All representations of NaN (and -NaN) are replaced with the normalized NaN value 2. All elements equivalent to 0.0 (including +0.0 and -0.0) are replaced with +0.0. 3. All elements that are not equivalent to NaN or 0.0 remain unchanged. The documentation for Double.longBitsToDouble(long)
describes how equivalent values of NaN/-NaN might have different bitwise representations. This method may be used to compare different bitwise values of 0.0 or NaN as logically equivalent. For instance, if these values appear in a groupby key column, without normalization 0.0 and -0.0 would be erroneously treated as distinct groups, as will each representation of NaN.- Returns:
- A new ColumnVector with all elements equivalent to NaN/0.0 replaced with a normalized equivalent.
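The three normalization rules can be checked bit by bit in plain Java. This is a sketch under assumptions: `NormalizeModel` is a hypothetical name, and Java's canonical `Double.NaN` stands in for "the normalized NaN value", whose exact bit pattern the source does not give.

```java
// Plain-Java model of the three documented rules; this is not the cudf implementation.
class NormalizeModel {
    static double normalize(double v) {
        if (Double.isNaN(v)) {
            return Double.NaN; // rule 1: every NaN bit pattern becomes one canonical NaN
        }
        if (v == 0.0) {
            return 0.0;        // rule 2: -0.0 == 0.0 is true, so both become +0.0
        }
        return v;              // rule 3: everything else is unchanged
    }

    public static void main(String[] args) {
        // A negative quiet NaN with a non-zero payload, built from raw bits.
        double oddNaN = Double.longBitsToDouble(0xfff8000000000001L);
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(normalize(oddNaN))));
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(normalize(-0.0))));
        System.out.println(normalize(3.25));
    }
}
```

After normalization, values that compare as logically equal also share one bit pattern, which is the property a groupby key needs.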
-
mergeAndSetValidity
Create a deep copy of the column while replacing the null mask. The resultant null mask is the bitwise merge of null masks in the columns given as arguments. The result will be sanitized to not contain any non-empty nulls in case of nested types.- Parameters:
mergeOp
- binary operator (BITWISE_AND and BITWISE_OR only)columns
- array of columns whose null masks are merged, must have identical number of rows.- Returns:
- the new ColumnVector with merged null mask.
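The effect of the two merge operators on a null mask can be sketched in plain Java. This is a model under assumptions: `ValidityMergeModel` and its nested `MergeOp` enum are hypothetical names, each `boolean[]` stands in for one column's validity mask with `true` meaning the row is valid (not null), and only the mask merge is modeled, not the deep copy or the sanitizing of nested types.

```java
import java.util.Arrays;

// Plain-Java model of merging validity masks; this is not the cudf implementation.
class ValidityMergeModel {
    enum MergeOp { BITWISE_AND, BITWISE_OR }

    // Assumes true = valid row. AND keeps a row valid only if all inputs are valid;
    // OR keeps it valid if any input is valid.
    static boolean[] mergeValidity(MergeOp op, boolean[]... masks) {
        if (masks.length == 0) {
            throw new IllegalArgumentException("at least one mask is required");
        }
        boolean[] out = masks[0].clone();
        for (int m = 1; m < masks.length; m++) {
            if (masks[m].length != out.length) {
                throw new IllegalArgumentException("all masks must have the same number of rows");
            }
            for (int i = 0; i < out.length; i++) {
                out[i] = (op == MergeOp.BITWISE_AND) ? (out[i] && masks[m][i])
                                                     : (out[i] || masks[m][i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[] a = {true, true, false, false};
        boolean[] b = {true, false, true, false};
        System.out.println(Arrays.toString(mergeValidity(MergeOp.BITWISE_AND, a, b)));
        // [true, false, false, false]
        System.out.println(Arrays.toString(mergeValidity(MergeOp.BITWISE_OR, a, b)));
        // [true, true, true, false]
    }
}
```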
-
extractDateTimeComponent
Extract a particular date time component from a timestamp.- Parameters:
component
- what should be extracted- Returns:
- a column with the extracted information in it.
-
year
Get year from a timestamp. Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
- Returns:
- - A new INT16 vector allocated on the GPU.
-
month
Get month from a timestamp. Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
- Returns:
- - A new INT16 vector allocated on the GPU.
-
day
Get day from a timestamp. Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
- Returns:
- - A new INT16 vector allocated on the GPU.
-
hour
Get hour from a timestamp with time resolution. Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
- Returns:
- - A new INT16 vector allocated on the GPU.
-
minute
Get minute from a timestamp with time resolution. Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
- Returns:
- - A new INT16 vector allocated on the GPU.
-
second
Get second from a timestamp with time resolution. Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
- Returns:
- A new INT16 vector allocated on the GPU.
-
weekDay
Get the day of the week from a timestamp. Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
- Returns:
- A new INT16 vector allocated on the GPU. Monday=1, ..., Sunday=7
-
lastDayOfMonth
Get the date that is the last day of the month for this timestamp. Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
- Returns:
- A new TIMESTAMP_DAYS vector allocated on the GPU.
-
dayOfYear
Get the day of the year from a timestamp. Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
- Returns:
- A new INT16 vector allocated on the GPU. The value is in the range [1, 365], or [1, 366] in a leap year.
-
quarterOfYear
Get the quarter of the year from a timestamp.- Returns:
- A new INT16 vector allocated on the GPU. It will be a value from {1, 2, 3, 4} corresponding to the quarter of the year.
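The value conventions documented for weekDay (Monday=1 through Sunday=7), lastDayOfMonth, dayOfYear and quarterOfYear happen to line up with the ISO conventions in `java.time`, so a single date can be used to illustrate them. This is a sketch only: `DateComponentModel` is a hypothetical name, it works on one `LocalDate` rather than a timestamp column, and it does not call cudf.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;
import java.time.temporal.TemporalAdjusters;

// Plain-Java model using java.time as a stand-in; this is not the cudf implementation.
class DateComponentModel {
    // ISO day of week: Monday=1, ..., Sunday=7, matching the documented weekDay range.
    static short weekDay(LocalDate d) {
        return (short) d.getDayOfWeek().getValue();
    }

    static LocalDate lastDayOfMonth(LocalDate d) {
        return d.with(TemporalAdjusters.lastDayOfMonth());
    }

    // 1..365, or 1..366 in a leap year.
    static short dayOfYear(LocalDate d) {
        return (short) d.getDayOfYear();
    }

    // 1..4
    static short quarterOfYear(LocalDate d) {
        return (short) d.get(IsoFields.QUARTER_OF_YEAR);
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2024, 2, 10);
        System.out.println(weekDay(d));        // 6 (Saturday)
        System.out.println(lastDayOfMonth(d)); // 2024-02-29
        System.out.println(dayOfYear(d));      // 41
        System.out.println(quarterOfYear(d));  // 1
    }
}
```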
-
addCalendricalMonths
Add the specified number of months to the timestamp.- Parameters:
months
- must be an INT16 column indicating the number of months to add. A negative number of months works too.- Returns:
- the updated timestamp
-
addCalendricalMonths
Add the specified number of months to the timestamp.- Parameters:
months
- must be an INT16 scalar indicating the number of months to add. A negative number of months works too.- Returns:
- the updated timestamp
-
isLeapYear
Check to see if the year for this timestamp is a leap year or not.- Returns:
- BOOL8 vector of results
-
daysInMonth
Extract the number of days in the month- Returns:
- INT16 column of the number of days in the corresponding month
-
dateTimeCeil
Round the timestamp up to the given frequency keeping the type the same.- Parameters:
freq
- what part of the timestamp to round.- Returns:
- a timestamp with the same type, but rounded up.
-
dateTimeFloor
Round the timestamp down to the given frequency keeping the type the same.- Parameters:
freq
- what part of the timestamp to round.- Returns:
- a timestamp with the same type, but rounded down.
-
dateTimeRound
Round the timestamp (half up) to the given frequency keeping the type the same.- Parameters:
freq
- what part of the timestamp to round.- Returns:
- a timestamp with the same type, but rounded (half up).
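The difference between dateTimeCeil, dateTimeFloor and dateTimeRound can be sketched with integer arithmetic on a raw timestamp. This is a model under assumptions: `DateTimeRoundModel` is a hypothetical name, the timestamp is a `long` count of milliseconds since the epoch, the frequency is given as a length in milliseconds (the real API takes a frequency enum), and ties are sent toward the later instant. The source does not describe pre-epoch (negative) timestamps, so the example only uses non-negative values.

```java
// Plain-Java model of rounding a timestamp to a frequency; this is not the cudf implementation.
class DateTimeRoundModel {
    // Largest multiple of freq that is <= t.
    static long floor(long t, long freq) {
        return Math.floorDiv(t, freq) * freq;
    }

    // Smallest multiple of freq that is >= t.
    static long ceil(long t, long freq) {
        return -Math.floorDiv(-t, freq) * freq;
    }

    // Nearest multiple of freq; an exact tie goes up (half up).
    static long roundHalfUp(long t, long freq) {
        return Math.floorDiv(2 * t + freq, 2 * freq) * freq;
    }

    public static void main(String[] args) {
        long second = 1_000L;
        long minute = 60_000L;
        long t = 90_500L; // 1 minute 30.5 seconds after the epoch
        System.out.println(floor(t, second));       // 90000
        System.out.println(ceil(t, second));        // 91000
        System.out.println(roundHalfUp(t, second)); // 91000 (tie goes up)
        System.out.println(floor(t, minute));       // 60000
        System.out.println(ceil(t, minute));        // 120000
        System.out.println(roundHalfUp(t, minute)); // 120000
    }
}
```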
-
round
Rounds all the values in a column to the specified number of decimal places.- Parameters:
decimalPlaces
- Number of decimal places to round to. If negative, this specifies the number of positions to the left of the decimal point.mode
- Rounding method (either HALF_UP or HALF_EVEN)- Returns:
- a new ColumnVector with rounded values.
-
round
Rounds all the values in a column to 0 decimal places, the default.- Parameters:
round
- Rounding method (either HALF_UP or HALF_EVEN)- Returns:
- a new ColumnVector with rounded values.
-
round
Rounds all the values in a column to the specified number of decimal places with HALF_UP (default) as Rounding method.- Parameters:
decimalPlaces
- Number of decimal places to round to. If negative, this specifies the number of positions to the left of the decimal point.- Returns:
- a new ColumnVector with rounded values.
-
round
Rounds all the values in a column with these default values: decimalPlaces = 0 Rounding method = RoundMode.HALF_UP- Returns:
- a new ColumnVector with rounded values.
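The interaction of decimalPlaces (including negative values) with the two rounding methods can be illustrated with `java.math.BigDecimal`. This is a sketch under assumptions: `RoundModel` is a hypothetical name, `java.math.RoundingMode` stands in for the cudf rounding mode, and the model rounds the shortest decimal representation of each double, which may differ from a binary floating point implementation for some inputs. NaN and infinities are not handled.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Plain-Java model of rounding to a number of decimal places; this is not the cudf implementation.
class RoundModel {
    // A negative decimalPlaces rounds to positions left of the decimal point.
    static double round(double value, int decimalPlaces, RoundingMode mode) {
        return BigDecimal.valueOf(value).setScale(decimalPlaces, mode).doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(round(2.5, 0, RoundingMode.HALF_UP));       // 3.0
        System.out.println(round(2.5, 0, RoundingMode.HALF_EVEN));     // 2.0
        System.out.println(round(3.5, 0, RoundingMode.HALF_EVEN));     // 4.0
        System.out.println(round(1.2345, 2, RoundingMode.HALF_UP));    // 1.23
        System.out.println(round(1250.0, -2, RoundingMode.HALF_UP));   // 1300.0
        System.out.println(round(1250.0, -2, RoundingMode.HALF_EVEN)); // 1200.0
    }
}
```

HALF_UP and HALF_EVEN only differ on exact ties such as 2.5 or 1250 at a scale of -2; every other input rounds to the nearest value under both.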
-
transform
Transform a vector using a custom function. Be careful this is not simple to do. You need to be positive you know what type of data you are processing and how the data is laid out. This also only works on fixed length types.- Parameters:
udf
- This function will be applied to every element in the vectorisPtx
- true if the function code is PTX, false if it is C/C++.
-
unaryOp
Multiple different unary operations. The output is the same type as input.- Parameters:
op
- the operation to perform- Returns:
- the result
-
sin
Calculate the sin, output is the same type as input. -
cos
Calculate the cos, output is the same type as input. -
tan
Calculate the tan, output is the same type as input. -
arcsin
Calculate the arcsin, output is the same type as input. -
arccos
Calculate the arccos, output is the same type as input. -
arctan
Calculate the arctan, output is the same type as input. -
sinh
Calculate the hyperbolic sin, output is the same type as input. -
cosh
Calculate the hyperbolic cos, output is the same type as input. -
tanh
Calculate the hyperbolic tan, output is the same type as input. -
arcsinh
Calculate the hyperbolic arcsin, output is the same type as input. -
arccosh
Calculate the hyperbolic arccos, output is the same type as input. -
arctanh
Calculate the hyperbolic arctan, output is the same type as input. -
exp
Calculate the exp, output is the same type as input. -
log
Calculate the log, output is the same type as input. -
log2
Calculate the log with base 2, output is the same type as input. -
log10
Calculate the log with base 10, output is the same type as input. -
sqrt
Calculate the sqrt, output is the same type as input. -
cbrt
Calculate the cube root, output is the same type as input. -
ceil
Calculate the ceil, output is the same type as input. -
floor
Calculate the floor, output is the same type as input. -
abs
Calculate the abs, output is the same type as input. -
rint
Rounds a floating-point argument to the closest integer value, but returns it as a float. -
bitCount
Count the number of set bits for each integer value. -
bitInvert
Invert the bits, output is the same type as input. For BOOL8 type, this is equivalent to logical not (UnaryOp.NOT), but this does not matter since Spark does not support bitwise inverting on boolean type. -
binaryOp
Multiple different binary operations.- Specified by:
binaryOp
in interfaceBinaryOperable
- Parameters:
op
- the operation to performrhs
- the rhs of the operationoutType
- the type of output you want.- Returns:
- the result
-
sum
Computes the sum of all values in the column, returning a scalar of the same type as this column. -
sum
Computes the sum of all values in the column, returning a scalar of the specified type. -
min
Returns the minimum of all values in the column, returning a scalar of the same type as this column. -
min
Deprecated. The min reduction no longer internally allows setting the output type. As a workaround, this API will cast the input type to the output type for you, but this may not work in all cases. Returns the minimum of all values in the column, returning a scalar of the specified type. -
max
Returns the maximum of all values in the column, returning a scalar of the same type as this column. -
max
Deprecated. The max reduction no longer internally allows setting the output type. As a workaround, this API will cast the input type to the output type for you, but this may not work in all cases. Returns the maximum of all values in the column, returning a scalar of the specified type. -
product
Returns the product of all values in the column, returning a scalar of the same type as this column. -
product
Returns the product of all values in the column, returning a scalar of the specified type. -
sumOfSquares
Returns the sum of squares of all values in the column, returning a scalar of the same type as this column. -
sumOfSquares
Returns the sum of squares of all values in the column, returning a scalar of the specified type. -
mean
Returns the arithmetic mean of all values in the column, returning a FLOAT64 scalar unless the column type is FLOAT32 then a FLOAT32 scalar is returned. Null values are skipped. -
mean
Returns the arithmetic mean of all values in the column, returning a scalar of the specified type. Null values are skipped.- Parameters:
outType
- the output type to return. Note that only floating point types are currently supported.
-
variance
Returns the variance of all values in the column, returning a FLOAT64 scalar unless the column type is FLOAT32 then a FLOAT32 scalar is returned. Null values are skipped. -
variance
Returns the variance of all values in the column, returning a scalar of the specified type. Null values are skipped.- Parameters:
outType
- the output type to return. Note that only floating point types are currently supported.
-
standardDeviation
Returns the sample standard deviation of all values in the column, returning a FLOAT64 scalar unless the column type is FLOAT32 then a FLOAT32 scalar is returned. Nulls are not counted as an element of the column when calculating the standard deviation. -
standardDeviation
Returns the sample standard deviation of all values in the column, returning a scalar of the specified type. Nulls are not counted as an element of the column when calculating the standard deviation.- Parameters:
outType
- the output type to return. Note that only floating point types are currently supported.
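As an illustration of "sample standard deviation with nulls not counted", the sketch below computes it on a boxed Double array in plain Java. It assumes the usual n - 1 divisor implied by "sample"; the class name SampleStdDevSketch is hypothetical and the code does not use this library.

```java
final class SampleStdDevSketch {
    // Sample standard deviation (n - 1 divisor) over the non-null entries only.
    // With fewer than two non-null values the result is NaN in this sketch.
    public static double sampleStdDev(Double[] values) {
        int n = 0;
        double sum = 0.0;
        for (Double v : values) {
            if (v != null) {
                n++;
                sum += v;
            }
        }
        double mean = sum / n;
        double squaredDeviations = 0.0;
        for (Double v : values) {
            if (v != null) {
                squaredDeviations += (v - mean) * (v - mean);
            }
        }
        return Math.sqrt(squaredDeviations / (n - 1));
    }

    public static void main(String[] args) {
        Double[] column = {1.0, null, 2.0, 3.0};
        // The null is skipped, so n = 3 and the mean is 2.0.
        System.out.println(sampleStdDev(column));
    }
}
```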
-
any
Returns a boolean scalar that is true if any of the elements in the column are true or non-zero otherwise false. Null values are skipped. -
any
Returns a scalar that is true or 1, depending on the specified type, if any of the elements in the column are true or non-zero, otherwise false or 0. Null values are skipped. -
all
Returns a boolean scalar that is true if all of the elements in the column are true or non-zero otherwise false. Null values are skipped. -
all
Deprecated. The only output type supported is BOOL8. Returns a scalar that is true or 1, depending on the specified type, if all of the elements in the column are true or non-zero, otherwise false or 0. Null values are skipped. -
reduce
Computes the reduction of the values in all rows of a column. Overflows in reductions are not detected. Specifying a higher precision output type may prevent overflow. Only the MIN and MAX ops are supported for reduction of non-arithmetic types (TIMESTAMP...) The null values are skipped for the operation.- Parameters:
aggregation
- The reduction aggregation to perform- Returns:
- The scalar result of the reduction operation. If the column is
empty or the reduction operation fails then the
Scalar.isValid()
method of the result will return false.
-
reduce
Computes the reduction of the values in all rows of a column. Overflows in reductions are not detected. Specifying a higher precision output type may prevent overflow. Only the MIN and MAX ops are supported for reduction of non-arithmetic types (TIMESTAMP...) The null values are skipped for the operation.- Parameters:
aggregation
- The reduction aggregation to performoutType
- The type of scalar value to return. Not all output types are supported by all aggregation operations.- Returns:
- The scalar result of the reduction operation. If the column is
empty or the reduction operation fails then the
Scalar.isValid()
method of the result will return false.
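The note about undetected overflow can be illustrated without the library. In the sketch below (hypothetical class ReductionOverflowSketch), the same values are summed once into an int accumulator, which silently wraps, and once into a long accumulator, which plays the role of a higher precision output type.

```java
final class ReductionOverflowSketch {
    // Sum with an accumulator of the same width as the input: overflow wraps silently.
    public static int sumAsInt(int[] values) {
        int acc = 0;
        for (int v : values) {
            acc += v;
        }
        return acc;
    }

    // Sum with a wider accumulator: the same inputs no longer overflow.
    public static long sumAsLong(int[] values) {
        long acc = 0L;
        for (int v : values) {
            acc += v;
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] column = {Integer.MAX_VALUE, 1};
        System.out.println(sumAsInt(column));   // wrapped result
        System.out.println(sumAsLong(column));  // exact result
    }
}
```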
-
segmentedReduce
Do a segmented reduce where the offsets column indicates which groups in this to combine. The output type is the same as the input type.- Parameters:
offsets
- an INT32 column with no nulls.aggregation
- the aggregation to do- Returns:
- the result.
-
segmentedReduce
public ColumnVector segmentedReduce(ColumnView offsets, SegmentedReductionAggregation aggregation, DType outType) Do a segmented reduce where the offsets column indicates which groups in this to combine.- Parameters:
offsets
- an INT32 column with no nulls.aggregation
- the aggregation to dooutType
- the output data type.- Returns:
- the result.
-
segmentedReduce
public ColumnVector segmentedReduce(ColumnView offsets, SegmentedReductionAggregation aggregation, NullPolicy nullPolicy, DType outType) Do a segmented reduce where the offsets column indicates which groups in this to combine.- Parameters:
offsets
- an INT32 column with no nulls.aggregation
- the aggregation to donullPolicy
- the null policy.outType
- the output data type.- Returns:
- the result.
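The following plain Java sketch shows one way to read the offsets column in a segmented reduce. It assumes, as an illustration only, that the offsets hold one more entry than there are segments and that segment i covers the half-open row range [offsets[i], offsets[i + 1]). The class SegmentedReduceSketch is hypothetical, and the sketch ignores nulls and the null policy.

```java
import java.util.Arrays;

final class SegmentedReduceSketch {
    // Sum each segment [offsets[s], offsets[s + 1]) of `values`.
    public static int[] segmentedSum(int[] values, int[] offsets) {
        int[] result = new int[Math.max(0, offsets.length - 1)];
        for (int s = 0; s + 1 < offsets.length; s++) {
            int sum = 0;
            for (int i = offsets[s]; i < offsets[s + 1]; i++) {
                sum += values[i];
            }
            result[s] = sum;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5, 6};
        int[] offsets = {0, 2, 3, 6}; // three segments: rows 0-1, row 2, rows 3-5
        System.out.println(Arrays.toString(segmentedSum(values, offsets)));
    }
}
```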
-
segmentedGather
Segmented gather of the elements within a list element in each row of a list column. For each list, assuming the size is N, valid indices in the gather map lie in [-N, N). Out-of-bounds indices produce null.- Parameters:
gatherMap
- ListColumnView carrying lists of integral indices which maps the element in list of each row in the source columns to rows of lists in the result columns.- Returns:
- the result.
-
segmentedGather
Segmented gather of the elements within a list element in each row of a list column.- Parameters:
gatherMap
- ListColumnView carrying lists of integral indices which maps the element in list of each row in the source columns to rows of lists in the result columns.policy
- OutOfBoundsPolicy, `DONT_CHECK` leads to undefined behaviour; `NULLIFY` replaces out of bounds with null.- Returns:
- the result.
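The index rules for a segmented gather can be sketched on ordinary Java lists. The code below models the NULLIFY behaviour described above and assumes that a negative index counts back from the end of the list, so that -1 refers to the last element. The class SegmentedGatherSketch is hypothetical and does not use this library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SegmentedGatherSketch {
    // For each row, pick elements of the source list by the indices in the gather map row.
    public static List<List<Integer>> gather(List<List<Integer>> lists,
                                             List<List<Integer>> gatherMap) {
        List<List<Integer>> result = new ArrayList<>();
        for (int row = 0; row < lists.size(); row++) {
            List<Integer> source = lists.get(row);
            int n = source.size();
            List<Integer> out = new ArrayList<>();
            for (int index : gatherMap.get(row)) {
                int resolved = index < 0 ? index + n : index; // assumed: negative counts from the end
                out.add(resolved >= 0 && resolved < n ? source.get(resolved) : null);
            }
            result.add(out);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = Arrays.asList(Arrays.asList(10, 20, 30), Arrays.asList(40, 50));
        List<List<Integer>> map = Arrays.asList(Arrays.asList(0, -1, 3), Arrays.asList(-2, 2));
        System.out.println(gather(lists, map));
    }
}
```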
-
listReduce
Do a reduction on the values in a list. The output type will be the type of the data column of this list.- Parameters:
aggregation
- the aggregation to perform
-
listReduce
Do a reduction on the values in a list.- Parameters:
aggregation
- the aggregation to performoutType
- the type of the output. Typically, this should match with the child type of the list.
-
listReduce
public ColumnVector listReduce(SegmentedReductionAggregation aggregation, NullPolicy nullPolicy, DType outType) Do a reduction on the values in a list.- Parameters:
aggregation
- the aggregation to performnullPolicy
- should nulls be included or excluded from the aggregation.outType
- the type of the output. Typically, this should match with the child type of the list.
-
approxPercentile
Calculate various percentiles of this ColumnVector, which must contain centroids produced by a t-digest aggregation.- Parameters:
percentiles
- Required percentiles [0,1]- Returns:
- Column containing the approximate percentile values as a list of doubles, in the same order as the input percentiles
-
approxPercentile
Calculate various percentiles of this ColumnVector, which must contain centroids produced by a t-digest aggregation.- Parameters:
percentiles
- Column containing percentiles [0,1]- Returns:
- Column containing the approximate percentile values as a list of doubles, in the same order as the input percentiles
-
quantile
Calculate various quantiles of this ColumnVector. It is assumed that this is already sorted in the desired order.- Parameters:
method
- the method used to calculate the quantilesquantiles
- the quantile values [0,1]- Returns:
- Column containing the approximate percentile values as a list of doubles, in the same order as the input percentiles
-
rollingWindow
This function aggregates values in a window around each element i of the input column. Please refer to WindowOptions for various options that can be passed. Note: Only rows-based windows are supported.- Parameters:
op
- the operation to perform.options
- various window function arguments.- Returns:
- Column containing aggregate function result.
- Throws:
IllegalArgumentException
- if an unsupported window specification (i.e. other than WindowOptions.FrameType.ROWS
) is used.
-
prefixSum
Compute the prefix sum (aka cumulative sum) of the values in this column. This is just a convenience method for an inclusive scan with a SUM aggregation. -
scan
public final ColumnVector scan(ScanAggregation aggregation, ScanType scanType, NullPolicy nullPolicy) Computes a scan for a column. This is very similar to a running window on the column.- Parameters:
aggregation
- the aggregation to performscanType
- should the scan be inclusive, include the current row, or exclusive.nullPolicy
- how should nulls be treated. Note that some aggregations also include a null policy too. Currently none of those aggregations are supported so it is undefined how they would interact with each other.
-
scan
Computes a scan for a column that excludes nulls.- Parameters:
aggregation
- the aggregation to performscanType
- should the scan be inclusive, include the current row, or exclusive.
-
scan
Computes an inclusive scan for a column that excludes nulls.- Parameters:
aggregation
- the aggregation to perform
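The difference between an inclusive and an exclusive scan is easiest to see on a SUM aggregation. The sketch below is plain Java with hypothetical names (ScanSketch, inclusiveSum, exclusiveSum). It assumes an exclusive scan starts from the additive identity 0, and it ignores nulls.

```java
import java.util.Arrays;

final class ScanSketch {
    // Inclusive scan: output[i] includes values[i] (this is the prefix sum).
    public static long[] inclusiveSum(long[] values) {
        long[] out = new long[values.length];
        long running = 0L;
        for (int i = 0; i < values.length; i++) {
            running += values[i];
            out[i] = running;
        }
        return out;
    }

    // Exclusive scan: output[i] covers only the rows before i.
    public static long[] exclusiveSum(long[] values) {
        long[] out = new long[values.length];
        long running = 0L;
        for (int i = 0; i < values.length; i++) {
            out[i] = running;
            running += values[i];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] column = {1, 2, 3, 4};
        System.out.println(Arrays.toString(inclusiveSum(column)));
        System.out.println(Arrays.toString(exclusiveSum(column)));
    }
}
```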
-
not
Returns a vector of the logical `not` of each value in the input column (this) -
contains
Find if the `needle` is present in this column. Example:
Single Column:
idx 0 1 2 3 4
col = { 10, 20, 20, 30, 50 }
Scalar:
value = { 20 }
result = true- Parameters:
needle
-- Returns:
- true if needle is present else false
-
contains
Returns a new column of DType.BOOL8
elements having the same size as this column; each row value is true if the corresponding entry in this column is contained in the given searchSpace column and false if it is not. The caller will be responsible for the lifecycle of the new vector. Example:
col = { 10, 20, 30, 40, 50 }
searchSpace = { 20, 40, 60, 80 }
result = { false, true, false, true, false }- Parameters:
searchSpace
-- Returns:
- A new ColumnVector of type
DType.BOOL8
-
toTitle
Returns a column of strings where, for each string row in the input, the first character after spaces is modified to upper-case, while all the remaining characters in a word are modified to lower-case. Any null string entries return corresponding null output column entries -
capitalize
Returns a column of capitalized strings. If `delimiters` is an empty string, then only the first character of each row is capitalized. Otherwise, a non-delimiter character is capitalized after any delimiter character is found. Example:
input = ["tesT1", "a Test", "Another Test", "a\tb"];
delimiters = ""
output is ["Test1", "A test", "Another test", "A\tb"]
delimiters = " "
output is ["Test1", "A Test", "Another Test", "A\tb"]
Any null string entries return corresponding null output column entries.- Parameters:
delimiters
- Used to identify words to capitalize. Should not be null.- Returns:
- a column of capitalized strings. Users should close the returned column.
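The capitalize example above implies a simple rule: upper-case the first character and any non-delimiter character that follows a delimiter, and lower-case everything else. The sketch below implements that reading for a single string in plain Java. The class CapitalizeSketch is hypothetical and the rule is inferred from the documented example rather than taken from the library's implementation.

```java
final class CapitalizeSketch {
    public static String capitalize(String row, String delimiters) {
        StringBuilder out = new StringBuilder(row.length());
        boolean capitalizeNext = true; // the first character of the row is always capitalized
        for (int i = 0; i < row.length(); i++) {
            char c = row.charAt(i);
            if (delimiters.indexOf(c) >= 0) {
                out.append(c);
                capitalizeNext = true; // the next non-delimiter character starts a word
            } else if (capitalizeNext) {
                out.append(Character.toUpperCase(c));
                capitalizeNext = false;
            } else {
                out.append(Character.toLowerCase(c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("a Test", ""));  // only the first character is upper-cased
        System.out.println(capitalize("a Test", " ")); // each space-delimited word is capitalized
    }
}
```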
-
joinStrings
Concatenates all strings in the column into one new string delimited by an optional separator string. This returns a column with one string. Any null entries are ignored unless the narep parameter specifies a replacement string (not a null value).- Parameters:
separator
- what to insert to separate each row.narep
- what to replace nulls with- Returns:
- a ColumnVector with a single string in it.
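The handling of null rows in joinStrings can be modelled on a Java list. In this sketch (hypothetical class JoinStringsSketch), a null narep means null rows are skipped, and a non-null narep is substituted for each null row. This follows the description above and is not the library's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class JoinStringsSketch {
    public static String join(List<String> rows, String separator, String narep) {
        List<String> parts = new ArrayList<>();
        for (String row : rows) {
            if (row != null) {
                parts.add(row);
            } else if (narep != null) {
                parts.add(narep); // replace the null row
            }
            // otherwise the null row is ignored
        }
        return String.join(separator, parts);
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("a", null, "b");
        System.out.println(join(column, ",", null));
        System.out.println(join(column, ",", "NA"));
    }
}
```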
-
castTo
Generic method to cast a ColumnVector. When casting from a Date, Timestamp, or Boolean to a numerical type, the underlying numerical representation of the data will be used for the cast. For Strings: casting strings from/to timestamp isn't supported at the moment. Please look at asTimestamp(DType, String)
and asStrings(String)
for casting string to timestamp when the format is known. Float values converted to String can differ from the expected default behavior in Java, e.g. 12.3 => "12.30000019" instead of "12.3", Double.POSITIVE_INFINITY => "Inf" instead of "INFINITY", Double.NEGATIVE_INFINITY => "-Inf" instead of "-INFINITY".- Parameters:
type
- type of the resulting ColumnVector- Returns:
- A new vector allocated on the GPU
-
replaceChildrenWithViews
This method takes in a nested type and replaces its children with the given views Note: Make sure the numbers of rows in the leaf node are the same as the child replacing it otherwise the list can point to elements outside of the column values. Note: this method returns a ColumnView that won't live past the ColumnVector that it's pointing to. Ex: Listlist = col{{1,3}, {9,3,5}} validNewChild = col{8, 3, 9, 2, 0} list.replaceChildrenWithViews(1, validNewChild) => col{{8, 3}, {9, 2, 0}} invalidNewChild = col{3, 2} list.replaceChildrenWithViews(1, invalidNewChild) => col{{3, 2}, {invalid, invalid, invalid}} invalidNewChild = col{8, 3, 9, 2, 0, 0, 7} list.replaceChildrenWithViews(1, invalidNewChild) => col{{8, 3}, {9, 2, 0}} // undefined result -
replaceListChild
This method takes in a list and returns a new list with the leaf node replaced with the given view. Make sure the numbers of rows in the leaf node are the same as the child replacing it otherwise the list can point to elements outside of the column values. Note: this method returns a ColumnView that won't live past the ColumnVector that it's pointing to. Ex: Listlist = col{{1,3}, {9,3,5}} validNewChild = col{8, 3, 9, 2, 0} list.replaceChildrenWithViews(1, validNewChild) => col{{8, 3}, {9, 2, 0}} invalidNewChild = col{3, 2} list.replaceChildrenWithViews(1, invalidNewChild) => col{{3, 2}, {invalid, invalid, invalid}} throws an exception invalidNewChild = col{8, 3, 9, 2, 0, 0, 7} list.replaceChildrenWithViews(1, invalidNewChild) => col{{8, 3}, {9, 2, 0}} throws an exception -
logicalCastTo
Deprecated. This has changed to bit_cast in C++, so use that name instead. Zero-copy cast between types with the same underlying representation. Similar to reinterpret_cast or bit_cast in C++. This will essentially take the underlying data and update the metadata to reflect a new type. Not all types are supported; the width of the types must match.- Parameters:
type
- the type you want to go to.- Returns:
- a ColumnView that cannot outlive the Column that owns the actual data it points to.
-
bitCastTo
Zero-copy cast between types with the same underlying length. Similar to bit_cast in C++. This will take the underlying data and create new metadata so it is interpreted as a new type. Not all types are supported; the width of the types must match.- Parameters:
type
- the type you want to go to.- Returns:
- a ColumnView that cannot outlive the Column that owns the actual data it points to.
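A bit cast reinterprets the stored bits, whereas a regular cast converts the value. The plain Java sketch below shows the distinction for a 32-bit float and a 32-bit int, two types of the same width. The class BitCastSketch is hypothetical and uses only java.lang methods, not this library.

```java
final class BitCastSketch {
    // Reinterpret the 32 bits of a float as an int (same width, bits unchanged).
    public static int bitCastFloatToInt(float value) {
        return Float.floatToRawIntBits(value);
    }

    // Ordinary value conversion, for comparison.
    public static int valueCastFloatToInt(float value) {
        return (int) value;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(bitCastFloatToInt(1.0f))); // IEEE 754 bit pattern
        System.out.println(valueCastFloatToInt(1.0f));                    // numeric value
    }
}
```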
-
asBytes
Cast to Byte - ColumnVector. This method takes the value provided by the ColumnVector and casts to byte. When casting from a Date, Timestamp, or Boolean to a byte type, the underlying numerical representation of the data will be used for the cast.- Returns:
- A new vector allocated on the GPU
-
asByteList
Cast to list of bytes. This method converts the rows provided by the ColumnVector and casts each row to a list of bytes with endianness reversed. Numeric and string types supported, but not timestamps.- Returns:
- A new vector allocated on the GPU
-
asByteList
Cast to list of bytes This method converts the rows provided by the ColumnVector and casts each row to a list of bytes. Numeric and string types supported, but not timestamps.- Parameters:
config
- Flips the byte order (endianness) if true, retains byte order otherwise- Returns:
- A new vector allocated on the GPU
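What flipping the byte order means for a single row can be shown with java.nio.ByteBuffer. The sketch below (hypothetical class ByteListSketch) serialises one int in both byte orders. Which of the two corresponds to the flipped configuration in this library depends on the device's native order, which the text above does not state.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class ByteListSketch {
    // Serialise one 4-byte integer in the requested byte order.
    public static byte[] toBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    public static byte[] bigEndian(int value) {
        return toBytes(value, ByteOrder.BIG_ENDIAN);
    }

    public static byte[] littleEndian(int value) {
        return toBytes(value, ByteOrder.LITTLE_ENDIAN);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bigEndian(0x01020304)));    // most significant byte first
        System.out.println(Arrays.toString(littleEndian(0x01020304))); // least significant byte first
    }
}
```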
-
asUnsignedBytes
Cast to unsigned Byte - ColumnVector. This method takes the value provided by the ColumnVector and casts to byte. When casting from a Date, Timestamp, or Boolean to a byte type, the underlying numerical representation of the data will be used for the cast. Java does not have an unsigned byte type, so properly decoding these values will require extra steps on the part of the application. See
Byte.toUnsignedInt(byte)
.- Returns:
- A new vector allocated on the GPU
-
asShorts
Cast to Short - ColumnVector. This method takes the value provided by the ColumnVector and casts to short. When casting from a Date, Timestamp, or Boolean to a short type, the underlying numerical representation of the data will be used for the cast.- Returns:
- A new vector allocated on the GPU
-
asUnsignedShorts
Cast to unsigned Short - ColumnVector. This method takes the value provided by the ColumnVector and casts to short. When casting from a Date, Timestamp, or Boolean to a short type, the underlying numerical representation of the data will be used for the cast. Java does not have an unsigned short type, so properly decoding these values will require extra steps on the part of the application. See
Short.toUnsignedInt(short)
.- Returns:
- A new vector allocated on the GPU
-
asInts
Cast to Int - ColumnVector. This method takes the value provided by the ColumnVector and casts to int. When casting from a Date, Timestamp, or Boolean to an int type, the underlying numerical representation of the data will be used for the cast.- Returns:
- A new vector allocated on the GPU
-
asUnsignedInts
Cast to unsigned Int - ColumnVector. This method takes the value provided by the ColumnVector and casts to int. When casting from a Date, Timestamp, or Boolean to an int type, the underlying numerical representation of the data will be used for the cast. Java does not have an unsigned int type, so properly decoding these values will require extra steps on the part of the application. See
Integer.toUnsignedLong(int)
.- Returns:
- A new vector allocated on the GPU
-
asLongs
Cast to Long - ColumnVector. This method takes the value provided by the ColumnVector and casts to long. When casting from a Date, Timestamp, or Boolean to a long type, the underlying numerical representation of the data will be used for the cast.- Returns:
- A new vector allocated on the GPU
-
asUnsignedLongs
Cast to unsigned Long - ColumnVector. This method takes the value provided by the ColumnVector and casts to long. When casting from a Date, Timestamp, or Boolean to a long type, the underlying numerical representation of the data will be used for the cast. Java does not have an unsigned long type, so properly decoding these values will require extra steps on the part of the application. See
Long.toUnsignedString(long)
.- Returns:
- A new vector allocated on the GPU
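The extra decoding steps mentioned for the unsigned casts come down to the standard JDK helpers referenced above. The sketch below (hypothetical class UnsignedDecodeSketch) applies them to raw signed values as an application would after copying the data to the host.

```java
final class UnsignedDecodeSketch {
    public static int decodeByte(byte raw) {
        return Byte.toUnsignedInt(raw);
    }

    public static int decodeShort(short raw) {
        return Short.toUnsignedInt(raw);
    }

    public static long decodeInt(int raw) {
        return Integer.toUnsignedLong(raw);
    }

    // There is no wider primitive for unsigned 64-bit values, so decode to a string.
    public static String decodeLong(long raw) {
        return Long.toUnsignedString(raw);
    }

    public static void main(String[] args) {
        System.out.println(decodeByte((byte) 0xFF));
        System.out.println(decodeShort((short) 0xFFFF));
        System.out.println(decodeInt(-1));
        System.out.println(decodeLong(-1L));
    }
}
```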
-
asFloats
Cast to Float - ColumnVector. This method takes the value provided by the ColumnVector and casts to float. When casting from a Date, Timestamp, or Boolean to a float type, the underlying numerical representation of the data will be used for the cast.- Returns:
- A new vector allocated on the GPU
-
asDoubles
Cast to Double - ColumnVector. This method takes the value provided by the ColumnVector and casts to double. When casting from a Date, Timestamp, or Boolean to a double type, the underlying numerical representation of the data will be used for the cast.- Returns:
- A new vector allocated on the GPU
-
asTimestampDays
Cast to TIMESTAMP_DAYS - ColumnVector This method takes the value provided by the ColumnVector and casts to TIMESTAMP_DAYS- Returns:
- A new vector allocated on the GPU
-
asTimestampDays
Cast to TIMESTAMP_DAYS - ColumnVector This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_DAYS- Parameters:
format
- timestamp string format specifier, ignored if the column type is not string- Returns:
- A new vector allocated on the GPU
-
asTimestampSeconds
Cast to TIMESTAMP_SECONDS - ColumnVector This method takes the value provided by the ColumnVector and casts to TIMESTAMP_SECONDS- Returns:
- A new vector allocated on the GPU
-
asTimestampSeconds
Cast to TIMESTAMP_SECONDS - ColumnVector This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_SECONDS- Parameters:
format
- timestamp string format specifier, ignored if the column type is not string- Returns:
- A new vector allocated on the GPU
-
asTimestampMicroseconds
Cast to TIMESTAMP_MICROSECONDS - ColumnVector This method takes the value provided by the ColumnVector and casts to TIMESTAMP_MICROSECONDS- Returns:
- A new vector allocated on the GPU
-
asTimestampMicroseconds
Cast to TIMESTAMP_MICROSECONDS - ColumnVector This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_MICROSECONDS- Parameters:
format
- timestamp string format specifier, ignored if the column type is not string- Returns:
- A new vector allocated on the GPU
-
asTimestampMilliseconds
Cast to TIMESTAMP_MILLISECONDS - ColumnVector This method takes the value provided by the ColumnVector and casts to TIMESTAMP_MILLISECONDS.- Returns:
- A new vector allocated on the GPU
-
asTimestampMilliseconds
Cast to TIMESTAMP_MILLISECONDS - ColumnVector This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_MILLISECONDS.- Parameters:
format
- timestamp string format specifier, ignored if the column type is not string- Returns:
- A new vector allocated on the GPU
-
asTimestampNanoseconds
Cast to TIMESTAMP_NANOSECONDS - ColumnVector This method takes the value provided by the ColumnVector and casts to TIMESTAMP_NANOSECONDS.- Returns:
- A new vector allocated on the GPU
-
asTimestampNanoseconds
Cast to TIMESTAMP_NANOSECONDS - ColumnVector This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_NANOSECONDS.- Parameters:
format
- timestamp string format specifier, ignored if the column type is not string- Returns:
- A new vector allocated on the GPU
-
asTimestamp
Parse a string to a timestamp. Strings that fail to parse will default to 0, corresponding to 1970-01-01 00:00:00.000.- Parameters:
timestampType
- timestamp DType that includes the time unit to parse the timestamp into.format
- strptime format specifier string of the timestamp. Used to parse and convert the timestamp with. Supports %Y,%y,%m,%d,%H,%I,%p,%M,%S,%f,%z format specifiers. See https://github.com/rapidsai/custrings/blob/branch-0.10/docs/source/datetime.md for full parsing format specification and documentation.- Returns:
- A new ColumnVector containing the long representations of the timestamps in the original column vector.
-
asStrings
Cast to Strings. Negative timestamp values are not currently supported and will yield undesired results. See github issue https://github.com/rapidsai/cudf/issues/3116 for details. For timestamps, the following formats are used:
DType.TIMESTAMP_DAYS - "%Y-%m-%d"
DType.TIMESTAMP_SECONDS - "%Y-%m-%d %H:%M:%S"
DType.TIMESTAMP_MICROSECONDS - "%Y-%m-%d %H:%M:%S.%f"
DType.TIMESTAMP_MILLISECONDS - "%Y-%m-%d %H:%M:%S.%f"
DType.TIMESTAMP_NANOSECONDS - "%Y-%m-%d %H:%M:%S.%f"- Returns:
- A new vector allocated on the GPU.
-
asStrings
Method to parse and convert a timestamp column vector to a string column vector. A unix timestamp is a long value representing how many units since 1970-01-01 00:00:00.000 in either positive or negative direction. No checking is done for invalid formats or invalid timestamp units. Negative timestamp values are not currently supported and will yield undesired results. See github issue https://github.com/rapidsai/cudf/issues/3116 for details.- Parameters:
format
- strftime format specifier string of the timestamp. It is used to parse and convert the timestamp. Supports %m,%j,%d,%H,%M,%S,%y,%Y,%f format specifiers.
%d Day of the month: 01-31
%m Month of the year: 01-12
%y Year without century: 00-99
%Y Year with century: 0001-9999
%H 24-hour of the day: 00-23
%M Minute of the hour: 00-59
%S Second of the minute: 00-59
%f 6-digit microsecond: 000000-999999
See https://github.com/rapidsai/custrings/blob/branch-0.10/docs/source/datetime.md
Reported bug: https://github.com/rapidsai/cudf/issues/4160. After the bug is fixed this method should also support:
%I 12-hour of the day: 01-12
%p Only 'AM', 'PM'
%j day of the year- Returns:
- A new vector allocated on the GPU
-
isTimestamp
Verifies that a string column can be parsed to timestamps using the provided format pattern. The format pattern can include the following specifiers: "%Y,%y,%m,%d,%H,%I,%p,%M,%S,%f,%z"
| Specifier | Description |
| :-------: | ----------- |
| %d | Day of the month: 01-31 |
| %m | Month of the year: 01-12 |
| %y | Year without century: 00-99 |
| %Y | Year with century: 0001-9999 |
| %H | 24-hour of the day: 00-23 |
| %I | 12-hour of the day: 01-12 |
| %M | Minute of the hour: 00-59 |
| %S | Second of the minute: 00-59 |
| %f | 6-digit microsecond: 000000-999999 |
| %z | UTC offset with format ±HHMM Example +0500 |
| %j | Day of the year: 001-366 |
| %p | Only 'AM', 'PM' or 'am', 'pm' are recognized |
Other specifiers are not currently supported. The "%f" supports a precision value to read the numeric digits. Specify the precision with a single integer value (1-9) as follows: use "%3f" for milliseconds, "%6f" for microseconds and "%9f" for nanoseconds.
Any null string entry will result in a corresponding null row in the output column.
This will return a column of type boolean where a `true` row indicates the corresponding input string can be parsed correctly with the given format.- Parameters:
format
- String specifying the timestamp format in strings.- Returns:
- New boolean ColumnVector.
-
extractListElement
For each list in this column pull out the entry at the given index. If the entry would go off the end of the list a NULL is returned instead.- Parameters:
index
- 0 based offset into the list. Negative values go backwards from the end of the list.- Returns:
- a new column of the values at those indexes.
-
extractListElement
For each list in this column pull out the entry at the corresponding index specified in the index column. If the entry goes off the end of the list a NULL is returned instead. The index column should have the same row count with the list column.- Parameters:
indices
- a column of 0 based offsets into the list. Negative values go backwards from the end of the list.- Returns:
- a new column of the values at those indexes.
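The indexing rules for extractListElement (0-based, negative values counting back from the end, null when the index falls outside the list) can be sketched for one row in plain Java. The class ExtractListElementSketch is hypothetical and assumes the row itself is not null.

```java
import java.util.Arrays;
import java.util.List;

final class ExtractListElementSketch {
    // Returns the element at `index`, or null if the index is outside the list.
    public static Integer elementAt(List<Integer> list, int index) {
        int n = list.size();
        int resolved = index < 0 ? index + n : index; // -1 is the last element
        return (resolved >= 0 && resolved < n) ? list.get(resolved) : null;
    }

    public static void main(String[] args) {
        List<Integer> row = Arrays.asList(7, 8, 9);
        System.out.println(elementAt(row, 0));  // first element
        System.out.println(elementAt(row, -1)); // last element
        System.out.println(elementAt(row, 3));  // past the end
    }
}
```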
-
dropListDuplicates
Create a new LIST column by copying elements from the current LIST column while dropping duplicates, producing a LIST column in which each list contains only unique elements. The relative order of the elements is kept the same; by default any one of the duplicates may be kept. Example: [0,3,4,0] may produce either [0,3,4] or [3,4,0], both of which are valid here.- Returns:
- A new LIST column having unique list elements.
-
dropListDuplicates
Create a new LIST column by copying elements from the current LIST column while dropping duplicates, producing a LIST column in which each list contains only unique elements. The order of the output elements within each list is preserved as in the input.- Parameters:
keep_option
- Flag to specify which element to keep (first, last, any)- Returns:
- A new LIST column having unique list elements.
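The two deterministic keep options can be illustrated on a single list. The sketch below (hypothetical class DropListDuplicatesSketch) keeps either the first or the last occurrence of each value while preserving the input order, and reproduces the [0,3,4,0] example given for the default overload.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

final class DropListDuplicatesSketch {
    // Keep the first occurrence of each value, in input order.
    public static List<Integer> keepFirst(List<Integer> list) {
        return new ArrayList<>(new LinkedHashSet<>(list));
    }

    // Keep the last occurrence of each value, in input order.
    public static List<Integer> keepLast(List<Integer> list) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < list.size(); i++) {
            // Skip an element if the same value occurs again later in the list.
            if (!list.subList(i + 1, list.size()).contains(list.get(i))) {
                out.add(list.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> row = Arrays.asList(0, 3, 4, 0);
        System.out.println(keepFirst(row));
        System.out.println(keepLast(row));
    }
}
```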
-
dropListDuplicatesWithKeysValues
Given a LIST column in which each element is a struct containing a <key, value> pair, generate an output LIST column by copying elements of the current column such that, if a list contains multiple elements having the same key, only the last such element is copied.- Returns:
- A new LIST column having list elements with unique keys.
-
flattenLists
Flatten each list of lists into a single list. The column must have rows that are lists of lists. Any row containing null list elements will result in a null output row.- Returns:
- A new column vector containing the flattened result
-
flattenLists
Flatten each list of lists into a single list. The column must have rows that are lists of lists.- Parameters:
ignoreNull
- Whether to ignore null list elements in the input column from the operation, or any row containing null list elements will result in a null output row- Returns:
- A new column vector containing the flattened result
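The effect of ignoreNull on one row of a list-of-lists column can be sketched as follows. The class FlattenListsSketch is hypothetical; it models a null inner list either being skipped or turning the whole output row into null, as described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FlattenListsSketch {
    // Flatten one row (a list of lists). Returns null to represent a null output row.
    public static List<Integer> flattenRow(List<List<Integer>> row, boolean ignoreNull) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> inner : row) {
            if (inner == null) {
                if (ignoreNull) {
                    continue; // skip the null inner list
                }
                return null; // a null inner list nullifies the whole row
            }
            out.addAll(inner);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> row = Arrays.asList(Arrays.asList(1, 2), null, Arrays.asList(3));
        System.out.println(flattenRow(row, true));
        System.out.println(flattenRow(row, false));
    }
}
```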
-
reverseStringsOrLists
Copy the current column to a new column, each string or list of the output column will have reverse order of characters or elements.- Returns:
- A new column with lists or strings having reverse order.
-
upper
Convert a string to upper case. -
lower
Convert a string to lower case. -
stringLocate
Locates the starting index of the first instance of the given string in each row of a column. 0 indexing, returns -1 if the substring is not found. Overloading stringLocate to support default values for start (0) and end index.- Parameters:
substring
- scalar containing the string to locate within each row.
-
stringLocate
Locates the starting index of the first instance of the given string in each row of a column. 0 indexing, returns -1 if the substring is not found. Overloading stringLocate to support default value for end index (-1, the end of each string).- Parameters:
substring
- scalar containing the string to locate within each row.start
- character index to start the search from (inclusive).
-
stringLocate
Locates the starting index of the first instance of the given string in each row of a column. 0 indexing, returns -1 if the substring is not found. Can be configured to start or end the search mid string.- Parameters:
substring
- scalar containing the string to locate within each row.start
- character index to start the search from (inclusive).end
- character index to end the search on (exclusive).
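The start (inclusive) and end (exclusive) semantics of stringLocate can be sketched for a single row with String.indexOf. The class StringLocateSketch is hypothetical. It treats a negative end as "end of the string", assumes start is not negative, and uses Java char indices, which only match character indices for text without surrogate pairs.

```java
final class StringLocateSketch {
    // Returns the 0-based index of the first match inside [start, end), or -1.
    public static int locate(String row, String substring, int start, int end) {
        int stop = (end < 0 || end > row.length()) ? row.length() : end;
        if (start > stop) {
            return -1;
        }
        int found = row.substring(start, stop).indexOf(substring);
        return found < 0 ? -1 : found + start; // shift back to an index in the full row
    }

    public static void main(String[] args) {
        System.out.println(locate("hello world", "o", 0, -1)); // search the whole string
        System.out.println(locate("hello world", "o", 5, -1)); // skip the first match
        System.out.println(locate("hello world", "o", 5, 7));  // window without a match
    }
}
```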
-
stringSplit
Deprecated. Returns a list of columns by splitting each string using the specified pattern. The number of rows in the output columns will be the same as the input column. Null entries are added for a row where split results have been exhausted. Null input entries result in all nulls in the corresponding rows of the output columns.- Parameters:
pattern
- UTF-8 encoded string identifying the split pattern for each input string.limit
- the maximum size of the list resulting from splitting each input string, or -1 for all possible splits. Note that limit = 0 (all possible splits without trailing empty strings) and limit = 1 (no split at all) are not supported.splitByRegex
- a boolean flag indicating whether the input strings will be split by a regular expression pattern or just by a string literal delimiter.- Returns:
- list of strings columns as a table.
-
stringSplit
Returns a list of columns by splitting each string using the specified regex program pattern. The number of rows in the output columns will be the same as the input column. Null entries are added for the rows where split results have been exhausted. Null input entries result in all nulls in the corresponding rows of the output columns.- Parameters:
regexProg
- the regex program with UTF-8 encoded string identifying the split pattern for each input string.limit
- the maximum size of the list resulting from splitting each input string, or -1 for all possible splits. Note that limit = 0 (all possible splits without trailing empty strings) and limit = 1 (no split at all) are not supported.- Returns:
- list of strings columns as a table.
-
stringSplit
Deprecated.Returns a list of columns by splitting each string using the specified pattern. The number of rows in the output columns will be the same as the input column. Null entries are added for a row where split results have been exhausted. Null input entries result in all nulls in the corresponding rows of the output columns.- Parameters:
pattern
- UTF-8 encoded string identifying the split pattern for each input string.splitByRegex
- a boolean flag indicating whether the input strings will be split by a regular expression pattern or just by a string literal delimiter.- Returns:
- list of strings columns as a table.
-
stringSplit
Returns a list of columns by splitting each string using the specified string literal delimiter. The number of rows in the output columns will be the same as the input column. Null entries are added for a row where split results have been exhausted. Null input entries result in all nulls in the corresponding rows of the output columns.- Parameters:
delimiter
- UTF-8 encoded string identifying the split delimiter for each input string.limit
- the maximum size of the list resulting from splitting each input string, or -1 for all possible splits. Note that limit = 0 (all possible splits without trailing empty strings) and limit = 1 (no split at all) are not supported.- Returns:
- list of strings columns as a table.
-
stringSplit
Returns a list of columns by splitting each string using the specified string literal delimiter. The number of rows in the output columns will be the same as the input column. Null entries are added for a row where split results have been exhausted. Null input entries result in all nulls in the corresponding rows of the output columns.- Parameters:
delimiter
- UTF-8 encoded string identifying the split delimiter for each input string.- Returns:
- list of strings columns as a table.
-
stringSplit
Returns a list of columns by splitting each string using the specified regex program pattern. The number of rows in the output columns will be the same as the input column. Null entries are added for the rows where split results have been exhausted. Null input entries result in all nulls in the corresponding rows of the output columns.- Parameters:
regexProg
- the regex program with UTF-8 encoded string identifying the split pattern for each input string.- Returns:
- list of strings columns as a table.
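
To make the shape of the table result concrete, here is a plain-Java model of splitting by a string literal delimiter with no limit. It is a sketch under assumptions, not the cudf implementation: columns are represented as `List<String>`, and the class name is hypothetical. It shows the two null rules from the description: rows with fewer pieces are padded with nulls, and a null input row is null in every output column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class StringSplitSketch {
    // Returns one list per output column; each column has one entry per input row.
    static List<List<String>> split(List<String> rows, String delimiter) {
        List<String[]> parts = new ArrayList<>();
        int numColumns = 0;
        for (String row : rows) {
            // Limit -1 keeps trailing empty strings, i.e. "all possible splits".
            String[] p = (row == null) ? null : row.split(Pattern.quote(delimiter), -1);
            parts.add(p);
            if (p != null) {
                numColumns = Math.max(numColumns, p.length);
            }
        }
        List<List<String>> columns = new ArrayList<>();
        for (int c = 0; c < numColumns; c++) {
            List<String> column = new ArrayList<>();
            for (String[] p : parts) {
                // Null when the row is null or its split results are exhausted.
                column.add(p != null && c < p.length ? p[c] : null);
            }
            columns.add(column);
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(split(Arrays.asList("a,b,c", "d", null), ","));
    }
}
```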
-
stringSplitRecord
@Deprecated public final ColumnVector stringSplitRecord(String pattern, int limit, boolean splitByRegex) Deprecated.Returns a column of lists of strings in which each list is made by splitting the corresponding input string using the specified pattern.- Parameters:
pattern
- UTF-8 encoded string identifying the split pattern for each input string.limit
- the maximum size of the list resulting from splitting each input string, or -1 for all possible splits. Note that limit = 0 (all possible splits without trailing empty strings) and limit = 1 (no split at all) are not supported.splitByRegex
- a boolean flag indicating whether the input strings will be split by a regular expression pattern or just by a string literal delimiter.- Returns:
- a LIST column of string elements.
-
stringSplitRecord
Returns a column of lists of strings in which each list is made by splitting the corresponding input string using the specified regex program pattern.- Parameters:
regexProg
- the regex program with UTF-8 encoded string identifying the split pattern for each input string.limit
- the maximum size of the list resulting from splitting each input string, or -1 for all possible splits. Note that limit = 0 (all possible splits without trailing empty strings) and limit = 1 (no split at all) are not supported.- Returns:
- a LIST column of string elements.
-
stringSplitRecord
Deprecated.Returns a column of lists of strings in which each list is made by splitting the corresponding input string using the specified pattern.- Parameters:
pattern
- UTF-8 encoded string identifying the split pattern for each input string.splitByRegex
- a boolean flag indicating whether the input strings will be split by a regular expression pattern or just by a string literal delimiter.- Returns:
- a LIST column of string elements.
-
stringSplitRecord
Returns a column of lists of strings in which each list is made by splitting the corresponding input string using the specified string literal delimiter.- Parameters:
delimiter
- UTF-8 encoded string identifying the split delimiter for each input string.limit
- the maximum size of the list resulting from splitting each input string, or -1 for all possible splits. Note that limit = 0 (all possible splits without trailing empty strings) and limit = 1 (no split at all) are not supported.- Returns:
- a LIST column of string elements.
-
stringSplitRecord
Returns a column of lists of strings in which each list is made by splitting the corresponding input string using the specified string literal delimiter.- Parameters:
delimiter
- UTF-8 encoded string identifying the split delimiter for each input string.- Returns:
- a LIST column of string elements.
-
stringSplitRecord
Returns a column of lists of strings in which each list is made by splitting the corresponding input string using the specified regex program pattern.- Parameters:
regexProg
- the regex program with UTF-8 encoded string identifying the split pattern for each input string.- Returns:
- a LIST column of string elements.
-
substring
Returns a new strings column that contains substrings of the strings in the provided column. The character positions retrieved from each string are `[start, end of string)`.- Parameters:
start
- first character index to begin the substring(inclusive).
-
substring
Returns a new strings column that contains substrings of the strings in the provided column. 0-based indexing. If the stop position is past the end of a string, the end of the string is used as the stop position for that string.- Parameters:
start
- first character index to begin the substring(inclusive).end
- last character index to stop the substring(exclusive)- Returns:
- A new java column vector containing the substrings.
-
substring
Returns a new strings column that contains substrings of the strings in the provided column which uses unique ranges for each string- Parameters:
start
- Vector containing start indices of each stringend
- Vector containing end indices of each string. -1 indicates reading until the end of the string.- Returns:
- A new java column vector containing the substrings.
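
The per-row range variant can be modelled in plain Java as follows. This is a sketch rather than the cudf implementation: it uses `List` values in place of columns, treats a null input string as producing a null output (an assumption, since the description does not say), and the names are hypothetical. An end of -1, or one past the string length, reads to the end of the string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SubstringSketch {
    // Substring of one row; start is inclusive, end is exclusive.
    static String substring(String s, int start, int end) {
        if (s == null) {
            return null; // assumption: null in, null out
        }
        int stop = (end < 0 || end > s.length()) ? s.length() : end;
        int begin = Math.min(Math.max(start, 0), stop);
        return s.substring(begin, stop);
    }

    // Each row gets its own [start, end) range.
    static List<String> substring(List<String> column, List<Integer> starts, List<Integer> ends) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < column.size(); i++) {
            out.add(substring(column.get(i), starts.get(i), ends.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(substring(
            Arrays.asList("hello", "world", "hi"),
            Arrays.asList(1, 2, 0),
            Arrays.asList(3, -1, 10)));
    }
}
```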
-
stringConcatenateListElements
Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result. Each new string is created by concatenating the strings from the same row (same list element) delimited by the separator provided. This version of the function replaces nulls with an empty string and returns null for an empty list.- Parameters:
sepCol
- strings column that provides separators for concatenation.- Returns:
- A new java column vector containing the concatenated strings with separator between.
-
stringConcatenateListElements
public final ColumnVector stringConcatenateListElements(ColumnView sepCol, Scalar separatorNarep, Scalar stringNarep, boolean separateNulls, boolean emptyStringOutputIfEmptyList) Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result. Each new string is created by concatenating the strings from the same row (same list element) delimited by the row separator provided in the sepCol strings column.- Parameters:
sepCol
- strings column that provides separators for concatenation.separatorNarep
- string scalar indicating null behavior when a separator is null. If set to null and the separator is null the resulting string will be null. If not null, this string will be used in place of a null separator.stringNarep
- string that should be used to replace null strings in any non-null list row. If set to null and the string is null the resulting string will be null. If not null, this string will be used in place of a null value.separateNulls
- if true, then the separator is included for null rows if `stringNarep` is valid.emptyStringOutputIfEmptyList
- if set to true, any input row that is an empty list will result in an empty string. Otherwise, it will result in a null.- Returns:
- A new java column vector containing the concatenated strings with separator between.
-
stringConcatenateListElements
public final ColumnVector stringConcatenateListElements(Scalar separator, Scalar narep, boolean separateNulls, boolean emptyStringOutputIfEmptyList) Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result. Each new string is created by concatenating the strings from the same row (same list element) delimited by the separator provided.- Parameters:
separator
- string scalar inserted between each string being merged.narep
- string scalar indicating null behavior. If set to null and any string in the row is null the resulting string will be null. If not null, null values in any column will be replaced by the specified string. The underlying value in the string scalar may be null, but the object passed in may not.separateNulls
- if true, then the separator is included for null rows if `narep` is valid.emptyStringOutputIfEmptyList
- if set to true, any input row that is an empty list will result in an empty string. Otherwise, it will result in a null.- Returns:
- A new java column vector containing the concatenated strings with separator between.
-
repeatStrings
Given a strings column, each string in it is repeated a number of times specified by the repeatTimes parameter. In special cases:
- If repeatTimes is not a positive number, a non-null input string will always result in an empty output string.
- A null input string will always result in a null output string regardless of the value of the repeatTimes parameter.
- Parameters:
repeatTimes
- The number of times each input string is repeated.- Returns:
- A new java column vector containing repeated strings.
-
repeatStrings
Given a strings column, an output strings column is generated by repeating each of the input strings a number of times given by the corresponding row in a repeatTimes numeric column. In special cases:
- Any null row (from either the input strings column or the repeatTimes column) will always result in a null output string.
- If any value in the repeatTimes column is not a positive number and its corresponding input string is not null, the output string will be an empty string.
- Parameters:
repeatTimes
- The column containing numbers of times each input string is repeated.- Returns:
- A new java column vector containing repeated strings.
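
The special cases for the column variant can be checked against a small plain-Java model. This is a sketch, not the cudf implementation: it uses `List` values in place of columns and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RepeatStringsSketch {
    // One row: null if either input is null, empty if times is not positive.
    static String repeat(String s, Integer times) {
        if (s == null || times == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < times; i++) {
            out.append(s);
        }
        return out.toString();
    }

    // Row-wise repeat using a parallel list of repeat counts.
    static List<String> repeat(List<String> strings, List<Integer> repeatTimes) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < strings.size(); i++) {
            out.add(repeat(strings.get(i), repeatTimes.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(repeat(
            Arrays.asList("ab", null, "c", "d"),
            Arrays.asList(2, 3, null, 0)));
    }
}
```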
-
getJSONObject
Apply a JSONPath string to all rows in an input strings column. Applies a JSONPath string to an incoming strings column where each row in the column is a valid json string. The output is returned by row as a strings column. For reference, https://tools.ietf.org/id/draft-goessner-dispatch-jsonpath-00.html Note: Only implements the operators: $ . [] *- Parameters:
path
- The JSONPath string to be applied to each rowoptions
- The GetJsonObjectOptions to control get_json_object behaviour- Returns:
- new strings ColumnVector containing the retrieved json object strings
-
getJSONObject
Apply a JSONPath string to all rows in an input strings column. Applies a JSONPath string to an incoming strings column where each row in the column is a valid json string. The output is returned by row as a strings column. For reference, https://tools.ietf.org/id/draft-goessner-dispatch-jsonpath-00.html Note: Only implements the operators: $ . [] *- Parameters:
path
- The JSONPath string to be applied to each row- Returns:
- new strings ColumnVector containing the retrieved json object strings
-
stringReplace
Returns a new strings column where target string within each string is replaced with the specified replacement string. The replacement proceeds from the beginning of the string to the end, for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab". Specifying an empty string for replace will essentially remove the target string if found in each string. Null string entries will return null output string entries. target Scalar should be string and should not be empty or null.- Parameters:
target
- String to search for within each string.replace
- Replacement string if target is found.- Returns:
- A new java column vector containing replaced strings
-
stringReplace
Returns a new strings column where target strings within each string are replaced with corresponding replacement strings. For each string in the column, the list of targets is searched within that string. If a target string is found, it is replaced by the corresponding entry in the repls column. All occurrences found in each string are replaced. The repls argument can optionally contain a single string. In this case, all matching target substrings will be replaced by that single string. Example: cv = ["hello", "goodbye"] targets = ["e","o"] repls = ["EE","OO"] r1 = cv.stringReplace(targets, repls) r1 is now ["hEEllOO", "gOOOOdbyEE"] targets = ["e", "o"] repls = ["_"] r2 = cv.stringReplace(targets, repls) r2 is now ["h_ll_", "g__dby_"]- Parameters:
targets
- Strings to search for in each string.repls
- Corresponding replacement strings for target strings.- Returns:
- A new java column vector containing the replaced strings.
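
The multi-target replacement can be modelled in plain Java with a single left-to-right scan. This is a sketch, not the cudf implementation: it assumes that at each position the first matching target in list order wins, that a null input string stays null, and the names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class MultiReplaceSketch {
    static String replace(String s, List<String> targets, List<String> repls) {
        if (s == null) {
            return null; // assumption: null in, null out
        }
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            boolean matched = false;
            for (int t = 0; t < targets.size(); t++) {
                String target = targets.get(t);
                if (!target.isEmpty() && s.startsWith(target, i)) {
                    // A single-entry repls list applies to every target.
                    out.append(repls.size() == 1 ? repls.get(0) : repls.get(t));
                    i += target.length();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.append(s.charAt(i++));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> targets = Arrays.asList("e", "o");
        System.out.println(replace("goodbye", targets, Arrays.asList("EE", "OO")));
        System.out.println(replace("goodbye", targets, Arrays.asList("_")));
    }
}
```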
-
replaceRegex
Deprecated.For each string, replaces any character sequence matching the given pattern using the replacement string scalar.- Parameters:
pattern
- The regular expression pattern to search within each string.repl
- The string scalar to replace for each pattern match.- Returns:
- A new column vector containing the string results.
-
replaceRegex
For each string, replaces any character sequence matching the given regex program pattern using the replacement string scalar.- Parameters:
regexProg
- The regex program with pattern to search within each string.repl
- The string scalar to replace for each pattern match.- Returns:
- A new column vector containing the string results.
-
replaceRegex
Deprecated.For each string, replaces any character sequence matching the given pattern using the replacement string scalar.- Parameters:
pattern
- The regular expression pattern to search within each string.repl
- The string scalar to replace for each pattern match.maxRepl
- The maximum number of times a replacement should occur within each string.- Returns:
- A new column vector containing the string results.
-
replaceRegex
For each string, replaces any character sequence matching the given regex program pattern using the replacement string scalar.- Parameters:
regexProg
- The regex program with pattern to search within each string.repl
- The string scalar to replace for each pattern match.maxRepl
- The maximum number of times a replacement should occur within each string.- Returns:
- A new column vector containing the string results.
-
replaceMultiRegex
For each string, replaces any character sequence matching any of the regular expression patterns with the corresponding replacement strings.- Parameters:
patterns
- The regular expression patterns to search within each string.repls
- The string scalars to replace for each corresponding pattern match.- Returns:
- A new column vector containing the string results.
-
stringReplaceWithBackrefs
Deprecated.For each string, replaces any character sequence matching the given pattern using the replace template for back-references. Any null string entries return corresponding null output column entries.- Parameters:
pattern
- The regular expression patterns to search within each string.replace
- The replacement template for creating the output string.- Returns:
- A new java column vector containing the string results.
-
stringReplaceWithBackrefs
For each string, replaces any character sequence matching the given regex program pattern using the replace template for back-references. Any null string entries return corresponding null output column entries.- Parameters:
regexProg
- The regex program with pattern to search within each string.replace
- The replacement template for creating the output string.- Returns:
- A new java column vector containing the string results.
-
zfill
Add '0' as padding to the left of each string. If the string is already width or more characters, no padding is performed. No strings are truncated. Null string entries result in null entries in the output column.- Parameters:
width
- The minimum number of characters for each string.- Returns:
- New column of strings.
-
pad
Pad the Strings column until it reaches the desired length with spaces " " on the right. If the string is already width or more characters, no padding is performed. No strings are truncated. Null string entries result in null entries in the output column.- Parameters:
width
- the minimum number of characters for each string.- Returns:
- the new strings column.
-
pad
Pad the Strings column until it reaches the desired length with spaces " ". If the string is already width or more characters, no padding is performed. No strings are truncated. Null string entries result in null entries in the output column.- Parameters:
width
- the minimum number of characters for each string.side
- where to add new characters.- Returns:
- the new strings column.
-
pad
Pad the Strings column until it reaches the desired length. If the string is already width or more characters, no padding is performed. No strings are truncated. Null string entries result in null entries in the output column.- Parameters:
width
- the minimum number of characters for each string.side
- where to add new characters.fillChar
- a single character string that holds what should be added.- Returns:
- the new strings column.
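
The padding rules shared by `zfill` and the `pad` overloads (minimum width, no truncation, nulls preserved) can be modelled in plain Java. This is a sketch, not the cudf implementation: the `Side` enum below is a hypothetical stand-in with only two values, and widths are counted in UTF-16 units.

```java
class PadSketch {
    // Hypothetical stand-in for the side argument; only two sides are modelled.
    enum Side { LEFT, RIGHT }

    static String pad(String s, int width, Side side, char fill) {
        if (s == null) {
            return null; // null entries stay null
        }
        if (s.length() >= width) {
            return s; // already wide enough; never truncated
        }
        StringBuilder padding = new StringBuilder();
        for (int i = s.length(); i < width; i++) {
            padding.append(fill);
        }
        return side == Side.LEFT ? padding + s : s + padding;
    }

    // zfill modelled as left padding with '0'.
    static String zfill(String s, int width) {
        return pad(s, width, Side.LEFT, '0');
    }

    public static void main(String[] args) {
        System.out.println(zfill("42", 5));
        System.out.println("[" + pad("ab", 5, Side.RIGHT, ' ') + "]");
    }
}
```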
-
startsWith
Checks if each string in a column starts with a specified comparison string, resulting in a parallel column of the boolean results.- Parameters:
pattern
- scalar containing the string being searched for at the beginning of the column's strings.- Returns:
- A new java column vector containing the boolean results.
-
endsWith
Checks if each string in a column ends with a specified comparison string, resulting in a parallel column of the boolean results.- Parameters:
pattern
- scalar containing the string being searched for at the end of the column's strings.- Returns:
- A new java column vector containing the boolean results.
-
strip
Removes whitespace from the beginning and end of a string.- Returns:
- A new java column vector containing the stripped strings.
-
strip
Removes the specified characters from the beginning and end of each string.- Parameters:
toStrip
- UTF-8 encoded characters to strip from each string.- Returns:
- A new java column vector containing the stripped strings.
-
lstrip
Removes whitespace from the beginning of a string.- Returns:
- A new java column vector containing the stripped strings.
-
lstrip
Removes the specified characters from the beginning of each string.- Parameters:
toStrip
- UTF-8 encoded characters to strip from each string.- Returns:
- A new java column vector containing the stripped strings.
-
rstrip
Removes whitespace from the end of a string.- Returns:
- A new java column vector containing the stripped strings.
-
rstrip
Removes the specified characters from the end of each string.- Parameters:
toStrip
- UTF-8 encoded characters to strip from each string.- Returns:
- A new java column vector containing the stripped strings.
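
The three strip variants differ only in which ends are trimmed. The plain-Java model below is a sketch, not the cudf implementation: it assumes `toStrip` is treated as a set of individual characters, that a null string stays null, and the names are hypothetical.

```java
class StripSketch {
    // Removes any leading and/or trailing characters that appear in toStrip.
    static String strip(String s, String toStrip, boolean left, boolean right) {
        if (s == null) {
            return null; // assumption: null in, null out
        }
        int begin = 0;
        int end = s.length();
        while (left && begin < end && toStrip.indexOf(s.charAt(begin)) >= 0) {
            begin++;
        }
        while (right && end > begin && toStrip.indexOf(s.charAt(end - 1)) >= 0) {
            end--;
        }
        return s.substring(begin, end);
    }

    static String lstrip(String s, String toStrip) {
        return strip(s, toStrip, true, false);
    }

    static String rstrip(String s, String toStrip) {
        return strip(s, toStrip, false, true);
    }

    public static void main(String[] args) {
        System.out.println(strip("xyhixy", "xy", true, true));
        System.out.println(lstrip("xyhixy", "xy"));
        System.out.println(rstrip("xyhixy", "xy"));
    }
}
```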
-
stringContains
Checks if each string in a column contains a specified comparison string, resulting in a parallel column of the boolean results.- Parameters:
compString
- scalar containing the string being searched for.- Returns:
- A new java column vector containing the boolean results.
-
stringContains
- Parameters:
targets
- UTF-8 encoded strings to search for in each string in `input`- Returns:
- BOOL8 columns
-
clamp
Replaces values less than `lo` in `input` with `lo`, and values greater than `hi` with `hi`. If `lo` is invalid, then lo will not be considered while evaluating the input (essentially considered the minimum value of that type). If `hi` is invalid, then hi will not be considered while evaluating the input (essentially considered the maximum value of that type).
```
Example:
input: {1, 2, 3, NULL, 5, 6, 7}

valid lo and hi
lo: 3, hi: 5, lo_replace : 0, hi_replace : 16
output:{0, 0, 3, NULL, 5, 16, 16}

invalid lo
lo: NULL, hi: 5, lo_replace : 0, hi_replace : 16
output:{1, 2, 3, NULL, 5, 16, 16}

invalid hi
lo: 3, hi: NULL, lo_replace : 0, hi_replace : 16
output:{0, 0, 3, NULL, 5, 6, 7}
```
- Parameters:
lo
- - Minimum clamp value. All elements less than `lo` will be replaced by `lo`. Ignored if null.hi
- - Maximum clamp value. All elements greater than `hi` will be replaced by `hi`. Ignored if null.- Returns:
- Returns a new clamped column as per `lo` and `hi` boundaries
-
clamp
Replaces values less than `lo` in `input` with `loReplace`, and values greater than `hi` with `hiReplace`. If `lo` is invalid, then lo will not be considered while evaluating the input (essentially considered the minimum value of that type). If `hi` is invalid, then hi will not be considered while evaluating the input (essentially considered the maximum value of that type).- Parameters:
lo
- - Minimum clamp value. All elements less than `lo` will be replaced by `loReplace`. Ignored if null.loReplace
- - All elements less than `lo` will be replaced by `loReplace`.hi
- - Maximum clamp value. All elements greater than `hi` will be replaced by `hiReplace`. Ignored if null.hiReplace
- - All elements greater than `hi` will be replaced by `hiReplace`.- Returns:
- - a new clamped column as per `lo` and `hi` boundaries
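
The replacement rules, including the "invalid bound is ignored" cases, can be modelled in plain Java. This is a sketch, not the cudf implementation: it uses nullable `Integer` values in place of scalars and columns, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClampSketch {
    // A null lo or hi means that bound is not applied; null rows stay null.
    static List<Integer> clamp(List<Integer> input, Integer lo, Integer loReplace,
                               Integer hi, Integer hiReplace) {
        List<Integer> out = new ArrayList<>();
        for (Integer v : input) {
            if (v == null) {
                out.add(null);
            } else if (lo != null && v < lo) {
                out.add(loReplace);
            } else if (hi != null && v > hi) {
                out.add(hiReplace);
            } else {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, null, 5, 6, 7);
        System.out.println(clamp(input, 3, 0, 5, 16));
        System.out.println(clamp(input, null, 0, 5, 16));
        System.out.println(clamp(input, 3, 0, null, 16));
    }
}
```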
-
matchesRe
Deprecated.Returns a boolean ColumnVector identifying rows which match the given regex pattern but only at the beginning of the string.
```
cv = ["abc", "123", "def456"]
r = cv.matchesRe("\\d+")
r is now [false, true, false]
```
Any null string entries return corresponding null output column entries. For supported regex patterns refer to: -
matchesRe
Returns a boolean ColumnVector identifying rows which match the given regex program pattern but only at the beginning of the string.
```
cv = ["abc", "123", "def456"]
p = new RegexProgram("\\d+", CaptureGroups.NON_CAPTURE)
r = cv.matchesRe(p)
r is now [false, true, false]
```
Any null string entries return corresponding null output column entries. For supported regex patterns refer to: -
containsRe
Deprecated.Returns a boolean ColumnVector identifying rows which match the given regex pattern starting at any location.
```
cv = ["abc", "123", "def456"]
r = cv.containsRe("\\d+")
r is now [false, true, true]
```
Any null string entries return corresponding null output column entries. For supported regex patterns refer to: -
containsRe
Returns a boolean ColumnVector identifying rows which match the given RegexProgram pattern starting at any location.
```
cv = ["abc", "123", "def456"]
p = new RegexProgram("\\d+", CaptureGroups.NON_CAPTURE)
r = cv.containsRe(p)
r is now [false, true, true]
```
Any null string entries return corresponding null output column entries. For supported regex patterns refer to: -
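
The difference between `matchesRe` (anchored at the beginning of the string) and `containsRe` (match at any location) can be illustrated with `java.util.regex` as a stand-in engine. This is only an analogy: the regex syntax supported by cudf may differ from Java's, and the class below is hypothetical.

```java
import java.util.regex.Pattern;

class RegexMatchSketch {
    // Match must start at the beginning of the string (it need not reach the end).
    static Boolean matchesRe(String s, Pattern p) {
        return s == null ? null : p.matcher(s).lookingAt();
    }

    // Match may start anywhere in the string.
    static Boolean containsRe(String s, Pattern p) {
        return s == null ? null : p.matcher(s).find();
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        for (String s : new String[] {"abc", "123", "def456"}) {
            System.out.println(s + ": matches=" + matchesRe(s, digits)
                + ", contains=" + containsRe(s, digits));
        }
    }
}
```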
extractRe
Deprecated.For each captured group specified in the given regular expression return a column in the table. Null entries are added if the string does not match. Any null inputs also result in null output entries. For supported regex patterns refer to:- Throws:
CudfException
-
extractRe
For each captured group specified in the given regex program return a column in the table. Null entries are added if the string does not match. Any null inputs also result in null output entries. For supported regex patterns refer to:- Throws:
CudfException
-
extractAllRecord
Deprecated.Extracts all strings that match the given regular expression and corresponds to the regular expression group index. Any null inputs also result in null output entries. For supported regex patterns refer to: -
extractAllRecord
Extracts all strings that match the given regex program pattern and corresponds to the regular expression group index. Any null inputs also result in null output entries. For supported regex patterns refer to: -
like
Returns a boolean ColumnVector identifying rows which match the given like pattern. The like pattern expects only 2 wildcard special characters:
- `%` any number of any character (including no characters)
- `_` any single character
```
cv = ["azaa", "ababaabba", "aaxa"]
r = cv.like("%a_aa%", "\\")
r is now [true, true, false]
r = cv.like("a__a", "\\")
r is now [true, false, true]
```
The escape character is specified to include either `%` or `_` in the search, which is expected to be either 0 or 1 character. If more than one character is specified, only the first character is used.
```
cv = ["abc_def", "abc1def", "abc_"]
r = cv.like("abc/_d%", "/")
r is now [true, false, false]
```
Any null string entries return corresponding null output column entries.
- Parameters:
pattern
- Like pattern to match to each string.escapeChar
- Character specifies the escape prefix; default is "\\".- Returns:
- New ColumnVector of boolean results for each string.
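
One way to reason about the wildcard and escape rules is to translate the like pattern into a regular expression. The plain-Java sketch below does this with `java.util.regex`; it is a model of the documented behaviour, not how cudf implements `like`, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class LikeSketch {
    // Translates a like pattern into an equivalent whole-string regex.
    static Pattern toRegex(String like, String escape) {
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (!escape.isEmpty() && c == escape.charAt(0) && i + 1 < like.length()) {
                // Escaped character is taken literally (only the first escape char is used).
                re.append(Pattern.quote(String.valueOf(like.charAt(++i))));
            } else if (c == '%') {
                re.append(".*"); // any number of any character
            } else if (c == '_') {
                re.append('.'); // any single character
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString(), Pattern.DOTALL);
    }

    static Boolean like(String s, String pattern, String escape) {
        return s == null ? null : toRegex(pattern, escape).matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"azaa", "ababaabba", "aaxa"}) {
            System.out.println(s + ": " + like(s, "%a_aa%", "\\") + ", " + like(s, "a__a", "\\"));
        }
    }
}
```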
-
urlDecode
Converts all character sequences starting with '%' into character code-points interpreting the 2 following characters as hex values to create the code-point. For example, the sequence '%20' is converted into byte (0x20) which is a single space character. Another example converts '%C3%A9' into 2 sequential bytes (0xc3 and 0xa9 respectively) which is the é character. Overall, 3 characters are converted into one char byte whenever a '%' (single percent) character is encountered in the string. Any null entries will result in corresponding null entries in the output column.
- Returns:
- a new column instance containing the decoded strings
- Throws:
CudfException
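
The byte-level decoding described for `urlDecode` can be modelled in plain Java. This is a sketch, not the cudf implementation: it assumes that a '%' not followed by two hex digits is copied through unchanged (the description does not cover malformed input), and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class UrlDecodeSketch {
    static String urlDecode(String s) {
        if (s == null) {
            return null; // null entries stay null
        }
        byte[] in = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            if (in[i] == '%' && i + 2 < in.length) {
                int hi = Character.digit(in[i + 1], 16);
                int lo = Character.digit(in[i + 2], 16);
                if (hi >= 0 && lo >= 0) {
                    // Three input characters become one output byte.
                    out.write(hi * 16 + lo);
                    i += 2;
                    continue;
                }
            }
            out.write(in[i]);
        }
        // The decoded bytes are interpreted as UTF-8.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(urlDecode("a%20b"));
        System.out.println(urlDecode("caf%C3%A9"));
    }
}
```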
-
urlEncode
Converts mostly non-ascii characters and control characters into UTF-8 hex code-points prefixed with '%'. For example, the space character must be converted to characters '%20' where the '20' indicates the hex value for space in UTF-8. Likewise, multi-byte characters are converted to multiple hex characters. For example, the é character is converted to characters '%C3%A9' where 'C3A9' are the UTF-8 bytes 0xC3A9 for this character. Any null entries will result in corresponding null entries in the output column.
- Returns:
- a new column instance containing the encoded strings
- Throws:
CudfException
-
getMapValue
Given a column of type List<Struct<X, Y>> and a key column of type X, return a column of type Y, where each row in the output column is the Y value corresponding to the X key. If the key is not found, the corresponding output value is null.- Parameters:
keys
- the column view with keys to lookup in the column- Returns:
- a column of values or nulls based on the lookup result
-
getMapValue
Given a column of type List<Struct<X, Y>> and a key of type X, return a column of type Y, where each row in the output column is the Y value corresponding to the X key. If the key is not found, the corresponding output value is null.- Parameters:
key
- the scalar key to lookup in the column- Returns:
- a column of values or nulls based on the lookup result
-
getMapKeyExistence
For a column of type List<Struct<String, String>> and a passed in String key, return a boolean column for all keys in the structs. Each output row is true if the key exists in the corresponding map for that row, false otherwise. It will never return null for a row.- Parameters:
key
- the String scalar to lookup in the column- Returns:
- a boolean column based on the lookup result
-
getMapKeyExistence
For a column of type List<Struct<_, _>> and a passed in key column, return a boolean column for all keys in the map. Each output row is true if the key exists in the corresponding map for that row, false otherwise. It will never return null for a row.- Parameters:
keys
- the keys to lookup in the column- Returns:
- a boolean column based on the lookup result
-
makeStructView
Create a new struct column view of existing column views. Note that this will NOT copy the contents of the input columns to make a new vector, but makes a view that must not outlive the child views that it references. The resulting column cannot be null.- Parameters:
rows
- the number of rows in the struct column. This is needed if no columns are provided.columns
- the columns to add to the struct in the order they should be added- Returns:
- the new column view. It is the responsibility of the caller to close this.
-
makeStructView
Create a new struct column view of existing column views. Note that this will NOT copy the contents of the input columns to make a new vector, but makes a view that must not outlive the child views that it references. The resulting column cannot be null.- Parameters:
columns
- the columns to add to the struct in the order they should be added- Returns:
- the new column view. It is the responsibility of the caller to close this.
-
fromDeviceBuffer
public static ColumnView fromDeviceBuffer(BaseDeviceMemoryBuffer buffer, long startOffset, DType type, int rows) Create a new column view from a raw device buffer. Note that this will NOT copy the contents of the buffer but only creates a view. The view MUST NOT outlive the underlying device buffer. The column view will be created without a validity vector, so it is not possible to create a view containing null elements. Additionally only fixed-width primitive types are supported.- Parameters:
buffer
- device memory that will back the column viewstartOffset
- byte offset into the device buffer where the column data startstype
- type of data in the column viewrows
- number of data elements in the column view- Returns:
- new column view instance that must not outlive the backing device buffer
-
listContains
Create a column of bool values indicating whether the specified scalar is an element of each row of a list column. Output `column[i]` is set to null if one or more of the following are true: 1. The key is null 2. The column vector list value is null- Parameters:
key
- the scalar to look up- Returns:
- a Boolean ColumnVector with the result of the lookup
-
listContainsColumn
Create a column of bool values indicating whether the list rows of the first column contain the corresponding values in the second column. Output `column[i]` is set to null if one or more of the following are true: 1. The key value is null 2. The column vector list value is null- Parameters:
key
- the ColumnVector with look up values- Returns:
- a Boolean ColumnVector with the result of the lookup
-
listContainsNulls
Create a column of bool values indicating whether the list rows of the specified column contain null elements. Output `column[i]` is set to null iff the input list row is null.- Returns:
- a Boolean ColumnVector with the result of the lookup
-
listIndexOf
Create a column of int32 indices, indicating the position of the scalar search key in each list row. All indices are 0-based. If a search key is not found, the index is set to -1. The index is set to null if one of the following is true:
1. The search key is null.
2. The list row is null.
- Parameters:
key
- The scalar search key
findOption
- Whether to find the first index of the key, or the last.
- Returns:
- The resultant column of int32 indices
-
listIndexOf
Create a column of int32 indices, indicating the position of each row in the search key column in the corresponding row of the lists column. All indices are 0-based. If a search key is not found, the index is set to -1. The index is set to null if one of the following is true:
1. The search key row is null.
2. The list row is null.
- Parameters:
keys
- ColumnView of search keys.
findOption
- Whether to find the first index of the key, or the last.
- Returns:
- The resultant column of int32 indices
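A host-side sketch of the per-row lookup may help make the three outcomes (index, -1, null) concrete. The names `ListIndexOfSketch` and `Find` are hypothetical stand-ins, not the library's enum, and plain `List<Integer>` values replace device columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ListIndexOfSketch {
    /** Hypothetical stand-in for the find option: first or last occurrence. */
    enum Find { FIRST, LAST }

    /** Index of keys[i] in lists[i]; -1 if absent; null if the key or the list row is null. */
    static List<Integer> listIndexOf(List<List<Integer>> lists, List<Integer> keys, Find option) {
        if (lists.size() != keys.size()) {
            throw new IllegalArgumentException("lists and keys must have the same number of rows");
        }
        List<Integer> out = new ArrayList<>(lists.size());
        for (int i = 0; i < lists.size(); i++) {
            List<Integer> row = lists.get(i);
            Integer key = keys.get(i);
            if (row == null || key == null) {
                out.add(null);
            } else {
                out.add(option == Find.FIRST ? row.indexOf(key) : row.lastIndexOf(key));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = Arrays.asList(
                Arrays.asList(1, 2, 1), Arrays.asList(3), null, Arrays.asList(5, 5));
        List<Integer> keys = Arrays.asList(1, 4, 7, null);
        System.out.println(listIndexOf(lists, keys, Find.FIRST)); // [0, -1, null, null]
        System.out.println(listIndexOf(lists, keys, Find.LAST));  // [2, -1, null, null]
    }
}
```

The scalar-key overload behaves like this sketch with the same key repeated for every row.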
-
listSortRows
Sort the elements within each list row of a list column (a segmented sort). NOTICE: list columns with nested children are NOT supported yet.
- Parameters:
isDescending
- whether to sort each row in descending order (otherwise ascending order)
isNullSmallest
- whether to treat null as the minimum value (otherwise as the maximum value)
- Returns:
- a List ColumnVector with elements in each list sorted
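How the two flags interact is easiest to see on a small row. The sketch below is one reading of the parameter descriptions, not the library's implementation: `ListSortRowsSketch` is a hypothetical name, elements are INT32, and a null list row is assumed to stay null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ListSortRowsSketch {
    static List<List<Integer>> listSortRows(List<List<Integer>> lists,
                                            boolean isDescending, boolean isNullSmallest) {
        Comparator<Integer> natural = Comparator.naturalOrder();
        // Place null as the minimum or the maximum value, then optionally flip the whole order.
        Comparator<Integer> order = isNullSmallest
                ? Comparator.nullsFirst(natural)
                : Comparator.nullsLast(natural);
        if (isDescending) {
            order = order.reversed();
        }
        List<List<Integer>> out = new ArrayList<>(lists.size());
        for (List<Integer> row : lists) {
            if (row == null) {
                out.add(null); // assumption: a null row passes through unchanged
                continue;
            }
            List<Integer> sorted = new ArrayList<>(row);
            sorted.sort(order);
            out.add(sorted);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = Arrays.asList(Arrays.asList(3, null, 1, 2), null);
        System.out.println(listSortRows(lists, false, true)); // [[null, 1, 2, 3], null]
        System.out.println(listSortRows(lists, true, true));  // [[3, 2, 1, null], null]
    }
}
```

Under this reading, a null treated as the minimum lands first in ascending order and last in descending order.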
-
listsHaveOverlap
For each pair of lists from the input lists columns, check if they have any common non-null elements. A null input row in any of the input columns will result in a null output row. While checking for common elements, nulls within each list are treated as different values, while floating-point NaN values are treated as equal. The input lists columns must have the same size and same data type.
- Parameters:
lhs
- The input lists column for one side
rhs
- The input lists column for the other side
- Returns:
- A column of type BOOL8 containing the check result
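The null and NaN rules can be illustrated on the host. In this sketch (hypothetical class `ListsHaveOverlapSketch`, FLOAT64 elements as `Double`), null elements are skipped so they never match, and `Double.equals` supplies the NaN-equals-NaN behaviour. Note that `Double.equals` also distinguishes `0.0` from `-0.0`, which is a property of this sketch and not a statement about the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ListsHaveOverlapSketch {
    static List<Boolean> listsHaveOverlap(List<List<Double>> lhs, List<List<Double>> rhs) {
        if (lhs.size() != rhs.size()) {
            throw new IllegalArgumentException("input columns must have the same number of rows");
        }
        List<Boolean> out = new ArrayList<>(lhs.size());
        for (int i = 0; i < lhs.size(); i++) {
            List<Double> left = lhs.get(i);
            List<Double> right = rhs.get(i);
            if (left == null || right == null) {
                out.add(null); // a null input row gives a null output row
                continue;
            }
            Set<Double> seen = new HashSet<>();
            for (Double v : left) {
                if (v != null) seen.add(v); // nulls are never "common" elements
            }
            boolean overlap = false;
            for (Double v : right) {
                if (v != null && seen.contains(v)) { // NaN matches NaN via Double.equals
                    overlap = true;
                    break;
                }
            }
            out.add(overlap);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Double>> lhs = Arrays.asList(
                Arrays.asList(1.0, null), Arrays.asList(Double.NaN, 2.0), null, Arrays.asList(1.0));
        List<List<Double>> rhs = Arrays.asList(
                Arrays.asList(null, 3.0), Arrays.asList(Double.NaN), Arrays.asList(1.0), Arrays.asList(2.0));
        System.out.println(listsHaveOverlap(lhs, rhs)); // [false, true, null, false]
    }
}
```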
-
listsIntersectDistinct
Find the intersection, without duplicates, of the lists at each row of the given lists columns. A null input row in any of the input lists columns will result in a null output row. When computing the intersection, nulls and floating-point NaN values within each list are considered as equal values. The input lists columns must have the same size and same data type.
- Parameters:
lhs
- The input lists column for one side
rhs
- The input lists column for the other side
- Returns:
- A lists column containing the intersection result
-
listsUnionDistinct
Find the union, without duplicates, of the lists at each row of the given lists columns. A null input row in any of the input lists columns will result in a null output row. When computing the union, nulls and floating-point NaN values within each list are considered as equal values. The input lists columns must have the same size and same data type.
- Parameters:
lhs
- The input lists column for one side
rhs
- The input lists column for the other side
- Returns:
- A lists column containing the union result
-
listsDifferenceDistinct
Find the difference of lists of the left column against lists of the right column. Specifically, find the elements (without duplicates) from each list of the left column that do not exist in the corresponding list of the right column. A null input row in any of the input lists columns will result in a null output row. When computing the difference, nulls and floating-point NaN values within each list are considered as equal values. The input lists columns must have the same size and same data type.
- Parameters:
lhs
- The input lists column for one side
rhs
- The input lists column for the other side
- Returns:
- A lists column containing the difference result
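The three distinct set operations (`listsIntersectDistinct`, `listsUnionDistinct`, `listsDifferenceDistinct`) share the same equality rules, so one host-side sketch can cover them. Everything here is illustrative: `ListSetOpsSketch` and `Op` are hypothetical names, elements are `Double`, and the output order (first appearance) is a choice of this sketch; the text above does not state any ordering for the real results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ListSetOpsSketch {
    enum Op { INTERSECT, UNION, DIFFERENCE }

    static List<List<Double>> apply(List<List<Double>> lhs, List<List<Double>> rhs, Op op) {
        if (lhs.size() != rhs.size()) {
            throw new IllegalArgumentException("input columns must have the same number of rows");
        }
        List<List<Double>> out = new ArrayList<>(lhs.size());
        for (int i = 0; i < lhs.size(); i++) {
            List<Double> left = lhs.get(i);
            List<Double> right = rhs.get(i);
            if (left == null || right == null) {
                out.add(null); // a null input row gives a null output row
                continue;
            }
            // Java sets accept one null and treat NaN as equal to NaN (Double.equals),
            // which matches the "nulls and NaNs are equal" rule for these operations.
            Set<Double> result = new LinkedHashSet<>(left); // drops duplicates, keeps order
            switch (op) {
                case INTERSECT:  result.retainAll(new HashSet<>(right)); break;
                case UNION:      result.addAll(right);                   break;
                case DIFFERENCE: result.removeAll(new HashSet<>(right)); break;
            }
            out.add(new ArrayList<>(result));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Double>> lhs = Collections.singletonList(Arrays.asList(1.0, 1.0, null, Double.NaN, 2.0));
        List<List<Double>> rhs = Collections.singletonList(Arrays.asList(Double.NaN, null, null, 3.0, 1.0));
        System.out.println(apply(lhs, rhs, Op.INTERSECT));  // [[1.0, null, NaN]]
        System.out.println(apply(lhs, rhs, Op.UNION));      // [[1.0, null, NaN, 2.0, 3.0]]
        System.out.println(apply(lhs, rhs, Op.DIFFERENCE)); // [[2.0]]
    }
}
```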
-
generateListOffsets
Generate list offsets from the sizes of each list. NOTICE: This API only works for INT32 columns; for any other type the behavior is undefined. Null and negative values are not allowed.
- Returns:
- a column of list offsets whose size is N + 1
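Turning N sizes into N + 1 offsets is an exclusive prefix sum. A minimal host-side sketch, assuming an `int[]` in place of the INT32 column (`ListOffsetsSketch` is a hypothetical name):

```java
import java.util.Arrays;

final class ListOffsetsSketch {
    /** offsets[0] = 0 and offsets[i + 1] = offsets[i] + sizes[i], so the result has N + 1 entries. */
    static int[] generateListOffsets(int[] sizes) {
        int[] offsets = new int[sizes.length + 1];
        for (int i = 0; i < sizes.length; i++) {
            if (sizes[i] < 0) {
                throw new IllegalArgumentException("negative list size at row " + i);
            }
            offsets[i + 1] = Math.addExact(offsets[i], sizes[i]); // fail loudly on INT32 overflow
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(generateListOffsets(new int[] {2, 0, 3}))); // [0, 2, 2, 5]
    }
}
```

Row `i` of the resulting list column then spans child elements `offsets[i]` up to, but not including, `offsets[i + 1]`.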
-
getScalarElement
Get a single item from the column at the specified index as a Scalar. Be careful: this is expensive and may involve running a kernel to copy the data out.
- Parameters:
index
- the index to look at
- Returns:
- the value at that index as a scalar.
- Throws:
CudfException
- if the index is out of bounds.
-
applyBooleanMask
Filters elements in each row of this LIST column using the `booleanMaskView` LIST of booleans (called `boolean_mask` below) as a mask.
Given a list-of-bools column, the function produces a new `LIST` column of the same type as this column, where each element is copied from the row *only* if the corresponding `boolean_mask` element is non-null and `true`.
E.g.
column = { {0,1,2}, {3,4}, {5,6,7}, {8,9} };
boolean_mask = { {0,1,1}, {1,0}, {1,1,1}, {0,0} };
results = { {1,2}, {3}, {5,6,7}, {} };
This column and `boolean_mask` must have the same number of rows. The output column has the same number of rows as this column. An output row is invalid only if the input row is invalid.
- Parameters:
booleanMaskView
- A nullable list of bools column used to filter elements in this column
- Returns:
- List column of the same type as this column, containing filtered list rows
- Throws:
CudfException
- if `boolean_mask` is not a "lists of bools" column
CudfException
- if this column and `boolean_mask` have different number of rows
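The example in the description can be reproduced with a small host-side sketch. `ApplyBooleanMaskSketch` is a hypothetical name, and two behaviours are assumptions of the sketch rather than documented facts: a null mask row keeps no elements, and a mask row whose length differs from the data row is rejected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ApplyBooleanMaskSketch {
    static List<List<Integer>> applyBooleanMask(List<List<Integer>> lists, List<List<Boolean>> mask) {
        if (lists.size() != mask.size()) {
            throw new IllegalArgumentException("column and mask have a different number of rows");
        }
        List<List<Integer>> out = new ArrayList<>(lists.size());
        for (int i = 0; i < lists.size(); i++) {
            List<Integer> row = lists.get(i);
            List<Boolean> maskRow = mask.get(i);
            if (row == null) {
                out.add(null); // output row is invalid only if the input row is invalid
                continue;
            }
            List<Integer> kept = new ArrayList<>();
            if (maskRow != null) { // assumption: a null mask row keeps nothing
                if (maskRow.size() != row.size()) { // assumption: lengths must match per row
                    throw new IllegalArgumentException("mask row " + i + " has the wrong length");
                }
                for (int j = 0; j < row.size(); j++) {
                    if (Boolean.TRUE.equals(maskRow.get(j))) { // non-null and true
                        kept.add(row.get(j));
                    }
                }
            }
            out.add(kept);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> column = Arrays.asList(
                Arrays.asList(0, 1, 2), Arrays.asList(3, 4), Arrays.asList(5, 6, 7), Arrays.asList(8, 9));
        List<List<Boolean>> mask = Arrays.asList(
                Arrays.asList(false, true, true), Arrays.asList(true, false),
                Arrays.asList(true, true, true), Arrays.asList(false, false));
        System.out.println(applyBooleanMask(column, mask)); // [[1, 2], [3], [5, 6, 7], []]
    }
}
```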
-
distinctCount
Count how many rows in the column are distinct from one another.
- Parameters:
nullPolicy
- if nulls should be included or not.
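The effect of the null policy can be sketched on the host. The sketch assumes that, when nulls are included, all null rows together count as a single distinct value; `DistinctCountSketch` and its boolean `includeNulls` flag are hypothetical stand-ins for the real parameter type.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DistinctCountSketch {
    static int distinctCount(List<Integer> column, boolean includeNulls) {
        Set<Integer> seen = new HashSet<>();
        boolean sawNull = false;
        for (Integer value : column) {
            if (value == null) {
                sawNull = true;
            } else {
                seen.add(value);
            }
        }
        // Assumption: nulls, when included, contribute exactly one distinct value.
        return seen.size() + (includeNulls && sawNull ? 1 : 0);
    }

    public static void main(String[] args) {
        List<Integer> column = Arrays.asList(1, 2, 2, null, null, 3);
        System.out.println(distinctCount(column, true));  // 4
        System.out.println(distinctCount(column, false)); // 3
    }
}
```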
-
distinctCount
public int distinctCount()
Count how many rows in the column are distinct from one another. Nulls are included.
-
title
protected static long title(long handle)
-
copyToHost
Copy the data to the host synchronously.
-
copyToHostAsync
public HostColumnVector copyToHostAsync(Cuda.Stream stream, HostMemoryAllocator hostMemoryAllocator)
Copy the data to the host asynchronously. The caller MUST synchronize on the stream before examining the result.
-
copyToHost
Copy the data to host memory synchronously.
-
copyToHostAsync
Copy the data to the host asynchronously. The caller MUST synchronize on the stream before examining the result.
-
getHostBytesRequired
public long getHostBytesRequired()
Calculate the total space required to copy the data to the host. This should be padded to the alignment that the CPU requires.
-
hostPaddingSizeInBytes
public static long hostPaddingSizeInBytes()
Get the size, in bytes, that the host will align memory allocations to.
-
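For `getHostBytesRequired` and `hostPaddingSizeInBytes`, padding to an alignment is a round-up to the next multiple. The sketch below shows only that arithmetic. The 64-byte alignment, the per-buffer padding in `paddedTotal`, and the names `HostPaddingSketch`, `padTo`, and `paddedTotal` are all assumptions made for illustration; the real alignment is whatever `hostPaddingSizeInBytes()` reports.

```java
final class HostPaddingSketch {
    /** Rounds {@code bytes} up to the next multiple of {@code alignment}. */
    static long padTo(long bytes, long alignment) {
        if (bytes < 0 || alignment <= 0) {
            throw new IllegalArgumentException("bytes must be >= 0 and alignment must be > 0");
        }
        long blocks = Math.addExact(bytes, alignment - 1) / alignment;
        return Math.multiplyExact(blocks, alignment);
    }

    /** Assumption: each buffer (data, validity, offsets, ...) is padded on its own, then summed. */
    static long paddedTotal(long alignment, long... bufferSizes) {
        long total = 0;
        for (long size : bufferSizes) {
            total = Math.addExact(total, padTo(size, alignment));
        }
        return total;
    }

    public static void main(String[] args) {
        long alignment = 64; // hypothetical value, for illustration only
        System.out.println(padTo(65, alignment));            // 128
        System.out.println(paddedTotal(alignment, 10, 100)); // 64 + 128 = 192
    }
}
```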
hasNonEmptyNulls
public boolean hasNonEmptyNulls()
Exact check of whether a column or its descendants have non-empty null rows.
- Returns:
- Whether the column or its descendants have non-empty null rows
-
purgeNonEmptyNulls
Copies this column into output while purging any non-empty null rows in the column or its descendants. If this column is not of compound type (LIST/STRING/STRUCT/DICTIONARY), the output will be the same as input. The purge operation only applies directly to LIST and STRING columns, but it applies indirectly to STRUCT/DICTIONARY columns as well, since these columns may have child columns that are LIST or STRING.
Examples:
lists = [data: {{0,1}, {2,3}, {4,5}} validity: {true, false, true}]
lists[1] is null, but the list's child column still stores `{2,3}`. After purging the contents of the list's null rows, the column's contents will be:
lists = [data: {{0,1}, {4,5}} validity: {true, false, true}]
- Returns:
- A new column with equivalent contents to `input`, but with null rows purged
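What "non-empty null row" means is clearer with offsets written out. The sketch below models a LIST of INT32 column as offsets, child data, and validity on the host; `PurgeNonEmptyNullsSketch` and `ListColumn` are hypothetical names, and the layout is a simplified assumption rather than the library's internal representation.

```java
import java.util.Arrays;

final class PurgeNonEmptyNullsSketch {
    /** Toy list column: row i spans child[offsets[i]] up to child[offsets[i + 1] - 1]. */
    static final class ListColumn {
        final int[] offsets;
        final int[] child;
        final boolean[] validity;

        ListColumn(int[] offsets, int[] child, boolean[] validity) {
            this.offsets = offsets;
            this.child = child;
            this.validity = validity;
        }
    }

    /** True if some null row still owns child elements. */
    static boolean hasNonEmptyNulls(ListColumn c) {
        for (int i = 0; i < c.validity.length; i++) {
            if (!c.validity[i] && c.offsets[i + 1] > c.offsets[i]) {
                return true;
            }
        }
        return false;
    }

    /** Copies the column, dropping the child elements that belong to null rows. */
    static ListColumn purgeNonEmptyNulls(ListColumn c) {
        int rows = c.validity.length;
        int[] newOffsets = new int[rows + 1];
        int[] newChild = new int[c.child.length];
        int written = 0;
        for (int i = 0; i < rows; i++) {
            if (c.validity[i]) {
                int length = c.offsets[i + 1] - c.offsets[i];
                System.arraycopy(c.child, c.offsets[i], newChild, written, length);
                written += length;
            }
            newOffsets[i + 1] = written; // null rows become empty: start == end
        }
        return new ListColumn(newOffsets, Arrays.copyOf(newChild, written), c.validity.clone());
    }

    public static void main(String[] args) {
        ListColumn lists = new ListColumn(
                new int[] {0, 2, 4, 6}, new int[] {0, 1, 2, 3, 4, 5}, new boolean[] {true, false, true});
        ListColumn purged = purgeNonEmptyNulls(lists);
        System.out.println(hasNonEmptyNulls(lists));         // true
        System.out.println(Arrays.toString(purged.offsets)); // [0, 2, 2, 4]
        System.out.println(Arrays.toString(purged.child));   // [0, 1, 4, 5]
        System.out.println(hasNonEmptyNulls(purged));        // false
    }
}
```

The validity is unchanged by the purge; only the offsets and child data shrink.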
-
toHex
Convert this integer column to a hexadecimal column and return a new strings column. Any null entries will result in corresponding null entries in the output column. The output character set is '0'-'9' and 'A'-'F'. The output string width will be a multiple of 2 depending on the size of the integer type. A single leading zero is applied to the first non-zero output byte if it is less than 0x10.
Example:
input = [1234, -1, 0, 27, 342718233]
s = input.toHex()
s is [ '04D2', 'FFFFFFFF', '00', '1B', '146D7719']
The example above shows an `INT32` type column where each integer is 4 bytes. Leading zeros are suppressed unless filling out a complete byte as in `1234 -> '04D2'` instead of `000004D2` or `4D2`.
- Returns:
- new string ColumnVector
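The formatting rule (uppercase digits, whole bytes only, leading zero bytes dropped) can be checked on the host for single INT32 values. `ToHexSketch` is a hypothetical helper that mirrors the description using `Integer.toHexString`; it is not the library's implementation and does not model null entries.

```java
import java.util.Locale;

final class ToHexSketch {
    /** Uppercase hex of the value's 32-bit pattern, padded on the left to an even digit count. */
    static String toHex(int value) {
        // toHexString drops all leading zero digits, so at most one '0' is needed
        // to complete the first output byte.
        String digits = Integer.toHexString(value).toUpperCase(Locale.ROOT);
        return digits.length() % 2 == 0 ? digits : "0" + digits;
    }

    public static void main(String[] args) {
        int[] input = {1234, -1, 0, 27, 342718233};
        for (int value : input) {
            System.out.println(value + " -> " + toHex(value));
        }
        // 1234 -> 04D2, -1 -> FFFFFFFF, 0 -> 00, 27 -> 1B, 342718233 -> 146D7719
    }
}
```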
-