public class ColumnView extends Object implements AutoCloseable, BinaryOperable
Modifier and Type | Class and Description |
---|---|
static class |
ColumnView.FindOptions
Enum to choose behaviour of listIndexOf functions:
1.
|
Modifier and Type | Field and Description |
---|---|
protected long |
nullCount |
protected ColumnVector.OffHeapState |
offHeap |
protected long |
rows |
protected DType |
type |
static long |
UNKNOWN_NULL_COUNT |
protected long |
viewHandle |
Modifier | Constructor and Description |
---|---|
protected |
ColumnView(ColumnVector.OffHeapState state)
Intended to be called from ColumnVector when it is being constructed.
|
|
ColumnView(DType type,
long rows,
Optional<Long> nullCount,
BaseDeviceMemoryBuffer dataBuffer,
BaseDeviceMemoryBuffer validityBuffer)
Create a new column view based on data already on the device.
|
|
ColumnView(DType type,
long rows,
Optional<Long> nullCount,
BaseDeviceMemoryBuffer dataBuffer,
BaseDeviceMemoryBuffer validityBuffer,
BaseDeviceMemoryBuffer offsetBuffer)
Create a new column view based on data already on the device.
|
|
ColumnView(DType type,
long rows,
Optional<Long> nullCount,
BaseDeviceMemoryBuffer validityBuffer,
BaseDeviceMemoryBuffer offsetBuffer,
ColumnView[] children)
Create a new column view based on data already on the device.
|
Modifier and Type | Method and Description |
---|---|
ColumnVector |
abs()
Calculate the absolute value; the output is the same type as the input.
|
ColumnVector |
addCalendricalMonths(ColumnView months)
Add the specified number of months to the timestamp.
|
Scalar |
all()
Returns a boolean scalar that is true if all of the elements in
the column are true or non-zero, otherwise false.
|
Scalar |
all(DType outType)
Deprecated.
The only output type supported is BOOL8.
|
Scalar |
any()
Returns a boolean scalar that is true if any of the elements in
the column are true or non-zero, otherwise false.
|
Scalar |
any(DType outType)
Returns a scalar that is true or 1, depending on the specified type,
if any of the elements in the column are true or non-zero,
otherwise false or 0.
|
ColumnVector |
applyBooleanMask(ColumnView booleanMaskView)
Filters the elements in each row of this LIST column, using the `booleanMaskView`
LIST of booleans as a mask.
|
ColumnVector |
approxPercentile(ColumnVector percentiles)
Calculate various percentiles of this ColumnVector, which must contain centroids produced by
a t-digest aggregation.
|
ColumnVector |
approxPercentile(double[] percentiles)
Calculate various percentiles of this ColumnVector, which must contain centroids produced by
a t-digest aggregation.
|
ColumnVector |
arccos()
Calculate the arccos; the output is the same type as the input.
|
ColumnVector |
arccosh()
Calculate the hyperbolic arccos; the output is the same type as the input.
|
ColumnVector |
arcsin()
Calculate the arcsin; the output is the same type as the input.
|
ColumnVector |
arcsinh()
Calculate the hyperbolic arcsin; the output is the same type as the input.
|
ColumnVector |
arctan()
Calculate the arctan; the output is the same type as the input.
|
ColumnVector |
arctanh()
Calculate the hyperbolic arctan; the output is the same type as the input.
|
ColumnVector |
asByteList()
Cast to a list of bytes.
This method converts the rows provided by the ColumnVector and casts each row to a list of
bytes with the endianness reversed.
|
ColumnVector |
asByteList(boolean config)
Cast to a list of bytes.
This method converts the rows provided by the ColumnVector and casts each row to a list
of bytes.
|
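To make the endianness wording above concrete, the following plain-Java sketch (not the cudf API) shows the two byte orders for a single INT32 row. It assumes values are stored little-endian in memory, so "reversed" means most significant byte first; which order each `asByteList` overload actually emits, and what `config` selects, should be confirmed against the cudf source. `ByteListSketch` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical CPU model of viewing one INT32 row as a list of bytes (not the cudf API). */
final class ByteListSketch {
    /** Bytes in little-endian order, assumed here to be the in-memory layout. */
    static byte[] inMemoryOrder(int value) {
        return ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
    }

    /** The same bytes with the order reversed (most significant byte first). */
    static byte[] reversedOrder(int value) {
        return ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.BIG_ENDIAN).putInt(value).array();
    }
}
```

For a row holding 0x01020304, `inMemoryOrder` yields [4, 3, 2, 1] and `reversedOrder` yields [1, 2, 3, 4].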
ColumnVector |
asBytes()
Cast to Byte - ColumnVector
This method takes the value provided by the ColumnVector and casts to byte.
When casting from a Date, Timestamp, or Boolean to a byte type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
asDoubles()
Cast to Double - ColumnVector
This method takes the value provided by the ColumnVector and casts to double.
When casting from a Date, Timestamp, or Boolean to a double type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
asFloats()
Cast to Float - ColumnVector
This method takes the value provided by the ColumnVector and casts to float.
When casting from a Date, Timestamp, or Boolean to a float type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
asInts()
Cast to Int - ColumnVector
This method takes the value provided by the ColumnVector and casts to int.
When casting from a Date, Timestamp, or Boolean to an int type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
asLongs()
Cast to Long - ColumnVector
This method takes the value provided by the ColumnVector and casts to long.
When casting from a Date, Timestamp, or Boolean to a long type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
asShorts()
Cast to Short - ColumnVector
This method takes the value provided by the ColumnVector and casts to short.
When casting from a Date, Timestamp, or Boolean to a short type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
asStrings()
Cast to Strings.
|
ColumnVector |
asStrings(String format)
Parse and convert a timestamp column vector to a string column vector.
|
ColumnVector |
asTimestamp(DType timestampType,
String format)
Parse a string to a timestamp.
|
ColumnVector |
asTimestampDays()
Cast to TIMESTAMP_DAYS - ColumnVector
This method takes the value provided by the ColumnVector and casts to TIMESTAMP_DAYS.
|
ColumnVector |
asTimestampDays(String format)
Cast to TIMESTAMP_DAYS - ColumnVector
This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_DAYS.
|
ColumnVector |
asTimestampMicroseconds()
Cast to TIMESTAMP_MICROSECONDS - ColumnVector
This method takes the value provided by the ColumnVector and casts to TIMESTAMP_MICROSECONDS.
|
ColumnVector |
asTimestampMicroseconds(String format)
Cast to TIMESTAMP_MICROSECONDS - ColumnVector
This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_MICROSECONDS.
|
ColumnVector |
asTimestampMilliseconds()
Cast to TIMESTAMP_MILLISECONDS - ColumnVector
This method takes the value provided by the ColumnVector and casts to TIMESTAMP_MILLISECONDS.
|
ColumnVector |
asTimestampMilliseconds(String format)
Cast to TIMESTAMP_MILLISECONDS - ColumnVector
This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_MILLISECONDS.
|
ColumnVector |
asTimestampNanoseconds()
Cast to TIMESTAMP_NANOSECONDS - ColumnVector
This method takes the value provided by the ColumnVector and casts to TIMESTAMP_NANOSECONDS.
|
ColumnVector |
asTimestampNanoseconds(String format)
Cast to TIMESTAMP_NANOSECONDS - ColumnVector
This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_NANOSECONDS.
|
ColumnVector |
asTimestampSeconds()
Cast to TIMESTAMP_SECONDS - ColumnVector
This method takes the value provided by the ColumnVector and casts to TIMESTAMP_SECONDS.
|
ColumnVector |
asTimestampSeconds(String format)
Cast to TIMESTAMP_SECONDS - ColumnVector
This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_SECONDS.
|
ColumnVector |
asUnsignedBytes()
Cast to unsigned Byte - ColumnVector
This method takes the value provided by the ColumnVector and casts to byte.
When casting from a Date, Timestamp, or Boolean to a byte type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
asUnsignedInts()
Cast to unsigned Int - ColumnVector
This method takes the value provided by the ColumnVector and casts to int.
When casting from a Date, Timestamp, or Boolean to an int type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
asUnsignedLongs()
Cast to unsigned Long - ColumnVector
This method takes the value provided by the ColumnVector and casts to long.
When casting from a Date, Timestamp, or Boolean to a long type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
asUnsignedShorts()
Cast to unsigned Short - ColumnVector
This method takes the value provided by the ColumnVector and casts to short.
When casting from a Date, Timestamp, or Boolean to a short type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
binaryOp(BinaryOp op,
BinaryOperable rhs,
DType outType)
Performs the binary operation `op` between this column and `rhs`, producing a column of type `outType`.
|
ColumnView |
bitCastTo(DType type)
Zero-copy cast between types with the same underlying length.
|
ColumnVector |
bitInvert()
Invert the bits; the output is the same type as the input.
|
ColumnVector |
capitalize(Scalar delimiters)
Returns a column of capitalized strings.
|
ColumnVector |
castTo(DType type)
Generic method to cast a ColumnVector.
When casting from a Date, Timestamp, or Boolean to a numerical type the underlying numerical
representation of the data will be used for the cast.
|
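`bitCastTo` and `castTo` can produce very different numbers for the same input. The sketch below, using only `java.lang.Float`, contrasts a value conversion (what `castTo(DType.INT32)` does for a FLOAT32 column, per the rule above that values are converted numerically) with a reinterpretation of the same four bytes (what a zero-copy cast between equal-width types amounts to). `CastSketch` is a hypothetical name, not part of cudf.

```java
/** Plain-Java contrast between a numeric cast and a bit cast of a FLOAT32 value. */
final class CastSketch {
    /** Value conversion, analogous to castTo(INT32): 1.0f becomes 1. */
    static int valueCast(float f) {
        return (int) f;
    }

    /** Reinterpretation of the same 4 bytes, analogous to bitCastTo(INT32). */
    static int bitCast(float f) {
        return Float.floatToRawIntBits(f);
    }
}
```

For 1.0f, `valueCast` returns 1 while `bitCast` returns 0x3F800000 (1065353216), the IEEE 754 encoding of 1.0f. The bit cast is lossless and reversible; the value cast is not.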
ColumnVector |
cbrt()
Calculate the cube root; the output is the same type as the input.
|
ColumnVector |
ceil()
Calculate the ceil; the output is the same type as the input.
|
ColumnVector |
clamp(Scalar lo,
Scalar hi)
Replaces values less than `lo` in this column with `lo`,
and values greater than `hi` with `hi`.
|
ColumnVector |
clamp(Scalar lo,
Scalar loReplace,
Scalar hi,
Scalar hiReplace)
Replaces values less than `lo` in this column with `loReplace`,
and values greater than `hi` with `hiReplace`.
|
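A row-wise model of both `clamp` overloads on a nullable INT32 column, written in plain Java with boxed `Integer` values standing in for nullable rows. The two-argument form is treated as the four-argument form with `loReplace = lo` and `hiReplace = hi`, which is what the descriptions above imply. Passing null rows through unchanged is an assumption, and `ClampSketch` is a hypothetical name.

```java
/** Plain-Java model of clamp(lo, hi) and clamp(lo, loReplace, hi, hiReplace). */
final class ClampSketch {
    /** clamp(lo, hi): values below lo become lo, values above hi become hi. */
    static Integer[] clamp(Integer[] col, int lo, int hi) {
        return clamp(col, lo, lo, hi, hi);
    }

    /** clamp(lo, loReplace, hi, hiReplace): out-of-range values get the replacement values. */
    static Integer[] clamp(Integer[] col, int lo, int loReplace, int hi, int hiReplace) {
        Integer[] out = new Integer[col.length];
        for (int i = 0; i < col.length; i++) {
            Integer v = col[i];
            if (v == null) {
                out[i] = null; // assumption: null rows pass through unchanged
            } else if (v < lo) {
                out[i] = loReplace;
            } else if (v > hi) {
                out[i] = hiReplace;
            } else {
                out[i] = v;
            }
        }
        return out;
    }
}
```

With `{1, 5, null, 12}`, `clamp(col, 2, 10)` gives `{2, 5, null, 10}`, while `clamp(col, 2, 0, 10, 99)` gives `{0, 5, null, 99}`.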
void |
close() |
ColumnVector |
codePoints()
Get the code point values (integers) for each character of each string.
|
ColumnVector |
contains(ColumnView searchSpace)
Returns a new column of
DType.BOOL8 elements with the same size as this column;
each row value is true if the corresponding entry in this column is contained in the
given searchSpace column, and false if it is not. |
boolean |
contains(Scalar needle)
Find whether `needle` is present in this column.
Example (single column):
idx    =    0   1   2   3   4
col    = { 10, 20, 20, 30, 50 }
needle = 20
result = true
|
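The two `contains` overloads differ in shape: the `Scalar` form returns one boolean for the whole column, while the `ColumnView` form returns one BOOL8 row per input row. This plain-Java sketch (hypothetical `ContainsSketch`, not the cudf API) models both; how null rows are reported is an assumption, flagged in the comments.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Plain-Java model of the two contains(...) overloads; nullable rows are boxed Integers. */
final class ContainsSketch {
    /** Like contains(Scalar needle): one boolean for the whole column. */
    static boolean contains(Integer[] col, int needle) {
        for (Integer v : col) {
            if (v != null && v == needle) {
                return true; // assumption: null rows never match
            }
        }
        return false;
    }

    /** Like contains(ColumnView searchSpace): one boolean per row of col. */
    static boolean[] contains(Integer[] col, Integer[] searchSpace) {
        Set<Integer> space = new HashSet<>(Arrays.asList(searchSpace));
        boolean[] out = new boolean[col.length];
        for (int i = 0; i < col.length; i++) {
            // assumption: a null row is reported as false here; the library may return null instead
            out[i] = col[i] != null && space.contains(col[i]);
        }
        return out;
    }
}
```

Using the example column `{10, 20, 20, 30, 50}`, `contains(col, 20)` is true and `contains(col, 40)` is false.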
ColumnVector |
containsRe(RegexProgram regexProg)
Returns a boolean ColumnVector identifying rows which
match the given RegexProgram pattern starting at any location.
|
ColumnVector |
containsRe(String pattern)
Deprecated.
|
ColumnVector |
copyToColumnVector()
Creates a ColumnVector from a column view handle.
|
HostColumnVector |
copyToHost()
Copy the data to host memory synchronously.
|
HostColumnVector |
copyToHost(HostMemoryAllocator hostMemoryAllocator)
Copy the data to the host synchronously.
|
HostColumnVector |
copyToHostAsync(Cuda.Stream stream)
Copy the data to the host asynchronously.
|
HostColumnVector |
copyToHostAsync(Cuda.Stream stream,
HostMemoryAllocator hostMemoryAllocator)
Copy the data to the host asynchronously.
|
ColumnVector |
cos()
Calculate the cos; the output is the same type as the input.
|
ColumnVector |
cosh()
Calculate the hyperbolic cos; the output is the same type as the input.
|
ColumnVector |
countElements()
Get the number of elements for each list.
|
ColumnVector |
day()
Get the day from a timestamp.
|
ColumnVector |
dayOfYear()
Get the day of the year from a timestamp.
|
int |
distinctCount()
Count how many rows in the column are distinct from one another.
|
int |
distinctCount(NullPolicy nullPolicy)
Count how many rows in the column are distinct from one another.
|
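A minimal model of `distinctCount(NullPolicy)`, assuming `NullPolicy.INCLUDE` counts all nulls together as one extra distinct value and `NullPolicy.EXCLUDE` ignores them. Which policy the no-argument overload uses is not stated above, so this sketch (hypothetical `DistinctCountSketch`) takes the choice as a boolean.

```java
import java.util.HashSet;
import java.util.Set;

/** Plain-Java model of distinctCount(NullPolicy) on a nullable INT32 column. */
final class DistinctCountSketch {
    /** includeNulls == true models NullPolicy.INCLUDE (all nulls count as one value). */
    static int distinctCount(Integer[] col, boolean includeNulls) {
        Set<Integer> seen = new HashSet<>();
        boolean sawNull = false;
        for (Integer v : col) {
            if (v == null) {
                sawNull = true;
            } else {
                seen.add(v);
            }
        }
        return seen.size() + (includeNulls && sawNull ? 1 : 0);
    }
}
```

For `{1, 1, null, 2, null}` the sketch returns 3 when nulls are included and 2 when they are excluded.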
ColumnVector |
dropListDuplicates()
Create a new LIST column by copying elements from the current LIST column, ignoring duplicates,
producing a LIST column in which each list contains only unique elements.
|
ColumnVector |
dropListDuplicatesWithKeysValues()
Given a LIST column in which each element is a struct containing a
|
ColumnVector |
endsWith(Scalar pattern)
Checks if each string in a column ends with a specified comparison string, resulting in a
parallel column of the boolean results.
|
ColumnVector |
exp()
Calculate the exp; the output is the same type as the input.
|
ColumnVector |
extractAllRecord(RegexProgram regexProg,
int idx)
Extracts all strings that match the given regex program pattern and correspond to the
regular expression group index `idx`.
|
ColumnVector |
extractAllRecord(String pattern,
int idx)
Deprecated.
|
ColumnVector |
extractListElement(ColumnView indices)
For each list in this column, pull out the entry at the corresponding index specified in
the index column.
|
ColumnVector |
extractListElement(int index)
For each list in this column, pull out the entry at the given index.
|
Table |
extractRe(RegexProgram regexProg)
For each captured group specified in the given regex program
return a column in the table.
|
Table |
extractRe(String pattern)
Deprecated.
|
ColumnVector |
findAndReplaceAll(ColumnView oldValues,
ColumnView newValues)
Returns a vector with all values "oldValues[i]" replaced with "newValues[i]".
|
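A plain-Java model of `findAndReplaceAll`: build a map from `oldValues[i]` to `newValues[i]`, then look each row up in it. In this sketch a null row is kept as-is unless null itself appears in `oldValues`, and a later duplicate in `oldValues` overrides an earlier one; both are assumptions, and `FindAndReplaceSketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of findAndReplaceAll(oldValues, newValues) on a nullable INT32 column. */
final class FindAndReplaceSketch {
    static Integer[] findAndReplaceAll(Integer[] col, Integer[] oldValues, Integer[] newValues) {
        if (oldValues.length != newValues.length) {
            throw new IllegalArgumentException("oldValues and newValues must have the same length");
        }
        Map<Integer, Integer> mapping = new HashMap<>();
        for (int i = 0; i < oldValues.length; i++) {
            mapping.put(oldValues[i], newValues[i]); // a later duplicate key overwrites an earlier one
        }
        Integer[] out = new Integer[col.length];
        for (int i = 0; i < col.length; i++) {
            // rows with no entry in the mapping are copied through unchanged
            out[i] = mapping.containsKey(col[i]) ? mapping.get(col[i]) : col[i];
        }
        return out;
    }
}
```

Replacing `{2, 3}` with `{20, 30}` in `{1, 2, 3, 2}` yields `{1, 20, 30, 20}`.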
ColumnVector |
flattenLists()
Flatten each list of lists into a single list.
|
ColumnVector |
flattenLists(boolean ignoreNull)
Flatten each list of lists into a single list.
|
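The `ignoreNull` flag of `flattenLists` is not explained in the summary above. The sketch below models a single row under the assumption that `ignoreNull = true` skips null inner lists, while `ignoreNull = false` makes the whole output row null when any inner list is null; verify this against the full Javadoc before relying on it. `FlattenListsSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of flattenLists(ignoreNull) applied to one row of a LIST<LIST<INT32>> column. */
final class FlattenListsSketch {
    /**
     * Concatenates the inner lists of one row.
     * Assumption: ignoreNull == true skips null inner lists; ignoreNull == false
     * turns the whole row null if any inner list is null.
     */
    static List<Integer> flattenRow(List<List<Integer>> row, boolean ignoreNull) {
        if (row == null) {
            return null; // a null row stays null
        }
        List<Integer> out = new ArrayList<>();
        for (List<Integer> inner : row) {
            if (inner == null) {
                if (!ignoreNull) {
                    return null;
                }
                continue;
            }
            out.addAll(inner);
        }
        return out;
    }
}
```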
ColumnVector |
floor()
Calculate the floor; the output is the same type as the input.
|
static ColumnView |
fromDeviceBuffer(BaseDeviceMemoryBuffer buffer,
long startOffset,
DType type,
int rows)
Create a new column view from a raw device buffer.
|
ColumnVector |
generateListOffsets()
Generate list offsets from sizes of each list.
|
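List offsets conventionally have one more entry than there are lists, with list `i` spanning `[offsets[i], offsets[i + 1])`. Assuming `generateListOffsets` follows that convention, it is an exclusive prefix sum of the sizes with the total appended, as in this plain-Java sketch (hypothetical `ListOffsetsSketch`).

```java
/** Plain-Java model of turning per-list sizes into list offsets. */
final class ListOffsetsSketch {
    /** offsets[0] = 0 and offsets[i + 1] = offsets[i] + sizes[i]. */
    static int[] generateListOffsets(int[] sizes) {
        int[] offsets = new int[sizes.length + 1];
        for (int i = 0; i < sizes.length; i++) {
            offsets[i + 1] = offsets[i] + sizes[i];
        }
        return offsets;
    }
}
```

Sizes `[2, 0, 3]` give offsets `[0, 2, 2, 5]`; the empty second list shows up as two equal consecutive offsets.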
ColumnVector |
getByteCount()
Retrieve the number of bytes for each string.
|
ColumnVector |
getCharLengths()
Retrieve the number of characters in each string.
|
ColumnView |
getChildColumnView(int childIndex)
Returns the child column view at a given index.
|
ColumnView[] |
getChildColumnViews()
Returns the child column views for this view.
Please note that it is the responsibility of the caller to close these views.
|
BaseDeviceMemoryBuffer |
getData()
Gets the data buffer for the current column view (viewHandle).
|
long |
getDeviceMemorySize()
Returns the amount of device memory used.
|
long |
getHostBytesRequired()
Calculate the total space required to copy the data to the host.
|
ColumnVector |
getJSONObject(Scalar path)
Apply a JSONPath string to all rows in an input strings column.
|
ColumnVector |
getJSONObject(Scalar path,
GetJsonObjectOptions options)
Apply a JSONPath string to all rows in an input strings column.
|
ColumnView |
getListOffsetsView()
Get a ColumnView that is the offsets for this list.
|
ColumnVector |
getMapKeyExistence(ColumnView keys)
For a column of type List
|
ColumnVector |
getMapKeyExistence(Scalar key)
For a column of type List
|
ColumnVector |
getMapValue(ColumnView keys)
Given a column of type List
|
ColumnVector |
getMapValue(Scalar key)
Given a column of type List
|
long |
getNativeView()
USE WITH CAUTION: This method exposes the address of the native cudf::column_view.
|
long |
getNullCount()
Returns the number of nulls in the data.
|
int |
getNumChildren() |
BaseDeviceMemoryBuffer |
getOffsets() |
long |
getRowCount()
Returns the number of rows in this vector.
|
Scalar |
getScalarElement(int index)
Get a single item from the column at the specified index as a Scalar.
|
DType |
getType()
Get the type of this data.
|
BaseDeviceMemoryBuffer |
getValid() |
boolean |
hasNonEmptyNulls()
Exact check of whether a column or its descendants have non-empty null rows.
|
static long |
hostPaddingSizeInBytes()
Get the size that the host will align memory allocations to in bytes.
|
ColumnVector |
hour()
Get the hour from a timestamp with time resolution.
|
ColumnVector |
ifElse(ColumnView trueValues,
ColumnView falseValues)
For a BOOL8 vector, computes a vector whose rows are selected from two other vectors
based on the boolean value of this vector in the corresponding row.
|
ColumnVector |
ifElse(ColumnView trueValues,
Scalar falseValue)
For a BOOL8 vector, computes a vector whose rows are selected from two other inputs
based on the boolean value of this vector in the corresponding row.
|
ColumnVector |
ifElse(Scalar trueValue,
ColumnView falseValues)
For a BOOL8 vector, computes a vector whose rows are selected from two other inputs
based on the boolean value of this vector in the corresponding row.
|
ColumnVector |
ifElse(Scalar trueValue,
Scalar falseValue)
For a BOOL8 vector, computes a vector whose rows are selected from two other inputs
based on the boolean value of this vector in the corresponding row.
|
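All four `ifElse` overloads select row by row; they differ only in whether each branch is a column or a broadcast scalar. The sketch models the column/column case in plain Java. Treating a null predicate row as false (so it picks the false value) is an assumption, as is the name `IfElseSketch`.

```java
/** Plain-Java model of ifElse(trueValues, falseValues) driven by a nullable BOOL8 column. */
final class IfElseSketch {
    static Integer[] ifElse(Boolean[] pred, Integer[] trueValues, Integer[] falseValues) {
        Integer[] out = new Integer[pred.length];
        for (int i = 0; i < pred.length; i++) {
            // Boolean.TRUE.equals(null) is false, so a null predicate selects the false value
            out[i] = Boolean.TRUE.equals(pred[i]) ? trueValues[i] : falseValues[i];
        }
        return out;
    }
}
```

The scalar overloads correspond to repeating one value for every row of `trueValues` or `falseValues`.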
ColumnVector |
isFixedPoint(DType decimalType)
Returns a Boolean vector with the same number of rows as this instance, that has
TRUE for any entry that is a fixed-point value, and FALSE if it is not.
|
ColumnVector |
isFloat()
Returns a Boolean vector with the same number of rows as this instance, that has
TRUE for any entry that is a float, and FALSE if it is not a float.
|
ColumnVector |
isInteger()
Returns a Boolean vector with the same number of rows as this instance, that has
TRUE for any entry that is an integer, and FALSE if it is not an integer.
|
ColumnVector |
isInteger(DType intType)
Returns a Boolean vector with the same number of rows as this instance, that has
TRUE for any entry that is an integer, and FALSE if it is not an integer.
|
ColumnVector |
isLeapYear()
Check to see if the year for this timestamp is a leap year or not.
|
ColumnVector |
isNan()
Returns a Boolean vector with the same number of rows as this instance, that has
TRUE for any entry that is NaN, and FALSE if null or a valid floating point value.
|
ColumnVector |
isNotNan()
Returns a Boolean vector with the same number of rows as this instance, that has
TRUE for any entry that is null or a valid floating point value, and FALSE otherwise.
|
ColumnVector |
isNotNull()
Returns a Boolean vector with the same number of rows as this instance, that has
TRUE for any entry that is not null, and FALSE for any null entry (as per the validity mask).
|
ColumnVector |
isNull()
Returns a Boolean vector with the same number of rows as this instance, that has
FALSE for any entry that is not null, and TRUE for any null entry (as per the validity mask).
|
ColumnVector |
isTimestamp(String format)
Verifies that a string column can be parsed to timestamps using the provided format
pattern.
|
ColumnVector |
joinStrings(Scalar separator,
Scalar narep)
Concatenates all strings in the column into one new string delimited
by an optional separator string.
|
ColumnVector |
lastDayOfMonth()
Get the date that is the last day of the month for this timestamp.
|
ColumnVector |
like(Scalar pattern,
Scalar escapeChar)
Returns a boolean ColumnVector identifying rows which
match the given like pattern.
|
ColumnVector |
listContains(Scalar key)
Create a column of bool values indicating whether the specified scalar
is an element of each row of a list column.
|
ColumnVector |
listContainsColumn(ColumnView key)
Create a column of bool values indicating whether the list rows of the first
column contain the corresponding values in the second column.
|
ColumnVector |
listContainsNulls()
Create a column of bool values indicating whether the list rows of the specified
column contain null elements.
|
ColumnVector |
listIndexOf(ColumnView keys,
ColumnView.FindOptions findOption)
Create a column of int32 indices, indicating the position of each row in the
search key column in the corresponding row of the lists column.
|
ColumnVector |
listIndexOf(Scalar key,
ColumnView.FindOptions findOption)
Create a column of int32 indices, indicating the position of the scalar search key
in each list row.
|
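A per-row model of `listIndexOf` with a scalar key. `Find` is a hypothetical stand-in for `ColumnView.FindOptions`, whose description above is truncated; the sketch assumes one option returns the first matching position and another the last. Returning -1 for an absent key, and null when the list row or the key is null, are also assumptions of this sketch (`ListIndexOfSketch` is hypothetical).

```java
import java.util.List;

/** Plain-Java model of listIndexOf(Scalar key, FindOptions) for one list row. */
final class ListIndexOfSketch {
    /** Hypothetical stand-in for ColumnView.FindOptions. */
    enum Find { FIRST, LAST }

    /**
     * Position of key in row.
     * Assumptions: -1 if key is absent; null if the row or the key is null.
     */
    static Integer indexOf(List<Integer> row, Integer key, Find option) {
        if (row == null || key == null) {
            return null;
        }
        return option == Find.FIRST ? row.indexOf(key) : row.lastIndexOf(key);
    }
}
```

For the row `[5, 7, 5]` and key 5, the first-match option gives 0 and the last-match option gives 2.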
ColumnVector |
listReduce(SegmentedReductionAggregation aggregation)
Do a reduction on the values in a list.
|
ColumnVector |
listReduce(SegmentedReductionAggregation aggregation,
DType outType)
Do a reduction on the values in a list.
|
ColumnVector |
listReduce(SegmentedReductionAggregation aggregation,
NullPolicy nullPolicy,
DType outType)
Do a reduction on the values in a list.
|
static ColumnVector |
listsDifferenceDistinct(ColumnView lhs,
ColumnView rhs)
Find the difference of lists of the left column against lists of the right column.
|
static ColumnVector |
listsHaveOverlap(ColumnView lhs,
ColumnView rhs)
For each pair of lists from the input lists columns, check if they have any common non-null
elements.
|
static ColumnVector |
listsIntersectDistinct(ColumnView lhs,
ColumnView rhs)
Find the intersection without duplicates between lists at each row of the given lists columns.
|
ColumnVector |
listSortRows(boolean isDescending,
boolean isNullSmallest)
Segmented sort of the elements within a list in each row of a list column.
|
static ColumnVector |
listsUnionDistinct(ColumnView lhs,
ColumnView rhs)
Find the union without duplicates between lists at each row of the given lists columns.
|
ColumnVector |
log()
Calculate the log; the output is the same type as the input.
|
ColumnVector |
log10()
Calculate the log with base 10; the output is the same type as the input.
|
ColumnVector |
log2()
Calculate the log with base 2; the output is the same type as the input.
|
ColumnView |
logicalCastTo(DType type)
Deprecated.
This has changed to bit_cast in C++, so use the matching name, bitCastTo, instead.
|
ColumnVector |
lower()
Convert a string to lower case.
|
ColumnVector |
lstrip()
Removes whitespace from the beginning of a string.
|
ColumnVector |
lstrip(Scalar toStrip)
Removes the specified characters from the beginning of each string.
|
static ColumnView |
makeStructView(ColumnView... columns)
Create a new struct column view of existing column views.
|
static ColumnView |
makeStructView(long rows,
ColumnView... columns)
Create a new struct column view of existing column views.
|
ColumnVector |
matchesRe(RegexProgram regexProg)
Returns a boolean ColumnVector identifying rows which
match the given regex program pattern but only at the beginning of the string.
|
ColumnVector |
matchesRe(String pattern)
Deprecated.
|
Scalar |
max()
Returns the maximum of all values in the column, returning a scalar
of the same type as this column.
|
Scalar |
max(DType outType)
Deprecated.
The max reduction no longer internally allows setting the output type. As a
workaround, this API will cast the input type to the output type for you, but this may not
work in all cases.
|
Scalar |
mean()
Returns the arithmetic mean of all values in the column, returning a
FLOAT64 scalar unless the column type is FLOAT32, in which case a FLOAT32 scalar is returned.
|
Scalar |
mean(DType outType)
Returns the arithmetic mean of all values in the column, returning a
scalar of the specified type.
|
ColumnVector |
mergeAndSetValidity(BinaryOp mergeOp,
ColumnView... columns)
Create a deep copy of the column while replacing the null mask.
|
Scalar |
min()
Returns the minimum of all values in the column, returning a scalar
of the same type as this column.
|
Scalar |
min(DType outType)
Deprecated.
The min reduction no longer internally allows setting the output type. As a
workaround, this API will cast the input type to the output type for you, but this may not
work in all cases.
|
ColumnVector |
minute()
Get the minute from a timestamp with time resolution.
|
ColumnVector |
month()
Get the month from a timestamp.
|
ColumnVector |
nansToNulls()
Returns a new ColumnVector with NaNs converted to nulls, preserving the existing null values.
|
ColumnVector |
normalizeNANsAndZeros()
Create a new vector of "normalized" values, where:
1.
|
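The description above is cut off after "1.", so the exact rules are not visible here. Going by the method name, one plausible reading is that every NaN is mapped to a single canonical NaN and -0.0 to +0.0, so that bitwise equality, hashing, and grouping treat them consistently. The sketch below (hypothetical `NormalizeSketch`) implements that assumed rule for one FLOAT64 value.

```java
/** Plain-Java model of one assumed reading of normalizeNANsAndZeros for FLOAT64 values. */
final class NormalizeSketch {
    static double normalize(double v) {
        if (Double.isNaN(v)) {
            return Double.NaN; // any NaN bit pattern (including a negative NaN) -> canonical NaN
        }
        return v == 0.0 ? 0.0 : v; // -0.0 == 0.0 is true, so both zeros become +0.0
    }
}
```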
ColumnVector |
not()
Returns a vector of the logical `not` of each value in the input
column (this).
|
ColumnVector |
pad(int width)
Pad the Strings column until it reaches the desired length with spaces " " on the right.
|
ColumnVector |
pad(int width,
PadSide side)
Pad the Strings column until it reaches the desired length with spaces " ".
|
ColumnVector |
pad(int width,
PadSide side,
String fillChar)
Pad the Strings column until it reaches the desired length.
|
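A plain-Java model of `pad(width, side, fillChar)` for one string. `Side` is a hypothetical stand-in for `PadSide` and covers only left and right padding, since how a both-sides mode would split the fill is not described above. Leaving strings that already have at least `width` characters unchanged is an assumption, as is the name `PadSketch`.

```java
/** Plain-Java model of padding one string to a target width. */
final class PadSketch {
    /** Hypothetical stand-in for PadSide (left and right only). */
    enum Side { LEFT, RIGHT }

    /** Pads s with fillChar (assumed to be one character) until it is width characters long. */
    static String pad(String s, int width, Side side, String fillChar) {
        int missing = width - s.length();
        if (missing <= 0) {
            return s; // assumption: longer strings are not truncated
        }
        String fill = fillChar.repeat(missing);
        return side == Side.LEFT ? fill + s : s + fill;
    }
}
```

`pad("ab", 4, RIGHT, " ")` gives `"ab  "`, which is what the `pad(int width)` overload describes (spaces on the right).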
ColumnVector |
prefixSum()
Compute the prefix sum (aka cumulative sum) of the values in this column.
|
Scalar |
product()
Returns the product of all values in the column, returning a scalar
of the same type as this column.
|
Scalar |
product(DType outType)
Returns the product of all values in the column, returning a scalar
of the specified type.
|
ColumnVector |
purgeNonEmptyNulls()
Copies this column into output while purging any non-empty null rows in the column or its
descendants.
|
ColumnVector |
quantile(QuantileMethod method,
double[] quantiles)
Calculate various quantiles of this ColumnVector.
|
ColumnVector |
quarterOfYear()
Get the quarter of the year from a timestamp.
|
Scalar |
reduce(ReductionAggregation aggregation)
Computes the reduction of the values in all rows of a column.
|
Scalar |
reduce(ReductionAggregation aggregation,
DType outType)
Computes the reduction of the values in all rows of a column.
|
ColumnVector |
repeatStrings(ColumnView repeatTimes)
Given a strings column, an output strings column is generated by repeating each input
string the number of times given by the corresponding row in the
repeatTimes
numeric column. |
ColumnVector |
repeatStrings(int repeatTimes)
Given a strings column, each string in it is repeated a number of times specified by the
repeatTimes parameter. |
ColumnView |
replaceChildrenWithViews(int[] indices,
ColumnView[] views)
This method takes in a nested type and replaces its children with the given views.
Note: Make sure the number of rows in the leaf node is the same as in the child replacing it;
otherwise the list can point to elements outside of the column values.
|
ColumnView |
replaceListChild(ColumnView child)
This method takes in a list and returns a new list with the leaf node replaced with the given
view.
|
ColumnVector |
replaceMultiRegex(String[] patterns,
ColumnView repls)
For each string, replaces any character sequence matching any of the regular expression
patterns with the corresponding replacement strings.
|
ColumnVector |
replaceNulls(ColumnView replacements)
Returns a ColumnVector with any null values replaced with the corresponding row in the
specified replacement column.
|
ColumnVector |
replaceNulls(ReplacePolicy policy) |
ColumnVector |
replaceNulls(Scalar scalar)
Returns a ColumnVector with any null values replaced with a scalar.
|
ColumnVector |
replaceRegex(RegexProgram regexProg,
Scalar repl)
For each string, replaces any character sequence matching the given regex program pattern
using the replacement string scalar.
|
ColumnVector |
replaceRegex(RegexProgram regexProg,
Scalar repl,
int maxRepl)
For each string, replaces any character sequence matching the given regex program pattern
using the replacement string scalar.
|
ColumnVector |
replaceRegex(String pattern,
Scalar repl)
Deprecated.
|
ColumnVector |
replaceRegex(String pattern,
Scalar repl,
int maxRepl)
Deprecated.
|
ColumnVector |
reverseStringsOrLists()
Copy the current column to a new column in which each string or list has its characters or
elements in reverse order.
|
ColumnVector |
rint()
Rounds a floating-point argument to the closest integer value, but returns it as a float.
|
ColumnVector |
rollingWindow(RollingAggregation op,
WindowOptions options)
This function aggregates values in a window around each element i of the input
column.
|
ColumnVector |
round()
Rounds all the values in a column with these default values:
decimalPlaces = 0
Rounding method = RoundMode.HALF_UP
|
ColumnVector |
round(int decimalPlaces)
Rounds all the values in a column to the specified number of decimal places with HALF_UP
(default) as Rounding method.
|
ColumnVector |
round(int decimalPlaces,
RoundMode mode)
Rounds all the values in a column to the specified number of decimal places.
|
ColumnVector |
round(RoundMode round)
Rounds all the values in a column with decimal places = 0.
|
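The difference between rounding modes is easiest to see at a midpoint. This sketch uses `java.math.RoundingMode.HALF_UP` and `HALF_EVEN` as stand-ins for the corresponding `RoundMode` values (a half-to-even mode is assumed to exist alongside the HALF_UP default named above) and rounds through `BigDecimal`, so it works on the decimal string of each double. cudf operates on binary values, so results at decimal midpoints that doubles cannot represent exactly may differ. `RoundSketch` is hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Plain-Java illustration of HALF_UP versus HALF_EVEN rounding (not the cudf implementation). */
final class RoundSketch {
    /** Rounds the decimal form of value (Double.toString) to decimalPlaces digits. */
    static double round(double value, int decimalPlaces, RoundingMode mode) {
        return BigDecimal.valueOf(value).setScale(decimalPlaces, mode).doubleValue();
    }
}
```

`round(2.5, 0, HALF_UP)` gives 3.0 while `HALF_EVEN` gives 2.0; at one decimal place, 1.25 becomes 1.3 and 1.2 respectively.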
ColumnVector |
rstrip()
Removes whitespace from the end of a string.
|
ColumnVector |
rstrip(Scalar toStrip)
Removes the specified characters from the end of each string.
|
ColumnVector |
scan(ScanAggregation aggregation)
Computes an inclusive scan for a column that excludes nulls.
|
ColumnVector |
scan(ScanAggregation aggregation,
ScanType scanType)
Computes a scan for a column that excludes nulls.
|
ColumnVector |
scan(ScanAggregation aggregation,
ScanType scanType,
NullPolicy nullPolicy)
Computes a scan for a column.
|
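`prefixSum` and the `scan` overloads share the same idea. The sketch below models a sum scan in plain Java for both scan types: inclusive (each output includes its own row, the usual meaning of a cumulative sum) and exclusive (each output covers only earlier rows). The null handling reflects one reading of "excludes nulls": a null input yields a null output and is skipped by the running total. `ScanSketch` and its `Long` output are illustration choices, not the library's types.

```java
/** Plain-Java model of an inclusive or exclusive sum scan over a nullable INT32 column. */
final class ScanSketch {
    static Long[] sumScan(Integer[] col, boolean inclusive) {
        Long[] out = new Long[col.length];
        long running = 0; // long only to avoid overflow in this sketch
        for (int i = 0; i < col.length; i++) {
            if (col[i] == null) {
                out[i] = null; // assumption: null stays null and does not affect the total
                continue;
            }
            if (inclusive) {
                running += col[i];
                out[i] = running;
            } else {
                out[i] = running;
                running += col[i];
            }
        }
        return out;
    }
}
```

For `{1, 2, 3, 4}` the inclusive scan is `{1, 3, 6, 10}` and the exclusive scan is `{0, 1, 3, 6}`.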
ColumnVector |
second()
Get the second from a timestamp with time resolution.
|
ColumnVector |
segmentedGather(ColumnView gatherMap)
Segmented gather of the elements within a list element in each row of a list column.
|
ColumnVector |
segmentedGather(ColumnView gatherMap,
OutOfBoundsPolicy policy)
Segmented gather of the elements within a list element in each row of a list column.
|
ColumnVector |
segmentedReduce(ColumnView offsets,
SegmentedReductionAggregation aggregation)
Do a segmented reduce where the offsets column indicates which groups in this column to combine.
|
ColumnVector |
segmentedReduce(ColumnView offsets,
SegmentedReductionAggregation aggregation,
DType outType)
Do a segmented reduce where the offsets column indicates which groups in this column to combine.
|
ColumnVector |
segmentedReduce(ColumnView offsets,
SegmentedReductionAggregation aggregation,
NullPolicy nullPolicy,
DType outType)
Do a segmented reduce where the offsets column indicates which groups in this column to combine.
|
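Assuming `offsets` uses the same convention as list offsets (n + 1 entries describing n half-open segments), a SUM segmented reduce looks like the plain-Java sketch below. Returning null for an empty segment is an assumption; the library's behavior may depend on the aggregation and the `NullPolicy`. `SegmentedReduceSketch` is hypothetical.

```java
/** Plain-Java model of a SUM segmented reduce driven by an offsets array. */
final class SegmentedReduceSketch {
    /**
     * Segment i covers values[offsets[i]] up to, but not including, values[offsets[i + 1]],
     * so n + 1 offsets describe n segments.
     */
    static Long[] segmentedSum(int[] values, int[] offsets) {
        if (offsets.length == 0) {
            throw new IllegalArgumentException("offsets needs at least one entry");
        }
        Long[] out = new Long[offsets.length - 1];
        for (int seg = 0; seg < out.length; seg++) {
            int begin = offsets[seg];
            int end = offsets[seg + 1];
            if (begin == end) {
                out[seg] = null; // assumption: an empty segment reduces to null
                continue;
            }
            long sum = 0;
            for (int j = begin; j < end; j++) {
                sum += values[j];
            }
            out[seg] = sum;
        }
        return out;
    }
}
```

With values `{1, 2, 3, 4, 5}` and offsets `{0, 2, 2, 5}`, the segments are `{1, 2}`, empty, and `{3, 4, 5}`, giving `{3, null, 12}`.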
ColumnVector |
sin()
Calculate the sin; the output is the same type as the input.
|
ColumnVector |
sinh()
Calculate the hyperbolic sin; the output is the same type as the input.
|
ColumnVector[] |
slice(int... indices)
Slices a column (including null values) into a set of columns
according to a set of indices.
|
ColumnVector[] |
split(int... indices)
Splits a column (including null values) into a set of columns
according to a set of indices.
|
ColumnView[] |
splitAsViews(int... indices)
Splits a ColumnView (including null values) into a set of ColumnViews
according to a set of indices.
|
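`slice` and `split` both take a list of indices but read them differently. Assuming the usual cudf conventions, `slice` treats the indices as (start, end) pairs of half-open ranges, while `split` treats them as cut points, so k indices yield k + 1 pieces. The plain-Java sketch (hypothetical `SliceSplitSketch`) models both on an `int[]`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of slice(int...) and split(int...) on a column without nulls. */
final class SliceSplitSketch {
    /** slice: indices are (start, end) pairs, each describing a half-open range. */
    static List<int[]> slice(int[] col, int... indices) {
        if (indices.length % 2 != 0) {
            throw new IllegalArgumentException("slice needs an even number of indices");
        }
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i < indices.length; i += 2) {
            out.add(Arrays.copyOfRange(col, indices[i], indices[i + 1]));
        }
        return out;
    }

    /** split: indices are cut points, so k indices produce k + 1 pieces. */
    static List<int[]> split(int[] col, int... indices) {
        List<int[]> out = new ArrayList<>();
        int start = 0;
        for (int idx : indices) {
            out.add(Arrays.copyOfRange(col, start, idx));
            start = idx;
        }
        out.add(Arrays.copyOfRange(col, start, col.length));
        return out;
    }
}
```

With `col = {10, 20, 30, 40, 50}`, `slice(col, 1, 3, 0, 2)` yields `{20, 30}` and `{10, 20}`, while `split(col, 2, 4)` yields `{10, 20}`, `{30, 40}`, and `{50}`.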
ColumnVector |
sqrt()
Calculate the sqrt; the output is the same type as the input.
|
Scalar |
standardDeviation()
Returns the sample standard deviation of all values in the column,
returning a FLOAT64 scalar unless the column type is FLOAT32, in which case
a FLOAT32 scalar is returned.
|
Scalar |
standardDeviation(DType outType)
Returns the sample standard deviation of all values in the column,
returning a scalar of the specified type.
|
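A sample standard deviation divides by n - 1 rather than n. The two-pass sketch below (hypothetical `StdDevSketch`) computes it over the non-null rows of a FLOAT64 column; skipping nulls and returning null when fewer than two values remain are assumptions.

```java
/** Plain-Java model of a sample standard deviation over a nullable FLOAT64 column. */
final class StdDevSketch {
    static Double sampleStdDev(Double[] col) {
        int n = 0;
        double sum = 0.0;
        for (Double v : col) {
            if (v != null) {
                n++;
                sum += v;
            }
        }
        if (n < 2) {
            return null; // assumption: too few values yields a null result
        }
        double mean = sum / n;
        double squares = 0.0;
        for (Double v : col) {
            if (v != null) {
                double d = v - mean;
                squares += d * d;
            }
        }
        return Math.sqrt(squares / (n - 1)); // n - 1: sample, not population, variance
    }
}
```

For `{2, 4, 4, 4, 5, 5, 7, 9}` the mean is 5 and the squared deviations sum to 32, so the result is sqrt(32 / 7) ≈ 2.138; the population formula would give 2.0.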
ColumnVector |
startsWith(Scalar pattern)
Checks if each string in a column starts with a specified comparison string, resulting in a
parallel column of the boolean results.
|
ColumnVector |
stringConcatenateListElements(ColumnView sepCol)
Given a lists column of strings (each row is a list of strings), concatenates the strings
within each row and returns a single strings column result.
|
ColumnVector |
stringConcatenateListElements(ColumnView sepCol,
Scalar separatorNarep,
Scalar stringNarep,
boolean separateNulls,
boolean emptyStringOutputIfEmptyList)
Given a lists column of strings (each row is a list of strings), concatenates the strings
within each row and returns a single strings column result.
|
ColumnVector |
stringConcatenateListElements(Scalar separator,
Scalar narep,
boolean separateNulls,
boolean emptyStringOutputIfEmptyList)
Given a lists column of strings (each row is a list of strings), concatenates the strings
within each row and returns a single strings column result.
|
ColumnVector |
stringContains(Scalar compString)
Checks if each string in a column contains a specified comparison string, resulting in a
parallel column of the boolean results.
|
ColumnVector |
stringLocate(Scalar substring)
Locates the starting index of the first instance of the given string in each row of a column.
|
ColumnVector |
stringLocate(Scalar substring,
int start)
Locates the starting index of the first instance of the given string in each row of a column.
|
ColumnVector |
stringLocate(Scalar substring,
int start,
int end)
Locates the starting index of the first instance of the given string in each row of a column.
|
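A plain-Java model of `stringLocate(substring, start, end)` for one string. It assumes a match must lie entirely inside `[start, end)`, that a negative `end` means "to the end of the string", and that -1 signals no match. It also counts UTF-16 units, which coincide with characters only for text without surrogate pairs. `StringLocateSketch` is hypothetical.

```java
/** Plain-Java model of stringLocate(substring, start, end) on one string. */
final class StringLocateSketch {
    /**
     * Index of the first occurrence of substring that starts at or after start and ends
     * at or before end, or -1 if there is none. A negative end means "to the end of s".
     */
    static int stringLocate(String s, String substring, int start, int end) {
        int limit = end < 0 ? s.length() : Math.min(end, s.length());
        int idx = s.indexOf(substring, start);
        // indexOf returns the earliest match at or after start; if that one overruns limit,
        // every later match does too.
        return (idx >= 0 && idx + substring.length() <= limit) ? idx : -1;
    }
}
```

In `"abcabc"`, searching for `"bc"` from 0 gives 1, from 2 gives 4, and within `[2, 4)` gives -1.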
ColumnVector |
stringReplace(ColumnView targets,
ColumnView repls)
Returns a new strings column where target strings within each string are replaced with the
corresponding replacement strings.
|
ColumnVector |
stringReplace(Scalar target,
Scalar replace)
Returns a new strings column where target string within each string is replaced with the specified
replacement string.
|
ColumnVector |
stringReplaceWithBackrefs(RegexProgram regexProg,
String replace)
For each string, replaces any character sequence matching the given regex program
pattern using the replace template for back-references.
|
ColumnVector |
stringReplaceWithBackrefs(String pattern,
String replace)
Deprecated.
|
Table |
stringSplit(RegexProgram regexProg)
Returns a list of columns by splitting each string using the specified regex program pattern.
|
Table |
stringSplit(RegexProgram regexProg,
int limit)
Returns a list of columns by splitting each string using the specified regex program pattern.
|
Table |
stringSplit(String delimiter)
Returns a list of columns by splitting each string using the specified string literal
delimiter.
|
Table |
stringSplit(String pattern,
boolean splitByRegex)
Deprecated.
|
Table |
stringSplit(String delimiter,
int limit)
Returns a list of columns by splitting each string using the specified string literal
delimiter.
|
Table |
stringSplit(String pattern,
int limit,
boolean splitByRegex)
Deprecated.
|
ColumnVector |
stringSplitRecord(RegexProgram regexProg)
Returns a lists column of strings in which each list is made by splitting the
corresponding input string using the specified regex program pattern.
|
ColumnVector |
stringSplitRecord(RegexProgram regexProg,
int limit)
Returns a lists column of strings in which each list is made by splitting the
corresponding input string using the specified regex program pattern.
|
ColumnVector |
stringSplitRecord(String delimiter)
Returns a lists column of strings in which each list is made by splitting the
corresponding input string using the specified string literal delimiter.
|
ColumnVector |
stringSplitRecord(String pattern,
boolean splitByRegex)
Deprecated.
|
ColumnVector |
stringSplitRecord(String delimiter,
int limit)
Returns a lists column of strings in which each list is made by splitting the
corresponding input string using the specified string literal delimiter.
|
ColumnVector |
stringSplitRecord(String pattern,
int limit,
boolean splitByRegex)
Deprecated.
|
ColumnVector |
strip()
Removes whitespace from the beginning and end of each string.
|
ColumnVector |
strip(Scalar toStrip)
Removes the specified characters from the beginning and end of each string.
|
ColumnVector |
substring(ColumnView start,
ColumnView end)
Returns a new strings column that contains substrings of the strings in the provided column,
using a separate start/end range for each string.
|
ColumnVector |
substring(int start)
Returns a new strings column that contains substrings of the strings in the provided column.
|
ColumnVector |
substring(int start,
int end)
Returns a new strings column that contains substrings of the strings in the provided column.
|
ColumnVector |
subVector(int start)
Return a subVector from start inclusive to the end of the vector.
|
ColumnVector |
subVector(int start,
int end)
Return a subVector from start (inclusive) to end (exclusive).
|
Scalar |
sum()
Computes the sum of all values in the column, returning a scalar
of the same type as this column.
|
Scalar |
sum(DType outType)
Computes the sum of all values in the column, returning a scalar
of the specified type.
|
Scalar |
sumOfSquares()
Returns the sum of squares of all values in the column, returning a
scalar of the same type as this column.
|
Scalar |
sumOfSquares(DType outType)
Returns the sum of squares of all values in the column, returning a
scalar of the specified type.
|
ColumnVector |
tan()
Calculate the tan, output is the same type as input.
|
ColumnVector |
tanh()
Calculate the hyperbolic tan, output is the same type as input.
|
protected static long |
title(long handle) |
ColumnVector |
toHex()
Convert this integer column to its hexadecimal representation and return a new strings column.
Any null entries will result in corresponding null entries in the output column.
|
String |
toString() |
ColumnVector |
toTitle()
Returns a column of strings where, for each string row in the input,
the first character after spaces is modified to upper-case,
while all the remaining characters in a word are modified to lower-case.
|
ColumnVector |
transform(String udf,
boolean isPtx)
Transform a vector using a custom function.
|
ColumnVector |
unaryOp(UnaryOp op)
Performs the specified unary operation on each element.
|
ColumnVector |
upper()
Convert a string to upper case.
|
ColumnVector |
urlDecode()
Converts all character sequences starting with '%' into character code-points
interpreting the 2 following characters as hex values to create the code-point.
|
ColumnVector |
urlEncode()
Converts most non-ASCII characters and control characters into UTF-8 hex code-points
prefixed with '%'.
|
Scalar |
variance()
Returns the variance of all values in the column, returning a
FLOAT64 scalar, unless the column type is FLOAT32, in which case a FLOAT32 scalar is returned.
|
Scalar |
variance(DType outType)
Returns the variance of all values in the column, returning a
scalar of the specified type.
|
ColumnVector |
weekDay()
Get the day of the week from a timestamp.
|
ColumnVector |
year()
Get year from a timestamp.
|
ColumnVector |
zfill(int width)
Add '0' as padding to the left of each string.
|
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface BinaryOperable
add, add, and, and, arctan2, arctan2, bitAnd, bitAnd, bitOr, bitOr, bitXor, bitXor, div, div, equalTo, equalTo, equalToNullAware, equalToNullAware, floorDiv, floorDiv, greaterOrEqualTo, greaterOrEqualTo, greaterThan, greaterThan, implicitConversion, lessOrEqualTo, lessOrEqualTo, lessThan, lessThan, log, log, maxNullAware, maxNullAware, minNullAware, minNullAware, mod, mod, mul, mul, notEqualTo, notEqualTo, notEqualToNullAware, notEqualToNullAware, or, or, pmod, pmod, pow, pow, shiftLeft, shiftLeft, shiftRight, shiftRight, shiftRightUnsigned, shiftRightUnsigned, sub, sub, trueDiv, trueDiv
public static final long UNKNOWN_NULL_COUNT
protected long viewHandle
protected final DType type
protected final long rows
protected final long nullCount
protected final ColumnVector.OffHeapState offHeap
protected ColumnView(ColumnVector.OffHeapState state)
state
- the state this view is based off of.
AssertionError
- if offHeapState points to a nested-type view with non-empty nulls
public ColumnView(DType type, long rows, Optional<Long> nullCount, BaseDeviceMemoryBuffer validityBuffer, BaseDeviceMemoryBuffer offsetBuffer, ColumnView[] children)
copyToColumnVector()
type
- the type of the vector
rows
- the number of rows in this vector.
nullCount
- the number of nulls in the dataset.
validityBuffer
- an optional validity buffer. Must be provided if nullCount != 0.
The ownership doesn't change on this buffer.
offsetBuffer
- a device buffer required for nested types including strings and string
categories. The ownership doesn't change on this buffer.
children
- an array of ColumnView children
public ColumnView(DType type, long rows, Optional<Long> nullCount, BaseDeviceMemoryBuffer dataBuffer, BaseDeviceMemoryBuffer validityBuffer)
copyToColumnVector()
type
- the type of the vector
rows
- the number of rows in this vector.
nullCount
- the number of nulls in the dataset.
dataBuffer
- the device buffer holding the column's data. The ownership doesn't change on this buffer.
validityBuffer
- an optional validity buffer. Must be provided if nullCount != 0.
The ownership doesn't change on this buffer.
public ColumnView(DType type, long rows, Optional<Long> nullCount, BaseDeviceMemoryBuffer dataBuffer, BaseDeviceMemoryBuffer validityBuffer, BaseDeviceMemoryBuffer offsetBuffer)
copyToColumnVector()
type
- the type of the vector
rows
- the number of rows in this vector.
nullCount
- the number of nulls in the dataset.
dataBuffer
- the device buffer holding the column's data. The ownership doesn't change on this buffer.
validityBuffer
- an optional validity buffer. Must be provided if nullCount != 0.
The ownership doesn't change on this buffer.
offsetBuffer
- the offset buffer for columns that need one.
public ColumnVector copyToColumnVector()
public final long getNativeView()
public final DType getType()
BinaryOperable
Specified by: getType in interface BinaryOperable
public final ColumnView[] getChildColumnViews()
public final ColumnView getChildColumnView(int childIndex)
childIndex
- the index of the child
public ColumnView getListOffsetsView()
public final BaseDeviceMemoryBuffer getData()
public final BaseDeviceMemoryBuffer getOffsets()
public final BaseDeviceMemoryBuffer getValid()
public long getNullCount()
public final long getRowCount()
public final int getNumChildren()
public long getDeviceMemorySize()
public void close()
Specified by: close in interface AutoCloseable
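Because ColumnView implements AutoCloseable, and methods such as year() state that the caller owns the newly allocated vector, try-with-resources is a natural way to release results. The sketch below runs without a GPU: FakeColumn and OwnershipDemo are hypothetical classes that only count open instances as a stand-in for device-backed vectors; they are not part of the cudf API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a device-backed column; NOT part of the cudf API.
final class FakeColumn implements AutoCloseable {
    // Number of instances that have been created but not yet closed.
    static final AtomicInteger OPEN = new AtomicInteger();
    private boolean closed = false;

    FakeColumn() {
        OPEN.incrementAndGet();
    }

    // Mimics a method like year(): returns a NEW object that the caller must close.
    FakeColumn derive() {
        if (closed) {
            throw new IllegalStateException("column already closed");
        }
        return new FakeColumn();
    }

    @Override
    public void close() {
        // Idempotent: a second close() does not decrement the counter again.
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

final class OwnershipDemo {
    // Returns how many columns are open inside the try block.
    static int run() {
        // Resources are closed in reverse declaration order when the block exits,
        // after the return value has been computed.
        try (FakeColumn input = new FakeColumn();
             FakeColumn result = input.derive()) {
            return FakeColumn.OPEN.get();
        }
    }
}
```

Declaring every returned object in the resource list means each one is closed even if a later statement in the block throws.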
public final ColumnVector nansToNulls()
public final ColumnVector getCharLengths()
public final ColumnVector getByteCount()
public final ColumnVector codePoints()
public final ColumnVector countElements()
public final ColumnVector isNotNull()
public final ColumnVector isNull()
public final ColumnVector isFixedPoint(DType decimalType)
decimalType
- the data type that should be used for bounds checking. Note that only
Decimal types (fixed-point) are allowed.
public final ColumnVector isInteger()
public final ColumnVector isInteger(DType intType)
intType
- the data type that should be used for bounds checking. Note that only
cudf integer types are allowed, including signed/unsigned int8 through int64.
public final ColumnVector isFloat()
public final ColumnVector isNan()
public final ColumnVector isNotNan()
public final ColumnVector findAndReplaceAll(ColumnView oldValues, ColumnView newValues)
oldValues
- a vector containing values that should be replaced
newValues
- a vector containing new values
public final ColumnVector replaceNulls(Scalar scalar)
scalar
- scalar value to use as replacement
public final ColumnVector replaceNulls(ColumnView replacements)
replacements
- column of replacement values
public final ColumnVector replaceNulls(ReplacePolicy policy)
public final ColumnVector ifElse(ColumnView trueValues, ColumnView falseValues)
trueValues
- the values to select if a row in this column is true
falseValues
- the values to select if a row in this column is not true
public final ColumnVector ifElse(ColumnView trueValues, Scalar falseValue)
trueValues
- the values to select if a row in this column is true
falseValue
- the value to select if a row in this column is not true
public final ColumnVector ifElse(Scalar trueValue, ColumnView falseValues)
trueValue
- the value to select if a row in this column is true
falseValues
- the values to select if a row in this column is not true
public final ColumnVector ifElse(Scalar trueValue, Scalar falseValue)
trueValue
- the value to select if a row in this column is true
falseValue
- the value to select if a row in this column is not true
public final ColumnVector[] slice(int... indices)
indices
-
public final ColumnVector subVector(int start)
start
- the index to start at.
public final ColumnVector subVector(int start, int end)
start
- the index to start at (inclusive).
end
- the index to end at (exclusive).
public final ColumnVector[] split(int... indices)
indices
- the indices to split with
public ColumnView[] splitAsViews(int... indices)
indices
- the indices to split with
public final ColumnVector normalizeNANsAndZeros()
Double.longBitsToDouble(long)
describes how equivalent values of NaN/-NaN might have different bitwise representations.
This method may be used to compare different bitwise values of 0.0 or NaN as logically
equivalent. For instance, if these values appear in a groupby key column, without normalization
0.0 and -0.0 would be erroneously treated as distinct groups, as would each representation of NaN.
public final ColumnVector mergeAndSetValidity(BinaryOp mergeOp, ColumnView... columns)
mergeOp
- binary operator (BITWISE_AND and BITWISE_OR only)
columns
- array of columns whose null masks are merged; all must have the same number of rows.
public final ColumnVector year()
Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
public final ColumnVector month()
Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
public final ColumnVector day()
Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
public final ColumnVector hour()
Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
public final ColumnVector minute()
Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
public final ColumnVector second()
Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
public final ColumnVector weekDay()
Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
public final ColumnVector lastDayOfMonth()
Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
public final ColumnVector dayOfYear()
Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
public final ColumnVector quarterOfYear()
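For intuition about year(), month(), day(), dayOfYear(), lastDayOfMonth() and quarterOfYear(), the following CPU-only sketch computes the same calendar fields with java.time for a TIMESTAMP_DAYS-style value (days since 1970-01-01). It is a reference model, not cudf code: it assumes cudf uses the same Gregorian calendar and 1-based numbering as java.time, and the class name DatePartsModel is hypothetical.

```java
import java.time.LocalDate;

// CPU reference model (does not call cudf). Input is a TIMESTAMP_DAYS-style value:
// the number of days since 1970-01-01.
final class DatePartsModel {
    static int year(long epochDay) {
        return LocalDate.ofEpochDay(epochDay).getYear();
    }

    static int month(long epochDay) {         // 1 = January ... 12 = December
        return LocalDate.ofEpochDay(epochDay).getMonthValue();
    }

    static int day(long epochDay) {           // day of the month, 1-based
        return LocalDate.ofEpochDay(epochDay).getDayOfMonth();
    }

    static int dayOfYear(long epochDay) {     // 1-based: 1..365, or 1..366 in leap years
        return LocalDate.ofEpochDay(epochDay).getDayOfYear();
    }

    static int quarterOfYear(long epochDay) { // Jan-Mar = 1, ..., Oct-Dec = 4
        return (month(epochDay) - 1) / 3 + 1;
    }

    // Last day of the same month, returned again as days since the epoch.
    static long lastDayOfMonth(long epochDay) {
        LocalDate d = LocalDate.ofEpochDay(epochDay);
        return d.withDayOfMonth(d.lengthOfMonth()).toEpochDay();
    }
}
```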
public final ColumnVector addCalendricalMonths(ColumnView months)
months
- must be an INT16 column indicating the number of months to add. A negative number
of months works too.
public final ColumnVector isLeapYear()
public ColumnVector round(int decimalPlaces, RoundMode mode)
decimalPlaces
- Number of decimal places to round to. If negative, this
specifies the number of positions to the left of the decimal point.
mode
- Rounding method (either HALF_UP or HALF_EVEN)
public ColumnVector round(RoundMode round)
round
- Rounding method (either HALF_UP or HALF_EVEN)
public ColumnVector round(int decimalPlaces)
decimalPlaces
- Number of decimal places to round to. If negative, this
specifies the number of positions to the left of the decimal point.
public ColumnVector round()
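The difference between the HALF_UP and HALF_EVEN rounding methods, including a negative decimalPlaces, can be illustrated with java.math.BigDecimal. This is a decimal reference model only: it assumes cudf's RoundMode values behave like java.math.RoundingMode constants of the same name, and it does not reproduce cudf's handling of binary floating-point inputs. RoundModel is a hypothetical name.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Decimal reference model (does not call cudf) of round(decimalPlaces, mode).
// A negative decimalPlaces rounds to positions left of the decimal point,
// e.g. -1 rounds to the nearest ten.
final class RoundModel {
    static double round(double value, int decimalPlaces, RoundingMode mode) {
        // BigDecimal.valueOf uses Double.toString, so 2.5 is treated as exactly 2.5.
        return BigDecimal.valueOf(value).setScale(decimalPlaces, mode).doubleValue();
    }
}
```

HALF_UP sends ties away from zero (2.5 to 3, 125 to 130 with decimalPlaces = -1), while HALF_EVEN sends ties to the even neighbor (2.5 to 2, 125 to 120).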
public final ColumnVector transform(String udf, boolean isPtx)
udf
- the function to apply to every element in the vector.
isPtx
- true if the code of the function is PTX, false if it is C/C++.
public final ColumnVector unaryOp(UnaryOp op)
op
- the operation to perform
public final ColumnVector sin()
public final ColumnVector cos()
public final ColumnVector tan()
public final ColumnVector arcsin()
public final ColumnVector arccos()
public final ColumnVector arctan()
public final ColumnVector sinh()
public final ColumnVector cosh()
public final ColumnVector tanh()
public final ColumnVector arcsinh()
public final ColumnVector arccosh()
public final ColumnVector arctanh()
public final ColumnVector exp()
public final ColumnVector log()
public final ColumnVector log2()
public final ColumnVector log10()
public final ColumnVector sqrt()
public final ColumnVector cbrt()
public final ColumnVector ceil()
public final ColumnVector floor()
public final ColumnVector abs()
public final ColumnVector rint()
public final ColumnVector bitInvert()
public final ColumnVector binaryOp(BinaryOp op, BinaryOperable rhs, DType outType)
Specified by: binaryOp in interface BinaryOperable
op
- the operation to perform
rhs
- the rhs of the operation
outType
- the type of output you want.
public Scalar sum()
public Scalar sum(DType outType)
public Scalar min()
@Deprecated public Scalar min(DType outType)
public Scalar max()
@Deprecated public Scalar max(DType outType)
public Scalar product()
public Scalar product(DType outType)
public Scalar sumOfSquares()
public Scalar sumOfSquares(DType outType)
public Scalar mean()
public Scalar mean(DType outType)
outType
- the output type to return. Note that only floating point
types are currently supported.
public Scalar variance()
public Scalar variance(DType outType)
outType
- the output type to return. Note that only floating point
types are currently supported.
public Scalar standardDeviation()
public Scalar standardDeviation(DType outType)
outType
- the output type to return. Note that only floating point
types are currently supported.
public Scalar any()
public Scalar any(DType outType)
public Scalar all()
@Deprecated public Scalar all(DType outType)
public Scalar reduce(ReductionAggregation aggregation)
aggregation
- The reduction aggregation to perform
Scalar.isValid()
method of the result will return false.
public Scalar reduce(ReductionAggregation aggregation, DType outType)
aggregation
- The reduction aggregation to perform
outType
- The type of scalar value to return. Not all output types are supported
by all aggregation operations.
Scalar.isValid()
method of the result will return false.
public ColumnVector segmentedReduce(ColumnView offsets, SegmentedReductionAggregation aggregation)
offsets
- an INT32 column with no nulls.
aggregation
- the aggregation to do
public ColumnVector segmentedReduce(ColumnView offsets, SegmentedReductionAggregation aggregation, DType outType)
offsets
- an INT32 column with no nulls.
aggregation
- the aggregation to do
outType
- the output data type.
public ColumnVector segmentedReduce(ColumnView offsets, SegmentedReductionAggregation aggregation, NullPolicy nullPolicy, DType outType)
offsets
- an INT32 column with no nulls.
aggregation
- the aggregation to do
nullPolicy
- the null policy.
outType
- the output data type.
public ColumnVector segmentedGather(ColumnView gatherMap)
gatherMap
- ListColumnView carrying lists of integral indices that map the
elements in the list of each row in the source column to rows of lists in the result column.
public ColumnVector segmentedGather(ColumnView gatherMap, OutOfBoundsPolicy policy)
gatherMap
- ListColumnView carrying lists of integral indices that map the
elements in the list of each row in the source column to rows of lists in the result column.
policy
- OutOfBoundsPolicy; `DONT_CHECK` leads to undefined behaviour; `NULLIFY`
replaces out-of-bounds indices with null.
public ColumnVector listReduce(SegmentedReductionAggregation aggregation)
aggregation
- the aggregation to perform
public ColumnVector listReduce(SegmentedReductionAggregation aggregation, DType outType)
aggregation
- the aggregation to perform
outType
- the type of the output. Typically, this should match the child type
of the list.
public ColumnVector listReduce(SegmentedReductionAggregation aggregation, NullPolicy nullPolicy, DType outType)
aggregation
- the aggregation to perform
nullPolicy
- whether nulls should be included in or excluded from the aggregation.
outType
- the type of the output. Typically, this should match the child type
of the list.
public final ColumnVector approxPercentile(double[] percentiles)
percentiles
- Required percentiles [0,1]
public final ColumnVector approxPercentile(ColumnVector percentiles)
percentiles
- Column containing percentiles [0,1]
public final ColumnVector quantile(QuantileMethod method, double[] quantiles)
method
- the method used to calculate the quantiles
quantiles
- the quantile values [0,1]
public final ColumnVector rollingWindow(RollingAggregation op, WindowOptions options)
op
- the operation to perform.
options
- various window function arguments.
IllegalArgumentException
- if an unsupported window specification is used (i.e. other than WindowOptions.FrameType.ROWS).
public final ColumnVector prefixSum()
public final ColumnVector scan(ScanAggregation aggregation, ScanType scanType, NullPolicy nullPolicy)
aggregation
- the aggregation to perform
scanType
- whether the scan is inclusive (includes the current row) or exclusive.
nullPolicy
- how nulls should be treated. Note that some aggregations also include their own
null policy. Currently none of those aggregations are supported, so
it is undefined how the two policies would interact.
public final ColumnVector scan(ScanAggregation aggregation, ScanType scanType)
aggregation
- the aggregation to perform
scanType
- whether the scan is inclusive (includes the current row) or exclusive.
public final ColumnVector scan(ScanAggregation aggregation)
aggregation
- the aggregation to perform
public final ColumnVector not()
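The inclusive/exclusive distinction for scanType can be pinned down with a small CPU model of a running sum. This sketch ignores nulls entirely and assumes an exclusive scan starts from the sum identity 0; it illustrates the definition rather than cudf's implementation, and ScanModel is a hypothetical name.

```java
// CPU reference model (does not call cudf) of a sum scan over non-null values.
final class ScanModel {
    // Inclusive: out[i] = in[0] + ... + in[i] (the current row is included).
    static long[] inclusiveSum(long[] in) {
        long[] out = new long[in.length];
        long acc = 0;
        for (int i = 0; i < in.length; i++) {
            acc += in[i];
            out[i] = acc;
        }
        return out;
    }

    // Exclusive: out[i] = in[0] + ... + in[i - 1]; out[0] is the identity 0.
    static long[] exclusiveSum(long[] in) {
        long[] out = new long[in.length];
        long acc = 0;
        for (int i = 0; i < in.length; i++) {
            out[i] = acc;   // write before adding, so the current row is excluded
            acc += in[i];
        }
        return out;
    }
}
```

For the input {1, 2, 3, 4}, the inclusive result is {1, 3, 6, 10} and the exclusive result is {0, 1, 3, 6}.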
public boolean contains(Scalar needle)
needle
-
public final ColumnVector contains(ColumnView searchSpace)
DType.BOOL8
elements having the same size as this column,
each row value is true if the corresponding entry in this column is contained in the
given searchSpace column and false if it is not.
The caller will be responsible for the lifecycle of the new vector.
Example:
col = { 10, 20, 30, 40, 50 }
searchSpace = { 20, 40, 60, 80 }
result = { false, true, false, true, false }
searchSpace
- DType.BOOL8
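The membership semantics in the example above can be reproduced on the CPU. This sketch models only non-null inputs (null handling is not described here and is not modeled), uses a long[] as a stand-in for the numeric column, and ContainsModel is a hypothetical name; it does not call cudf.

```java
import java.util.HashSet;
import java.util.Set;

// CPU reference model (does not call cudf) of contains(ColumnView searchSpace)
// for columns without nulls: result[i] is true iff col[i] occurs in searchSpace.
final class ContainsModel {
    static boolean[] contains(long[] col, long[] searchSpace) {
        // Build the lookup set once so each membership test is O(1) on average.
        Set<Long> lookup = new HashSet<>();
        for (long v : searchSpace) {
            lookup.add(v);
        }
        boolean[] result = new boolean[col.length];   // same size as col
        for (int i = 0; i < col.length; i++) {
            result[i] = lookup.contains(col[i]);
        }
        return result;
    }
}
```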
public final ColumnVector toTitle()
public final ColumnVector capitalize(Scalar delimiters)
delimiters
- Used for identifying words to capitalize. Should not be null.
public final ColumnVector joinStrings(Scalar separator, Scalar narep)
separator
- what to insert to separate each row.
narep
- what to replace nulls with
public ColumnVector castTo(DType type)
asTimestamp(DType, String)
and asStrings(String)
for casting string to timestamp when the format
is known
Float values, when converted to String, may differ from Java's default string conversion,
e.g.
12.3 => "12.30000019" instead of "12.3"
Double.POSITIVE_INFINITY => "Inf" instead of "Infinity"
Double.NEGATIVE_INFINITY => "-Inf" instead of "-Infinity"
type
- type of the resulting ColumnVector
public ColumnView replaceChildrenWithViews(int[] indices, ColumnView[] views)
public ColumnView replaceListChild(ColumnView child)
@Deprecated public ColumnView logicalCastTo(DType type)
type
- the type you want to go to.
public ColumnView bitCastTo(DType type)
type
- the type you want to go to.
public final ColumnVector asBytes()
public final ColumnVector asByteList()
public final ColumnVector asByteList(boolean config)
config
- Flips the byte order (endianness) if true, retains byte order otherwise
public final ColumnVector asUnsignedBytes()
Java does not have an unsigned byte type, so properly decoding these values
will require extra steps on the part of the application. See
Byte.toUnsignedInt(byte)
.
public final ColumnVector asShorts()
public final ColumnVector asUnsignedShorts()
Java does not have an unsigned short type, so properly decoding these values
will require extra steps on the part of the application. See
Short.toUnsignedInt(short)
.
public final ColumnVector asInts()
public final ColumnVector asUnsignedInts()
Java does not have an unsigned int type, so properly decoding these values
will require extra steps on the part of the application. See
Integer.toUnsignedLong(int)
.
public final ColumnVector asLongs()
public final ColumnVector asUnsignedLongs()
Java does not have an unsigned long type, so properly decoding these values
will require extra steps on the part of the application. See
Long.toUnsignedString(long)
.
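Once unsigned values have been copied into Java primitives (the copy from the device is not shown), the JDK methods cited above recover their unsigned meaning. A minimal sketch with a hypothetical helper class, UnsignedDecode, which is not part of cudf:

```java
// Hypothetical helper (not part of cudf): decodes unsigned values that have
// already been copied into Java primitives, using the JDK methods cited above.
final class UnsignedDecode {
    static int fromUint8(byte b) {       // range 0..255
        return Byte.toUnsignedInt(b);
    }

    static int fromUint16(short s) {     // range 0..65535
        return Short.toUnsignedInt(s);
    }

    static long fromUint32(int i) {      // range 0..4294967295
        return Integer.toUnsignedLong(i);
    }

    static String fromUint64(long l) {   // no wider primitive exists, so return a String
        return Long.toUnsignedString(l);
    }
}
```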
public final ColumnVector asFloats()
public final ColumnVector asDoubles()
public final ColumnVector asTimestampDays()
public final ColumnVector asTimestampDays(String format)
format
- timestamp string format specifier, ignored if the column type is not string
public final ColumnVector asTimestampSeconds()
public final ColumnVector asTimestampSeconds(String format)
format
- timestamp string format specifier, ignored if the column type is not string
public final ColumnVector asTimestampMicroseconds()
public final ColumnVector asTimestampMicroseconds(String format)
format
- timestamp string format specifier, ignored if the column type is not string
public final ColumnVector asTimestampMilliseconds()
public final ColumnVector asTimestampMilliseconds(String format)
format
- timestamp string format specifier, ignored if the column type is not string
public final ColumnVector asTimestampNanoseconds()
public final ColumnVector asTimestampNanoseconds(String format)
format
- timestamp string format specifier, ignored if the column type is not string
public final ColumnVector asTimestamp(DType timestampType, String format)
timestampType
- timestamp DType that includes the time unit to parse the timestamp into.
format
- strptime format specifier string of the timestamp, used to parse each string
into a timestamp. Supports %Y,%y,%m,%d,%H,%I,%p,%M,%S,%f,%z format specifiers.
See https://github.com/rapidsai/custrings/blob/branch-0.10/docs/source/datetime.md
for full parsing format specification and documentation.
public final ColumnVector asStrings()
DType.TIMESTAMP_DAYS
- "%Y-%m-%d"
DType.TIMESTAMP_SECONDS
- "%Y-%m-%d %H:%M:%S"
DType.TIMESTAMP_MICROSECONDS
- "%Y-%m-%d %H:%M:%S.%f"
DType.TIMESTAMP_MILLISECONDS
- "%Y-%m-%d %H:%M:%S.%f"
DType.TIMESTAMP_NANOSECONDS
- "%Y-%m-%d %H:%M:%S.%f"
public final ColumnVector asStrings(String format)
format
- strftime format specifier string of the timestamp. It is used to convert
the timestamp to a string. Supports %m,%j,%d,%H,%M,%S,%y,%Y,%f format specifiers.
%d Day of the month: 01-31
%m Month of the year: 01-12
%y Year without century: 00-99
%Y Year with century: 0001-9999
%H 24-hour of the day: 00-23
%M Minute of the hour: 00-59
%S Second of the minute: 00-59
%f 6-digit microsecond: 000000-999999
See https://github.com/rapidsai/custrings/blob/branch-0.10/docs/source/datetime.md
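To see how these strftime specifiers line up with a familiar API, the sketch below renders a TIMESTAMP_SECONDS-style value (seconds since the epoch) with java.time, using the pattern that corresponds to the "%Y-%m-%d %H:%M:%S" default listed for TIMESTAMP_SECONDS. The specifier-to-pattern mapping and treating the value as UTC are assumptions of this model; TimestampFormatModel is a hypothetical name and the code does not call cudf.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// CPU reference model (does not call cudf) of formatting seconds since the epoch
// with "%Y-%m-%d %H:%M:%S". Assumed mapping of specifiers to java.time letters:
// %Y -> uuuu, %m -> MM, %d -> dd, %H -> HH, %M -> mm, %S -> ss
final class TimestampFormatModel {
    private static final DateTimeFormatter SECONDS_FMT =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    static String formatSeconds(long epochSeconds) {
        // The formatter's override zone (UTC) turns the Instant into calendar fields.
        return SECONDS_FMT.format(Instant.ofEpochSecond(epochSeconds));
    }
}
```

Note that the two minute/month letters swap roles between the APIs: strftime uses %m for month and %M for minute, while java.time uses MM for month and mm for minute.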
Reported bugs
https://github.com/rapidsai/cudf/issues/4160: after the bug is fixed, this method should
also support
%I 12-hour of the day: 01-12
%p Only 'AM', 'PM'
%j day of the year
public final ColumnVector isTimestamp(String format)
format
- String specifying the timestamp format in strings.
public final ColumnVector extractListElement(int index)
index
- 0-based offset into the list. Negative values go backwards from the end of the
list.
public final ColumnVector extractListElement(ColumnView indices)
indices
- a column of 0-based offsets into the list. Negative values go backwards from
the end of the list.
public final ColumnVector dropListDuplicates()
public final ColumnVector dropListDuplicatesWithKeysValues()
public ColumnVector flattenLists()
public ColumnVector flattenLists(boolean ignoreNull)
ignoreNull
- if true, null list elements in the input column are ignored by the operation;
otherwise, any row containing a null list element results in a null output row.
public final ColumnVector reverseStringsOrLists()
public final ColumnVector upper()
public final ColumnVector lower()
public final ColumnVector stringLocate(Scalar substring)
substring
- scalar containing the string to locate within each row.
public final ColumnVector stringLocate(Scalar substring, int start)
substring
- scalar containing the string to locate within each row.
start
- character index to start the search from (inclusive).
public final ColumnVector stringLocate(Scalar substring, int start, int end)
substring
- scalar containing the string to locate within each row.
start
- character index to start the search from (inclusive).
end
- character index to end the search on (exclusive).
@Deprecated public final Table stringSplit(String pattern, int limit, boolean splitByRegex)
pattern
- UTF-8 encoded string identifying the split pattern for each input string.
limit
- the maximum size of the list resulting from splitting each input string,
or -1 for all possible splits. Note that limit = 0 (all possible splits without
trailing empty strings) and limit = 1 (no split at all) are not supported.
splitByRegex
- a boolean flag indicating whether the input strings will be split by a
regular expression pattern or just by a string literal delimiter.
public final Table stringSplit(RegexProgram regexProg, int limit)
regexProg
- the regex program with UTF-8 encoded string identifying the split pattern
for each input string.
limit
- the maximum size of the list resulting from splitting each input string,
or -1 for all possible splits. Note that limit = 0 (all possible splits without
trailing empty strings) and limit = 1 (no split at all) are not supported.
@Deprecated public final Table stringSplit(String pattern, boolean splitByRegex)
pattern
- UTF-8 encoded string identifying the split pattern for each input string.
splitByRegex
- a boolean flag indicating whether the input strings will be split by a
regular expression pattern or just by a string literal delimiter.
public final Table stringSplit(String delimiter, int limit)
delimiter
- UTF-8 encoded string identifying the split delimiter for each input string.
limit
- the maximum size of the list resulting from splitting each input string,
or -1 for all possible splits. Note that limit = 0 (all possible splits without
trailing empty strings) and limit = 1 (no split at all) are not supported.
public final Table stringSplit(String delimiter)
delimiter
- UTF-8 encoded string identifying the split delimiter for each input string.
public final Table stringSplit(RegexProgram regexProg)
regexProg
- the regex program with UTF-8 encoded string identifying the split pattern
for each input string.
@Deprecated public final ColumnVector stringSplitRecord(String pattern, int limit, boolean splitByRegex)
pattern
- UTF-8 encoded string identifying the split pattern for each input string.
limit
- the maximum size of the list resulting from splitting each input string,
or -1 for all possible splits. Note that limit = 0 (all possible splits without
trailing empty strings) and limit = 1 (no split at all) are not supported.
splitByRegex
- a boolean flag indicating whether the input strings will be split by a
regular expression pattern or just by a string literal delimiter.
public final ColumnVector stringSplitRecord(RegexProgram regexProg, int limit)
regexProg
- the regex program with UTF-8 encoded string identifying the split pattern
for each input string.
limit
- the maximum size of the list resulting from splitting each input string,
or -1 for all possible splits. Note that limit = 0 (all possible splits without
trailing empty strings) and limit = 1 (no split at all) are not supported.
@Deprecated public final ColumnVector stringSplitRecord(String pattern, boolean splitByRegex)
pattern
- UTF-8 encoded string identifying the split pattern for each input string.
splitByRegex
- a boolean flag indicating whether the input strings will be split by a
regular expression pattern or just by a string literal delimiter.
public final ColumnVector stringSplitRecord(String delimiter, int limit)
delimiter
- UTF-8 encoded string identifying the split delimiter for each input string.
limit
- the maximum size of the list resulting from splitting each input string,
or -1 for all possible splits. Note that limit = 0 (all possible splits without
trailing empty strings) and limit = 1 (no split at all) are not supported.
public final ColumnVector stringSplitRecord(String delimiter)
delimiter
- UTF-8 encoded string identifying the split delimiter for each input string.
public final ColumnVector stringSplitRecord(RegexProgram regexProg)
regexProg
- the regex program with UTF-8 encoded string identifying the split pattern
for each input string.
public final ColumnVector substring(int start)
start
- first character index to begin the substring (inclusive).
public final ColumnVector substring(int start, int end)
start
- first character index to begin the substring (inclusive).
end
- last character index to stop the substring (exclusive).
public final ColumnVector substring(ColumnView start, ColumnView end)
start
- vector containing the start index for each string.
end
- vector containing the end index for each string; -1 means read until the end of the string.
public final ColumnVector stringConcatenateListElements(ColumnView sepCol)
sepCol
- strings column that provides separators for concatenation.public final ColumnVector stringConcatenateListElements(ColumnView sepCol, Scalar separatorNarep, Scalar stringNarep, boolean separateNulls, boolean emptyStringOutputIfEmptyList)
sepCol - strings column that provides separators for concatenation.
separatorNarep - string scalar indicating null behavior when a separator is null.
If set to null and the separator is null, the resulting string will
be null. If not null, this string will be used in place of a null
separator.
stringNarep - string that should be used to replace null strings in any non-null list
row. If set to null and the string is null, the resulting string will
be null. If not null, this string will be used in place of a null value.
separateNulls - if true, then the separator is included for null rows if
`stringNarep` is valid.
emptyStringOutputIfEmptyList - if set to true, any input row that is an empty list
will result in an empty string. Otherwise, it will result in a null.
public final ColumnVector stringConcatenateListElements(Scalar separator, Scalar narep, boolean separateNulls, boolean emptyStringOutputIfEmptyList)
separator - string scalar inserted between each string being merged.
narep - string scalar indicating null behavior. If set to null and any string in the row
is null, the resulting string will be null. If not null, null values in any
column will be replaced by the specified string. The underlying value in the
string scalar may be null, but the object passed in may not.
separateNulls - if true, then the separator is included for null rows if
`narep` is valid.
emptyStringOutputIfEmptyList - if set to true, any input row that is an empty list
will result in an empty string. Otherwise, it will result in a null.
public final ColumnVector repeatStrings(int repeatTimes)
Each string in this column is repeated the number of times specified by the repeatTimes
parameter.
In special cases:
- If repeatTimes is not a positive number, a non-null input string will always
result in an empty output string.
- A null input string will always result in a null output string regardless of the value of
the repeatTimes parameter.
repeatTimes - The number of times each input string is repeated.
public final ColumnVector repeatStrings(ColumnView repeatTimes)
Each string in this column is repeated the number of times given by the corresponding row of the repeatTimes
numeric column.
In special cases:
- Any null row (from either the input strings column or the repeatTimes column)
will always result in a null output string.
- If any value in the repeatTimes column is not a positive number and its
corresponding input string is not null, the output string will be an empty string.
repeatTimes - The column containing the number of times each input string is repeated.
public final ColumnVector getJSONObject(Scalar path, GetJsonObjectOptions options)
path - The JSONPath string to be applied to each row.
options - The GetJsonObjectOptions to control get_json_object behaviour.
public final ColumnVector getJSONObject(Scalar path)
path - The JSONPath string to be applied to each row.
public final ColumnVector stringReplace(Scalar target, Scalar replace)
target - String to search for within each string.
replace - Replacement string if target is found.
public final ColumnVector stringReplace(ColumnView targets, ColumnView repls)
targets - Strings to search for in each string.
repls - Corresponding replacement strings for target strings.
@Deprecated public final ColumnVector replaceRegex(String pattern, Scalar repl)
pattern - The regular expression pattern to search within each string.
repl - The string scalar that replaces each pattern match.
public final ColumnVector replaceRegex(RegexProgram regexProg, Scalar repl)
regexProg - The regex program with the pattern to search within each string.
repl - The string scalar that replaces each pattern match.
@Deprecated public final ColumnVector replaceRegex(String pattern, Scalar repl, int maxRepl)
pattern - The regular expression pattern to search within each string.
repl - The string scalar that replaces each pattern match.
maxRepl - The maximum number of times a replacement should occur within each string.
public final ColumnVector replaceRegex(RegexProgram regexProg, Scalar repl, int maxRepl)
regexProg - The regex program with the pattern to search within each string.
repl - The string scalar that replaces each pattern match.
maxRepl - The maximum number of times a replacement should occur within each string.
public final ColumnVector replaceMultiRegex(String[] patterns, ColumnView repls)
patterns - The regular expression patterns to search within each string.
repls - The strings column of replacements, one for each corresponding pattern match.
@Deprecated public final ColumnVector stringReplaceWithBackrefs(String pattern, String replace)
pattern - The regular expression pattern to search within each string.
replace - The replacement template for creating the output string.
public final ColumnVector stringReplaceWithBackrefs(RegexProgram regexProg, String replace)
regexProg - The regex program with the pattern to search within each string.
replace - The replacement template for creating the output string.
public final ColumnVector zfill(int width)
width - The minimum number of characters for each string.
public final ColumnVector pad(int width)
width - the minimum number of characters for each string.
public final ColumnVector pad(int width, PadSide side)
width - the minimum number of characters for each string.
side - where to add new characters.
public final ColumnVector pad(int width, PadSide side, String fillChar)
width - the minimum number of characters for each string.
side - where to add new characters.
fillChar - a single-character string holding the character to be added.
public final ColumnVector startsWith(Scalar pattern)
pattern - scalar containing the string being searched for at the beginning of the column's strings.
public final ColumnVector endsWith(Scalar pattern)
pattern - scalar containing the string being searched for at the end of the column's strings.
public final ColumnVector strip()
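The extracted text does not say how `toStrip` is interpreted by strip, lstrip and rstrip. The following plain-Java sketch is a host-side model, not the cuDF implementation, and it assumes that `toStrip` is treated as a set of characters (any of them is removed from the chosen end, as with Python's `str.strip`), that null rows stay null, and that the strings contain only BMP characters. `StripModel` and its `Side` enum are hypothetical names; `BOTH`, `LEFT` and `RIGHT` stand in for strip, lstrip and rstrip.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical host-side model of strip/lstrip/rstrip(Scalar toStrip); not the cuDF API. */
final class StripModel {
    enum Side { LEFT, RIGHT, BOTH }

    // Removes any character contained in toStrip from the chosen end(s) of s.
    // Assumptions: toStrip is a character set, and a null row stays null.
    static String strip(String s, String toStrip, Side side) {
        if (s == null) {
            return null;
        }
        int begin = 0;
        int end = s.length();
        if (side != Side.RIGHT) {
            // Advance past leading characters that belong to the set.
            while (begin < end && toStrip.indexOf(s.charAt(begin)) >= 0) {
                begin++;
            }
        }
        if (side != Side.LEFT) {
            // Back up past trailing characters that belong to the set.
            while (end > begin && toStrip.indexOf(s.charAt(end - 1)) >= 0) {
                end--;
            }
        }
        return s.substring(begin, end);
    }

    // Applies the per-row model to a whole "column" (a list of nullable strings).
    static List<String> stripColumn(List<String> column, String toStrip, Side side) {
        List<String> out = new ArrayList<>(column.size());
        for (String s : column) {
            out.add(strip(s, toStrip, side));
        }
        return out;
    }
}
```

Under these assumptions, `strip("xxhixy", "xy", Side.BOTH)` yields `"hi"`, `Side.LEFT` yields `"hixy"` and `Side.RIGHT` yields `"xxhi"`.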
public final ColumnVector strip(Scalar toStrip)
toStrip - UTF-8 encoded characters to strip from each string.
public final ColumnVector lstrip()
public final ColumnVector lstrip(Scalar toStrip)
toStrip - UTF-8 encoded characters to strip from each string.
public final ColumnVector rstrip()
public final ColumnVector rstrip(Scalar toStrip)
toStrip - UTF-8 encoded characters to strip from each string.
public final ColumnVector stringContains(Scalar compString)
compString - scalar containing the string being searched for.
public final ColumnVector clamp(Scalar lo, Scalar hi)
lo - Minimum clamp value. All elements less than `lo` will be replaced by `lo`.
Ignored if null.
hi - Maximum clamp value. All elements greater than `hi` will be replaced by `hi`.
Ignored if null.
public final ColumnVector clamp(Scalar lo, Scalar loReplace, Scalar hi, Scalar hiReplace)
lo - Minimum clamp value. All elements less than `lo` will be replaced by `loReplace`. Ignored if null.
loReplace - All elements less than `lo` will be replaced by `loReplace`.
hi - Maximum clamp value. All elements greater than `hi` will be replaced by `hiReplace`. Ignored if null.
hiReplace - All elements greater than `hi` will be replaced by `hiReplace`.
@Deprecated public final ColumnVector matchesRe(String pattern)
pattern - Regex pattern to match to each string.
public final ColumnVector matchesRe(RegexProgram regexProg)
regexProg - Regex program to match to each string.
@Deprecated public final ColumnVector containsRe(String pattern)
pattern - Regex pattern to match to each string.
public final ColumnVector containsRe(RegexProgram regexProg)
regexProg - Regex program to match to each string.
@Deprecated public final Table extractRe(String pattern) throws CudfException
pattern - the pattern to use
CudfException - if any error happens, including if the regex does
not contain any capture groups.
public final Table extractRe(RegexProgram regexProg) throws CudfException
regexProg - the regex program to use
CudfException - if any error happens, including if the regex
program does not contain any capture groups.
@Deprecated public final ColumnVector extractAllRecord(String pattern, int idx)
pattern - The regex pattern
idx - The regex group index
public final ColumnVector extractAllRecord(RegexProgram regexProg, int idx)
regexProg - The regex program
idx - The regex group index
public final ColumnVector like(Scalar pattern, Scalar escapeChar)
pattern - Like pattern to match to each string.
escapeChar - Character that specifies the escape prefix; the default is "\\".
public final ColumnVector urlDecode() throws CudfException
Any null entries will result in corresponding null entries in the output column.
CudfException
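Beyond null handling, the text here does not describe the decoding rules. As an illustration only, the sketch below models percent-decoding in plain Java: each well-formed `%XX` escape becomes one byte, every other character (including `+`) is copied through unchanged, the bytes are read as UTF-8, and a null row stays null. Whether cuDF treats `+` and malformed escapes the same way is an assumption, and `UrlDecodeModel` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical host-side model of percent-decoding one row; not the cuDF implementation. */
final class UrlDecodeModel {
    // Assumptions: only well-formed %XX escapes are decoded, '+' is left alone,
    // and a null input row maps to a null output row.
    static String urlDecode(String s) {
        if (s == null) {
            return null;
        }
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < raw.length) {
            boolean escape = raw[i] == '%' && i + 2 < raw.length
                    && Character.digit(raw[i + 1], 16) >= 0
                    && Character.digit(raw[i + 2], 16) >= 0;
            if (escape) {
                // Two hex digits encode one byte of the UTF-8 output.
                bytes.write(Character.digit(raw[i + 1], 16) * 16 + Character.digit(raw[i + 2], 16));
                i += 3;
            } else {
                bytes.write(raw[i]);
                i++;
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

In this model `urlDecode("a%20b")` returns `"a b"`, `urlDecode("caf%C3%A9")` returns `"café"`, and a trailing lone `%` is copied through.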
public final ColumnVector urlEncode() throws CudfException
Any null entries will result in corresponding null entries in the output column.
CudfException
public final ColumnVector getMapValue(ColumnView keys)
keys - the column view with keys to look up in the column
public final ColumnVector getMapValue(Scalar key)
key - the scalar key to look up in the column
public final ColumnVector getMapKeyExistence(Scalar key)
key - the String scalar to look up in the column
public final ColumnVector getMapKeyExistence(ColumnView keys)
keys - the keys to look up in the column
public static ColumnView makeStructView(long rows, ColumnView... columns)
rows - the number of rows in the struct column. This is needed if no columns
are provided.
columns - the columns to add to the struct in the order they should be added
public static ColumnView makeStructView(ColumnView... columns)
columns - the columns to add to the struct in the order they should be added
public static ColumnView fromDeviceBuffer(BaseDeviceMemoryBuffer buffer, long startOffset, DType type, int rows)
buffer - device memory that will back the column view
startOffset - byte offset into the device buffer where the column data starts
type - type of data in the column view
rows - number of data elements in the column view
public final ColumnVector listContains(Scalar key)
key - the scalar to look up
public final ColumnVector listContainsColumn(ColumnView key)
key - the ColumnVector with lookup values
public final ColumnVector listContainsNulls()
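No description survives here for `listContains` or `listContainsNulls`. The plain-Java model below assumes the behavior their names suggest: one nullable boolean per list row, null when the list row (or, for `listContains`, the key) is null. A row that lacks the key is reported as false here even if it holds null elements; treat that detail as an assumption too. `ListContainsModel` is hypothetical and uses `List<List<Integer>>` in place of a LIST column.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical host-side model of listContains / listContainsNulls; not the cuDF API. */
final class ListContainsModel {
    // Per-row membership test for a scalar key.
    // Assumption: the result is null only when the key or the list row is null.
    static List<Boolean> listContains(List<List<Integer>> column, Integer key) {
        List<Boolean> out = new ArrayList<>(column.size());
        for (List<Integer> row : column) {
            if (row == null || key == null) {
                out.add(null);
            } else {
                out.add(row.contains(key));
            }
        }
        return out;
    }

    // Per-row check for null elements.
    // Assumption: a null list row yields a null result.
    static List<Boolean> listContainsNulls(List<List<Integer>> column) {
        List<Boolean> out = new ArrayList<>(column.size());
        for (List<Integer> row : column) {
            out.add(row == null ? null : row.contains(null));
        }
        return out;
    }
}
```

For the rows `[1, 2]`, null and `[3, null]`, this model gives `[true, null, false]` for `listContains(column, 2)` and `[false, null, true]` for `listContainsNulls(column)`.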
public final ColumnVector listIndexOf(Scalar key, ColumnView.FindOptions findOption)
key - The scalar search key
findOption - Whether to find the first index of the key, or the last.
public final ColumnVector listIndexOf(ColumnView keys, ColumnView.FindOptions findOption)
keys - ColumnView of search keys.
findOption - Whether to find the first index of the key, or the last.
public final ColumnVector listSortRows(boolean isDescending, boolean isNullSmallest)
isDescending - whether to sort each row in descending order (or ascending order)
isNullSmallest - whether to regard the null value as the min value (or the max value)
public static ColumnVector listsHaveOverlap(ColumnView lhs, ColumnView rhs)
lhs - The input lists column for one side
rhs - The input lists column for the other side
public static ColumnVector listsIntersectDistinct(ColumnView lhs, ColumnView rhs)
lhs - The input lists column for one side
rhs - The input lists column for the other side
public static ColumnVector listsUnionDistinct(ColumnView lhs, ColumnView rhs)
lhs - The input lists column for one side
rhs - The input lists column for the other side
public static ColumnVector listsDifferenceDistinct(ColumnView lhs, ColumnView rhs)
lhs - The input lists column for one side
rhs - The input lists column for the other side
public final ColumnVector generateListOffsets()
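The text gives no description for `generateListOffsets`. Assuming, as the name suggests, that it treats this column's integer values as per-row list sizes and returns the matching offsets (one more entry than there are rows, starting at 0), the computation is an exclusive prefix sum. The plain-Java sketch below uses the hypothetical class `ListOffsetsModel`; rejecting negative sizes is also an assumption.

```java
/** Hypothetical host-side model of turning list sizes into list offsets; not the cuDF API. */
final class ListOffsetsModel {
    // Exclusive prefix sum: offsets[0] = 0 and offsets[i + 1] = offsets[i] + sizes[i].
    // Assumption: sizes are non-null and non-negative.
    static int[] generateListOffsets(int[] sizes) {
        int[] offsets = new int[sizes.length + 1];
        for (int i = 0; i < sizes.length; i++) {
            if (sizes[i] < 0) {
                throw new IllegalArgumentException("negative list size at row " + i);
            }
            offsets[i + 1] = offsets[i] + sizes[i];
        }
        return offsets;
    }
}
```

For sizes `{2, 0, 3}` the model returns `{0, 2, 2, 5}`: row 1 is an empty list because its start and end offsets are equal.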
public final Scalar getScalarElement(int index)
index - the index to look at
CudfException - if the index is out of bounds.
public final ColumnVector applyBooleanMask(ColumnView booleanMaskView)
Given a list-of-bools column, the function produces a new `LIST` column of the same type as this column, where each element is copied from the row *only* if the corresponding `boolean_mask` is non-null and `true`.
E.g. column = { {0,1,2}, {3,4}, {5,6,7}, {8,9} }; boolean_mask = { {0,1,1}, {1,0}, {1,1,1}, {0,0} }; results = { {1,2}, {3}, {5,6,7}, {} };
This column and `boolean_mask` must have the same number of rows. The output column has the same number of rows as this column. An element is copied to an output row *only* if the corresponding boolean_mask element is `true`. An output row is null only if the corresponding input row is null.
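The filtering rule above can be modeled on the host in plain Java. The sketch uses the hypothetical class `ApplyBooleanMaskModel`: it keeps an element only when its mask entry is non-null and true, and it keeps null input rows null. Treating a null or too-short mask row as selecting nothing is an assumption. With the 0/1 mask written as booleans, running it on the example above reproduces `{ {1,2}, {3}, {5,6,7}, {} }`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical host-side model of applyBooleanMask on a LIST column; not the cuDF API. */
final class ApplyBooleanMaskModel {
    static List<List<Integer>> applyBooleanMask(List<List<Integer>> column,
                                                List<List<Boolean>> mask) {
        if (column.size() != mask.size()) {
            throw new IllegalArgumentException("column and mask must have the same number of rows");
        }
        List<List<Integer>> out = new ArrayList<>(column.size());
        for (int r = 0; r < column.size(); r++) {
            List<Integer> row = column.get(r);
            if (row == null) {
                out.add(null); // an output row is null only if the input row is null
                continue;
            }
            List<Boolean> maskRow = mask.get(r);
            List<Integer> kept = new ArrayList<>();
            for (int i = 0; i < row.size(); i++) {
                // Assumption: a missing or null mask entry counts as "do not copy".
                Boolean keep = (maskRow == null || i >= maskRow.size()) ? null : maskRow.get(i);
                if (Boolean.TRUE.equals(keep)) {
                    kept.add(row.get(i));
                }
            }
            out.add(kept);
        }
        return out;
    }
}
```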
booleanMaskView - A nullable list of bools column used to filter elements in this column
CudfException - if `boolean_mask` is not a "lists of bools" column
CudfException - if this column and `boolean_mask` have a different number of rows
public int distinctCount(NullPolicy nullPolicy)
nullPolicy - if nulls should be included or not.
public int distinctCount()
protected static long title(long handle)
public HostColumnVector copyToHost(HostMemoryAllocator hostMemoryAllocator)
public HostColumnVector copyToHostAsync(Cuda.Stream stream, HostMemoryAllocator hostMemoryAllocator)
public HostColumnVector copyToHost()
public HostColumnVector copyToHostAsync(Cuda.Stream stream)
public long getHostBytesRequired()
public static long hostPaddingSizeInBytes()
public boolean hasNonEmptyNulls()
public ColumnVector purgeNonEmptyNulls()
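The text gives no description for `hasNonEmptyNulls` or `purgeNonEmptyNulls`. Assuming the usual meaning, a null row of an offsets-based column (such as strings) can still span bytes in the data buffer (a "non-empty null"), and purging rewrites the column so that every null row is empty. The plain-Java sketch below models this for a strings column; `NonEmptyNullsModel` and `StringsColumn` are hypothetical names, and the offsets/validity layout is a simplified assumption.

```java
/** Hypothetical host-side model of non-empty nulls in a strings column; not the cuDF API. */
final class NonEmptyNullsModel {
    /** Simplified strings column: offsets has rows + 1 entries into chars. */
    static final class StringsColumn {
        final int[] offsets;
        final String chars;
        final boolean[] valid;

        StringsColumn(int[] offsets, String chars, boolean[] valid) {
            this.offsets = offsets;
            this.chars = chars;
            this.valid = valid;
        }
    }

    // True if some null row still spans characters in the data buffer.
    static boolean hasNonEmptyNulls(StringsColumn col) {
        for (int row = 0; row < col.valid.length; row++) {
            if (!col.valid[row] && col.offsets[row + 1] > col.offsets[row]) {
                return true;
            }
        }
        return false;
    }

    // Copies the characters of valid rows and gives every null row zero length.
    static StringsColumn purgeNonEmptyNulls(StringsColumn col) {
        int rows = col.valid.length;
        int[] offsets = new int[rows + 1];
        StringBuilder chars = new StringBuilder();
        for (int row = 0; row < rows; row++) {
            if (col.valid[row]) {
                chars.append(col.chars, col.offsets[row], col.offsets[row + 1]);
            }
            offsets[row + 1] = chars.length();
        }
        return new StringsColumn(offsets, chars.toString(), col.valid.clone());
    }
}
```

With offsets `{0, 2, 5, 6}`, data `"abxyzc"` and row 1 null, the null row still spans `"xyz"`; after purging, the offsets become `{0, 2, 2, 3}` and the data becomes `"abc"`.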
public ColumnVector toHex()
Copyright © 2024. All rights reserved.