Column Copy#
- group column_copy
Enums
-
enum class out_of_bounds_policy : bool#
Policy to account for possible out-of-bounds indices.
NULLIFY means to nullify output values corresponding to out-of-bounds gather_map values.
DONT_CHECK means do not check whether the indices are out-of-bounds, for better performance.
Values:
-
enumerator NULLIFY#
Output values corresponding to out-of-bounds indices are null.
-
enumerator DONT_CHECK#
No bounds checking is performed, for better performance.
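The difference between the two policies can be sketched with a host-side model. This is not libcudf code: bounds_policy and gather_model are hypothetical names, std::optional stands in for a nullable element, and the model treats only indices in [0, n) as in bounds, which is a simplification of how a real gather may interpret its map.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Host-side stand-in for the enum documented above (not the libcudf type).
enum class bounds_policy : bool { NULLIFY, DONT_CHECK };

// Hypothetical helper: picks source[gather_map[i]] for every i.
// std::nullopt plays the role of a null output element.
std::vector<std::optional<int>> gather_model(std::vector<int> const& source,
                                             std::vector<int> const& gather_map,
                                             bounds_policy policy)
{
  std::vector<std::optional<int>> out;
  out.reserve(gather_map.size());
  auto const n = static_cast<int>(source.size());
  for (int idx : gather_map) {
    bool const in_bounds = idx >= 0 && idx < n;
    if (policy == bounds_policy::NULLIFY && !in_bounds) {
      out.push_back(std::nullopt);  // out-of-bounds index becomes a null
    } else {
      // With DONT_CHECK the caller promises that every index is valid;
      // the assert only documents that contract in debug builds.
      assert(in_bounds);
      out.push_back(source[static_cast<std::size_t>(idx)]);
    }
  }
  return out;
}
```

In this model, DONT_CHECK skips the per-element branch that produces nulls, which is where the performance benefit comes from, at the cost of relying on the caller to pass valid indices.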
-
enum class mask_allocation_policy : int32_t#
Indicates when to allocate a mask, based on an existing mask.
Values:
-
enumerator NEVER#
Do not allocate a null mask, regardless of input.
-
enumerator RETAIN#
Allocate a null mask if the input contains one.
-
enumerator ALWAYS#
Allocate a null mask, regardless of input.
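The three enumerators reduce to a small decision on whether the input carries a null mask. The following sketch models that decision on the host; mask_policy and should_allocate_mask are hypothetical names for illustration, not part of libcudf.

```cpp
#include <cassert>
#include <cstdint>

// Host-side stand-in for the enum documented above (not the libcudf type).
enum class mask_policy : std::int32_t { NEVER, RETAIN, ALWAYS };

// Hypothetical helper: returns true when a null mask would be allocated
// for an output column modeled on an input with or without a mask.
bool should_allocate_mask(mask_policy policy, bool input_has_mask)
{
  switch (policy) {
    case mask_policy::NEVER: return false;            // regardless of input
    case mask_policy::ALWAYS: return true;            // regardless of input
    case mask_policy::RETAIN: return input_has_mask;  // follow the input
  }
  assert(false && "unknown policy");
  return false;
}
```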
Functions
-
std::unique_ptr<table> reverse(table_view const &source_table, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#
Reverses the rows within a table.
Creates a new table that is the reverse of source_table.
Example:
source = [[4,5,6], [7,8,9], [10,11,12]]
return = [[6,5,4], [9,8,7], [12,11,10]]
- Parameters:
source_table – Table that will be reversed
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned table’s device memory
- Returns:
Reversed table
-
std::unique_ptr<column> reverse(column_view const &source_column, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#
Reverses the elements of a column.
Creates a new column that is the reverse of source_column.
Example:
source = [4,5,6]
return = [6,5,4]
- Parameters:
source_column – Column that will be reversed
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
Reversed column
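The examples for the two reverse overloads can be reproduced with a host-side model in which a table is a vector of columns and reversing a table reverses the row order of every column. The type aliases and function names below are hypothetical and do not call libcudf.

```cpp
#include <cassert>
#include <vector>

using column_model = std::vector<int>;
using table_model  = std::vector<column_model>;  // one inner vector per column

// Returns a new column with the elements in reverse order.
column_model reverse_column(column_model const& source)
{
  return column_model(source.rbegin(), source.rend());
}

// Returns a new table whose rows are in reverse order,
// which means every column is reversed independently.
table_model reverse_table(table_model const& source)
{
  table_model out;
  out.reserve(source.size());
  for (auto const& col : source) {
    assert(col.size() == source.front().size());  // columns share one row count
    out.push_back(reverse_column(col));
  }
  return out;
}
```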
-
std::unique_ptr<column> empty_like(column_view const &input)#
Initializes and returns an empty column of the same type as the input.
- Parameters:
input – Immutable view of input column to emulate
- Returns:
An empty column of the same type as input
-
std::unique_ptr<column> empty_like(scalar const &input)#
Initializes and returns an empty column of the same type as the input.
- Parameters:
input – Scalar to emulate
- Returns:
An empty column of the same type as input
-
std::unique_ptr<column> allocate_like(column_view const &input, mask_allocation_policy mask_alloc = mask_allocation_policy::RETAIN, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#
Creates an uninitialized new column of the same size and type as the input.
Supports only fixed-width types.
If mask_alloc allocates a validity mask, that mask is also uninitialized, and the validity bits and the null count should be set by the caller.
- Throws:
cudf::data_type_error – if input type is not of fixed width.
- Parameters:
input – Immutable view of input column to emulate
mask_alloc – Optional policy for allocating the null mask. Defaults to RETAIN
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
A column with sufficient uninitialized capacity to hold the same number of elements as input, of the same type as input.type()
-
std::unique_ptr<column> allocate_like(column_view const &input, size_type size, mask_allocation_policy mask_alloc = mask_allocation_policy::RETAIN, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#
Creates an uninitialized new column of the specified size and same type as the input.
Supports only fixed-width types.
If mask_alloc allocates a validity mask, that mask is also uninitialized, and the validity bits and the null count should be set by the caller.
- Parameters:
input – Immutable view of input column to emulate
size – The desired number of elements that the new column should have capacity for
mask_alloc – Optional policy for allocating the null mask. Defaults to RETAIN
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
A column with sufficient uninitialized capacity to hold the specified number of elements, of the same type as input.type()
-
std::unique_ptr<table> empty_like(table_view const &input_table)#
Creates a table of empty columns with the same types as the input_table.
Creates the cudf::column objects, but does not allocate any underlying device memory for the columns’ data or bitmask.
- Parameters:
input_table – Immutable view of input table to emulate
- Returns:
A table of empty columns with the same types as the columns in input_table
-
void copy_range_in_place(column_view const &source, mutable_column_view &target, size_type source_begin, size_type source_end, size_type target_begin, rmm::cuda_stream_view stream = cudf::get_default_stream())#
Copies a range of elements in-place from one column to another.
Overwrites the range of elements in target indicated by the indices [target_begin, target_begin + N) with the elements from source indicated by the indices [source_begin, source_end) (where N = (source_end - source_begin)). Use the out-of-place copy function returning std::unique_ptr<column> for use cases requiring memory reallocation, for example for strings columns and other variable-width types.
If source and target refer to the same elements and the ranges overlap, the behavior is undefined.
- Throws:
cudf::data_type_error – if memory reallocation is required (e.g. for variable width types).
std::out_of_range – for invalid range (if source_begin > source_end, source_begin < 0, source_begin >= source.size(), source_end > source.size(), target_begin < 0, target_begin >= target.size(), or target_begin + (source_end - source_begin) > target.size()).
cudf::data_type_error – if target and source have different types.
std::invalid_argument – if source has null values and target is not nullable.
- Parameters:
source – The column to copy from
target – The preallocated column to copy into
source_begin – The starting index of the source range (inclusive)
source_end – The ending index of the source range (exclusive)
target_begin – The starting index of the target range (inclusive)
stream – CUDA stream used for device memory operations and kernel launches
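The range arithmetic and the std::out_of_range conditions listed above can be sketched on the host with plain vectors. This is a model for illustration only: copy_range_in_place_model is a hypothetical name, int indices mirror a signed size type, and the type and nullability checks are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model of the in-place range copy.
void copy_range_in_place_model(std::vector<int> const& source,
                               std::vector<int>& target,
                               int source_begin,
                               int source_end,
                               int target_begin)
{
  auto const src_size = static_cast<int>(source.size());
  auto const tgt_size = static_cast<int>(target.size());
  // Same range conditions as the std::out_of_range entry above.
  if (source_begin > source_end || source_begin < 0 || source_begin >= src_size ||
      source_end > src_size || target_begin < 0 || target_begin >= tgt_size ||
      target_begin + (source_end - source_begin) > tgt_size) {
    throw std::out_of_range("invalid range");
  }
  int const n = source_end - source_begin;  // N in the description
  for (int i = 0; i < n; ++i) {
    target[static_cast<std::size_t>(target_begin + i)] =
      source[static_cast<std::size_t>(source_begin + i)];
  }
}
```

Because the target is preallocated and only overwritten, its size never changes, which matches the restriction to cases that need no memory reallocation.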
-
std::unique_ptr<column> copy_range(column_view const &source, column_view const &target, size_type source_begin, size_type source_end, size_type target_begin, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#
Copies a range of elements out-of-place from one column to another.
Creates a new column as if an in-place copy was performed into target. A copy of target is created first, and then the elements indicated by the indices [target_begin, target_begin + N) are copied from the elements indicated by the indices [source_begin, source_end) of source (where N = (source_end - source_begin)). Elements outside the range are copied from target into the returned new column.
If source and target refer to the same elements and the ranges overlap, the behavior is undefined.
A range is considered invalid if:
Either the begin or end indices are out of bounds for the corresponding column
Begin is greater than end for source or target
The size of the source range would overflow the target column starting at target_begin
- Throws:
std::out_of_range – for any invalid range.
cudf::data_type_error – if target and source have different types.
cudf::data_type_error – if the data type is not fixed width, string, or dictionary
- Parameters:
source – The column to copy from inside the range
target – The column to copy from outside the range
source_begin – The starting index of the source range (inclusive)
source_end – The ending index of the source range (exclusive)
target_begin – The starting index of the target range (inclusive)
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
The result target column
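The "copy target first, then overwrite the range" behaviour can be sketched with a host-side model. copy_range_model is a hypothetical name, and the validity check is one reading of the three invalid-range conditions listed above rather than the library's exact checks.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model of the out-of-place range copy.
std::vector<int> copy_range_model(std::vector<int> const& source,
                                  std::vector<int> const& target,
                                  int source_begin,
                                  int source_end,
                                  int target_begin)
{
  auto const src_size = static_cast<int>(source.size());
  auto const tgt_size = static_cast<int>(target.size());
  bool const invalid =
    source_begin < 0 || source_end > src_size || target_begin < 0 ||  // out of bounds
    source_begin > source_end ||                                      // begin after end
    target_begin + (source_end - source_begin) > tgt_size;            // overflows target
  if (invalid) { throw std::out_of_range("invalid range"); }

  std::vector<int> out = target;  // elements outside the range come from target
  std::copy(source.begin() + source_begin, source.begin() + source_end,
            out.begin() + target_begin);
  assert(out.size() == target.size());  // the result has the target's size
  return out;
}
```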
-
std::unique_ptr<column> copy_if_else(column_view const &lhs, column_view const &rhs, column_view const &boolean_mask, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#
Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.
Selects each element i in the output column from either rhs or lhs using the following rule:
output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs[i] : rhs[i]
- Throws:
cudf::data_type_error – if lhs and rhs are not of the same type
std::invalid_argument – if lhs and rhs are not of the same length
cudf::data_type_error – if boolean mask is not of type bool
std::invalid_argument – if boolean mask is not of the same length as lhs and rhs
- Parameters:
lhs – left-hand column_view
rhs – right-hand column_view
boolean_mask – column of type_id::BOOL8 representing “left (true) / right (false)” boolean for each element. Null element represents false.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
new column with the selected elements
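The selection rule, including the treatment of null mask elements as false, can be sketched on the host. This is a model, not libcudf code: copy_if_else_model and mask_model are hypothetical names, and std::optional<bool> stands in for a nullable BOOL8 element.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// std::nullopt models a null mask element.
using mask_model = std::vector<std::optional<bool>>;

// Hypothetical host-side model of the column/column overload.
std::vector<int> copy_if_else_model(std::vector<int> const& lhs,
                                    std::vector<int> const& rhs,
                                    mask_model const& boolean_mask)
{
  if (lhs.size() != rhs.size() || boolean_mask.size() != lhs.size()) {
    throw std::invalid_argument("lhs, rhs and boolean_mask must have the same length");
  }
  std::vector<int> out(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    // value_or(false): a null mask element selects rhs, exactly like false.
    out[i] = boolean_mask[i].value_or(false) ? lhs[i] : rhs[i];
  }
  return out;
}
```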
-
std::unique_ptr<column> copy_if_else(scalar const &lhs, column_view const &rhs, column_view const &boolean_mask, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#
Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.
Selects each element i in the output column from either rhs or lhs using the following rule:
output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs : rhs[i]
- Throws:
cudf::data_type_error – if lhs and rhs are not of the same type
cudf::data_type_error – if boolean mask is not of type bool
std::invalid_argument – if boolean mask is not of the same length as rhs
- Parameters:
lhs – left-hand scalar
rhs – right-hand column_view
boolean_mask – column of type_id::BOOL8 representing “left (true) / right (false)” boolean for each element. Null element represents false.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
new column with the selected elements
-
std::unique_ptr<column> copy_if_else(column_view const &lhs, scalar const &rhs, column_view const &boolean_mask, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#
Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.
Selects each element i in the output column from either rhs or lhs using the following rule:
output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs[i] : rhs
- Throws:
cudf::data_type_error – if lhs and rhs are not of the same type
cudf::data_type_error – if boolean mask is not of type bool
std::invalid_argument – if boolean mask is not of the same length as lhs
- Parameters:
lhs – left-hand column_view
rhs – right-hand scalar
boolean_mask – column of type_id::BOOL8 representing “left (true) / right (false)” boolean for each element. Null element represents false.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
new column with the selected elements
-
std::unique_ptr<column> copy_if_else(scalar const &lhs, scalar const &rhs, column_view const &boolean_mask, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#
Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.
Selects each element i in the output column from either rhs or lhs using the following rule:
output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs : rhs
- Throws:
cudf::logic_error – if boolean mask is not of type bool
- Parameters:
lhs – left-hand scalar
rhs – right-hand scalar
boolean_mask – column of type_id::BOOL8 representing “left (true) / right (false)” boolean for each element. Null element represents false.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
new column with the selected elements
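With scalar operands the same rule applies, but the scalar is broadcast to every row, so in the scalar/scalar case the output length can only come from boolean_mask. A host-side sketch follows, under the same assumptions as any plain-vector model: the names are hypothetical, int stands in for the element type, and std::optional<bool> models a nullable mask element.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

using mask_model = std::vector<std::optional<bool>>;

// Scalar/scalar model: output[i] = mask says true ? lhs : rhs.
std::vector<int> copy_if_else_model(int lhs, int rhs, mask_model const& boolean_mask)
{
  std::vector<int> out;
  out.reserve(boolean_mask.size());  // the mask alone determines the length
  for (auto const& m : boolean_mask) {
    out.push_back(m.value_or(false) ? lhs : rhs);
  }
  return out;
}

// Scalar/column model: output[i] = mask says true ? lhs : rhs[i].
std::vector<int> copy_if_else_model(int lhs,
                                    std::vector<int> const& rhs,
                                    mask_model const& boolean_mask)
{
  if (boolean_mask.size() != rhs.size()) {
    throw std::invalid_argument("boolean_mask and rhs must have the same length");
  }
  std::vector<int> out(rhs.size());
  for (std::size_t i = 0; i < rhs.size(); ++i) {
    out[i] = boolean_mask[i].value_or(false) ? lhs : rhs[i];
  }
  return out;
}
```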
-
std::unique_ptr<scalar> get_element(column_view const &input, size_type index, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#
Get the element at specified index from a column.
Warning
This function is expensive (it invokes a kernel launch), so it is not recommended for performance-sensitive code or for use inside a loop.
- Throws:
std::out_of_range – if index is not within the range [0, input.size())
- Parameters:
input – Column view to get the element from
index – Index into input to get the element at
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned scalar’s device memory
- Returns:
Scalar containing the single value
-
std::unique_ptr<table> sample(table_view const &input, size_type const n, sample_with_replacement replacement = sample_with_replacement::FALSE, int64_t const seed = 0, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#
Gather n samples from given input randomly.
Example:
input: {col1: {1, 2, 3, 4, 5}, col2: {6, 7, 8, 9, 10}}
n: 3
replacement: false
output: {col1: {3, 1, 4}, col2: {8, 6, 9}}
replacement: true
output: {col1: {3, 1, 1}, col2: {8, 6, 6}}
- Throws:
cudf::logic_error – if n > input.num_rows() and replacement == FALSE.
cudf::logic_error – if n < 0.
- Parameters:
input – View of a table to sample
n – non-negative number of samples expected from input
replacement – Allow or disallow sampling of the same row more than once
seed – Seed value to initiate random number generator
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned table’s device memory
- Returns:
Table containing samples from input
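Sampling whole rows amounts to choosing n row indices and gathering every column at those indices. The sketch below models only the index selection on the host; sample_indices_model is a hypothetical name, the generator is the standard library's and not the one libcudf uses, and the guard for an empty input is an addition of this model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model: returns the row indices that would be gathered.
std::vector<int> sample_indices_model(int num_rows,
                                      int n,
                                      bool with_replacement,
                                      std::int64_t seed = 0)
{
  if (n < 0) { throw std::logic_error("n must be non-negative"); }
  if (!with_replacement && n > num_rows) {
    throw std::logic_error("n exceeds the row count without replacement");
  }
  std::vector<int> picked;
  if (n == 0) { return picked; }
  // Model-only guard: there is nothing to draw from an empty input.
  if (num_rows == 0) { throw std::logic_error("cannot sample from an empty input"); }

  std::mt19937_64 rng(static_cast<std::uint64_t>(seed));
  if (with_replacement) {
    // Independent draws: the same row may appear more than once.
    std::uniform_int_distribution<int> dist(0, num_rows - 1);
    picked.resize(static_cast<std::size_t>(n));
    for (auto& idx : picked) { idx = dist(rng); }
  } else {
    // Shuffle all row indices and keep the first n: no row repeats.
    std::vector<int> all(static_cast<std::size_t>(num_rows));
    std::iota(all.begin(), all.end(), 0);
    std::shuffle(all.begin(), all.end(), rng);
    picked.assign(all.begin(), all.begin() + n);
  }
  return picked;
}
```

Applying the returned indices to every column keeps rows intact, which is why col1 and col2 in the example above always move together.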
-
bool has_nonempty_nulls(column_view const &input, rmm::cuda_stream_view stream = cudf::get_default_stream())#
Checks if a column or its descendants have non-empty null rows.
A LIST or STRING column might have non-empty rows that are marked as null. A STRUCT or LIST column might have child columns that have non-empty null rows. Other types of columns are deemed incapable of having non-empty null rows. For example, fixed-width columns have no concept of an “empty” row.
Note
This function is exact. If it returns true, there exists one or more non-empty null elements.
- Parameters:
input – The column which is (and whose descendants are) to be checked for non-empty null rows.
stream – CUDA stream used for device memory operations and kernel launches
- Returns:
true If either the column or its descendants have non-empty null rows
- Returns:
false If neither the column nor its descendants have non-empty null rows
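What "non-empty null row" means can be made concrete with a host-side model of an offsets-based column, where row i owns the child elements [offsets[i], offsets[i + 1]). The struct and function below are hypothetical and cover a single level only; they do not recurse into descendants as the real function is described to do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a LIST (or STRING) column's layout.
struct list_column_model {
  std::vector<bool> validity;  // validity[i] == false marks row i as null
  std::vector<int> offsets;    // size is validity.size() + 1
  std::vector<int> child;      // flattened row contents
};

// True if some row is null yet still spans at least one child element.
bool has_nonempty_nulls_model(list_column_model const& col)
{
  assert(col.offsets.size() == col.validity.size() + 1);
  for (std::size_t i = 0; i < col.validity.size(); ++i) {
    bool const is_null     = !col.validity[i];
    bool const is_nonempty = col.offsets[i + 1] > col.offsets[i];
    if (is_null && is_nonempty) { return true; }
  }
  return false;
}
```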
-
bool may_have_nonempty_nulls(column_view const &input)#
Approximates if a column or its descendants may have non-empty null elements.
False positives are possible, but false negatives are not.
Compared to the exact has_nonempty_nulls() function, this function is typically more efficient.
Complexity:
Best case: O(count_descendants(input))
Worst case: O(count_descendants(input)) * m, where m is the number of rows in the largest descendant
Note
This function is approximate.
true: Non-empty null elements could exist
false: Non-empty null elements definitely do not exist
- Parameters:
input – The column which is (and whose descendants are) to be checked for non-empty null rows
- Returns:
true If either the column or its descendants have null rows
- Returns:
false If neither the column nor its descendants have null rows
-
std::unique_ptr<column> purge_nonempty_nulls(column_view const &input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#
Copy input into output while purging any non-empty null rows in the column or its descendants.
If the input column is not of compound type (LIST/STRING/STRUCT/DICTIONARY), the output will be the same as input.
The purge operation only applies directly to LIST and STRING columns, but it applies indirectly to STRUCT/DICTIONARY columns as well, since these columns may have child columns that are LIST or STRING.
Examples:
auto const lists = lists_column_wrapper<int32_t>{ {0,1}, {2,3}, {4,5} }.release();
cudf::set_null_mask(lists->null_mask(), 1, 2, false);
lists[1] is now null, but the lists child column still stores `{2,3}`.
The lists column contents will be:
Validity: 101
Offsets: [0, 2, 4, 6]
Child: [0, 1, 2, 3, 4, 5]
After purging the contents of the list's null rows, the column's contents will be:
Validity: 101
Offsets: [0, 2, 2, 4]
Child: [0, 1, 4, 5]

auto const strings = strings_column_wrapper{ "AB", "CD", "EF" }.release();
cudf::set_null_mask(strings->null_mask(), 1, 2, false);
strings[1] is now null, but the strings column still stores `"CD"`.
The strings column contents will be:
Validity: 101
Offsets: [0, 2, 4, 6]
Child: [A, B, C, D, E, F]
After purging the contents of the strings column's null rows, the column's contents will be:
Validity: 101
Offsets: [0, 2, 2, 4]
Child: [A, B, E, F]

auto const lists = lists_column_wrapper<int32_t>{ {0,1}, {2,3}, {4,5} };
auto const structs = structs_column_wrapper{ {lists}, null_at(1) };
structs[1].child is now null, but the lists column still stores `{2,3}`.
The lists column contents will be:
Validity: 101
Offsets: [0, 2, 4, 6]
Child: [0, 1, 2, 3, 4, 5]
After purging the contents of the list's null rows, the column's contents will be:
Validity: 101
Offsets: [0, 2, 2, 4]
Child: [0, 1, 4, 5]
- Parameters:
input – The column whose null rows are to be checked and purged
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
A new column with equivalent contents to input, but with null rows purged
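The offsets and child rewrite shown in the lists example can be reproduced with a host-side model: valid rows keep their child elements, and null rows contribute nothing, so their begin and end offsets coincide. The struct and function names are hypothetical, and the model handles one level only rather than nested descendants.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a LIST (or STRING) column's layout:
// row i owns child elements [offsets[i], offsets[i + 1]).
struct list_column_model {
  std::vector<bool> validity;  // validity[i] == false marks row i as null
  std::vector<int> offsets;    // size is validity.size() + 1
  std::vector<int> child;      // flattened row contents
};

// Returns a copy in which every null row is empty.
list_column_model purge_nonempty_nulls_model(list_column_model const& in)
{
  assert(in.offsets.size() == in.validity.size() + 1);
  list_column_model out;
  out.validity = in.validity;  // the validity is unchanged by the purge
  out.offsets.push_back(0);
  for (std::size_t i = 0; i < in.validity.size(); ++i) {
    if (in.validity[i]) {
      // Keep the contents of valid rows.
      out.child.insert(out.child.end(),
                       in.child.begin() + in.offsets[i],
                       in.child.begin() + in.offsets[i + 1]);
    }
    // For a null row nothing was appended, so the row becomes empty.
    out.offsets.push_back(static_cast<int>(out.child.size()));
  }
  return out;
}
```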