Column Nullmask
- group Bitmask Operations
Functions
-
size_type state_null_count(mask_state state, size_type size)
Returns the null count for a null mask of the specified state representing size elements.
- Throws:
std::invalid_argument – if state is UNINITIALIZED
- Parameters:
state – The state of the null mask
size – The number of elements represented by the mask
- Returns:
The count of null elements
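The contract above can be sketched as a small host-side model. This is only an illustration: the enumerator names other than UNINITIALIZED and the name state_null_count_model are assumptions made for the example, not taken from this page.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Assumed stand-ins for cudf::mask_state and cudf::size_type.
enum class mask_state : std::int32_t { UNALLOCATED, UNINITIALIZED, ALL_VALID, ALL_NULL };
using size_type = std::int32_t;

// Hypothetical model of the documented behaviour, not the library source.
size_type state_null_count_model(mask_state state, size_type size)
{
  assert(size >= 0);
  switch (state) {
    case mask_state::UNALLOCATED: return 0;     // no mask: every element is valid
    case mask_state::ALL_VALID: return 0;       // every bit set
    case mask_state::ALL_NULL: return size;     // every bit unset
    case mask_state::UNINITIALIZED:
      throw std::invalid_argument("null count of an UNINITIALIZED mask is unknown");
  }
  throw std::invalid_argument("invalid mask_state");
}
```

Only the UNINITIALIZED state has no well-defined count, which is why it is the one state that throws.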
-
std::size_t bitmask_allocation_size_bytes(size_type number_of_bits, std::size_t padding_boundary = 64)
Computes the number of bytes required to represent the specified number of bits, rounded up to a multiple of the given padding boundary.
Note
The Arrow specification for the null bitmask requires a 64B padding boundary.
- Parameters:
number_of_bits – The number of bits that need to be represented
padding_boundary – The value returned will be rounded up to a multiple of this value
- Returns:
The necessary number of bytes
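The rounding described above can be sketched in plain C++. The helper name padded_bitmask_bytes is hypothetical, and the arithmetic is a model of the documented result rather than the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes needed for `number_of_bits`, rounded up to a
// multiple of `padding_boundary`.
std::size_t padded_bitmask_bytes(std::int32_t number_of_bits, std::size_t padding_boundary = 64)
{
  assert(number_of_bits >= 0 && padding_boundary > 0);
  // One byte holds 8 bits; round the bit count up to whole bytes first.
  std::size_t const needed = (static_cast<std::size_t>(number_of_bits) + 7) / 8;
  // Then round the byte count up to the next multiple of the boundary.
  return ((needed + padding_boundary - 1) / padding_boundary) * padding_boundary;
}
```

With the default boundary, any bit count from 1 to 512 maps to 64 bytes, and 513 bits spill into a second 64-byte block.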
-
size_type num_bitmask_words(size_type number_of_bits)
Returns the number of bitmask_type words required to represent the specified number of bits.
Unlike bitmask_allocation_size_bytes, which returns the number of bytes needed for a bitmask allocation (including padding), this function returns the actual number of bitmask_type elements necessary to represent number_of_bits. This is useful when one wishes to process all of the bits in a bitmask and ignore the padding/slack bits.
- Parameters:
number_of_bits – The number of bits that need to be represented
- Returns:
The necessary number of bitmask_type elements
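As a rough illustration of the word count, the sketch below assumes that bitmask_type is a 32-bit unsigned integer; this page does not state the width, so treat it as an assumption. The names bitmask_word and words_for_bits are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using bitmask_word = std::uint32_t;  // assumption: bitmask_type is a 32-bit word

// Hypothetical helper: number of whole words needed to hold `number_of_bits`.
std::int32_t words_for_bits(std::int32_t number_of_bits)
{
  assert(number_of_bits >= 0);
  constexpr std::int64_t bits_per_word = sizeof(bitmask_word) * 8;
  // Widen before adding so the rounding term cannot overflow.
  return static_cast<std::int32_t>((number_of_bits + bits_per_word - 1) / bits_per_word);
}
```

Under that assumption, 100 bits occupy 4 words (16 bytes), while a 64-byte padded allocation holds 16 words; the remaining 12 words are slack that word-wise loops can skip.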
-
rmm::device_buffer create_null_mask(size_type size, mask_state state, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Creates a device_buffer for use as a null value indicator bitmask of a column.
- Parameters:
size – The number of elements to be represented by the mask
state – The desired state of the mask
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned device_buffer
- Returns:
A device_buffer for use as a null bitmask satisfying the desired size and state
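To make "size and state" concrete, here is a host-side model that returns a byte vector instead of a device_buffer. Several points are assumptions for the sake of the sketch: the enumerator names, that an UNALLOCATED mask is an empty buffer, that the allocation uses the 64-byte padding described earlier, and that a set bit means "valid". The function make_host_null_mask is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed stand-in for cudf::mask_state.
enum class mask_state : std::int32_t { UNALLOCATED, UNINITIALIZED, ALL_VALID, ALL_NULL };

// Hypothetical host-side model of a null mask allocation.
std::vector<std::uint8_t> make_host_null_mask(std::int32_t size, mask_state state)
{
  assert(size >= 0);
  if (state == mask_state::UNALLOCATED) { return {}; }  // assumed: no buffer at all
  std::size_t const needed = (static_cast<std::size_t>(size) + 7) / 8;
  std::size_t const padded = ((needed + 63) / 64) * 64;  // 64-byte padding boundary
  // ALL_VALID sets every bit and ALL_NULL clears every bit. A real
  // UNINITIALIZED buffer would hold arbitrary bytes; this model zero-fills it.
  std::uint8_t const fill = (state == mask_state::ALL_VALID) ? 0xFF : 0x00;
  return std::vector<std::uint8_t>(padded, fill);
}
```

In this model a 100-element ALL_VALID mask is 64 bytes of 0xFF, and the same request with UNALLOCATED yields an empty buffer.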
-
void set_null_mask(bitmask_type *bitmask, size_type begin_bit, size_type end_bit, bool valid, rmm::cuda_stream_view stream = cudf::get_default_stream())
Sets a pre-allocated bitmask buffer to a given state in the range [begin_bit, end_bit).
Sets the bits [begin_bit, end_bit) of bitmask to valid if valid == true, or to null otherwise.
- Parameters:
bitmask – Pointer to bitmask (e.g. returned by column_view::null_mask())
begin_bit – Index of the first bit to set (inclusive)
end_bit – Index of the last bit to set (exclusive)
valid – If true set all entries to valid; otherwise, set all to null
stream – CUDA stream used for device memory operations and kernel launches
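The half-open range semantics can be illustrated on the host. The sketch assumes 32-bit mask words with bit i stored in word i / 32 at position i % 32 (least-significant bit first) and a set bit meaning "valid"; the name set_bits is hypothetical and the loop is a model, not the device kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using bitmask_word = std::uint32_t;  // assumption: 32-bit mask words

// Hypothetical host-side model: mark bits [begin_bit, end_bit) valid or null.
void set_bits(std::vector<bitmask_word>& mask, std::int32_t begin_bit, std::int32_t end_bit, bool valid)
{
  assert(0 <= begin_bit && begin_bit <= end_bit);
  assert(static_cast<std::size_t>(end_bit) <= mask.size() * 32);
  for (std::int32_t bit = begin_bit; bit < end_bit; ++bit) {
    bitmask_word const flag = bitmask_word{1} << (bit % 32);
    if (valid) {
      mask[bit / 32] |= flag;   // set bit: element is valid
    } else {
      mask[bit / 32] &= ~flag;  // clear bit: element is null
    }
  }
}
```

Because end_bit is exclusive, calling set_bits(mask, 3, 37, true) touches bits 3 through 36 and leaves bit 37 unchanged.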
-
void set_null_masks(cudf::host_span<bitmask_type*> bitmasks, cudf::host_span<size_type const> begin_bits, cudf::host_span<size_type const> end_bits, cudf::host_span<bool const> valids, rmm::cuda_stream_view stream = cudf::get_default_stream())
Sets a vector of non-overlapping pre-allocated bitmask buffers to given states in the corresponding non-aliasing ranges in bulk.
Sets bit ranges [begin_bit, end_bit) of the given bitmasks to the specified valid states. The bitmask bit ranges must be non-overlapping and non-aliasing; i.e., attempting to concurrently set bits within the same physical word across bitmasks will result in undefined behavior. This utility is optimized for bulk operation on 16 or more bitmasks sized 2^24 bits or less.
- Deprecated:
Deprecated in 25.08 and to be removed in a future release. Use cudf::set_null_masks_unsafe instead.
- Parameters:
bitmasks – Pointers to bitmasks (e.g. returned by column_view::null_mask())
begin_bits – Indices of the first bits to set (inclusive)
end_bits – Indices of the last bits to set (exclusive)
valids – Booleans indicating if the corresponding bitmasks should be set to valid or null
stream – CUDA stream used for device memory operations and kernel launches
-
void set_null_masks_safe(cudf::host_span<bitmask_type*> bitmasks, cudf::host_span<size_type const> begin_bits, cudf::host_span<size_type const> end_bits, cudf::host_span<bool const> valids, rmm::cuda_stream_view stream = cudf::get_default_stream())
Sets a vector of non-overlapping pre-allocated bitmask buffers to given states in the corresponding ranges in bulk.
Sets bit ranges [begin_bit, end_bit) of the given bitmasks to the specified valid states. The bitmask bit ranges must be non-overlapping; i.e., attempting to set a physical bit concurrently across bitmasks will result in undefined behavior. This utility is optimized for bulk operation on 16 or more bitmasks sized 2^24 bits or less.
- Parameters:
bitmasks – Pointers to bitmasks (e.g. returned by column_view::null_mask())
begin_bits – Indices of the first bits to set (inclusive)
end_bits – Indices of the last bits to set (exclusive)
valids – Booleans indicating if the corresponding bitmasks should be set to valid or null
stream – CUDA stream used for device memory operations and kernel launches
-
void set_null_masks_unsafe(cudf::host_span<bitmask_type*> bitmasks, cudf::host_span<size_type const> begin_bits, cudf::host_span<size_type const> end_bits, cudf::host_span<bool const> valids, rmm::cuda_stream_view stream = cudf::get_default_stream())
Sets a vector of non-overlapping pre-allocated bitmask buffers to given states in the corresponding non-aliasing ranges in bulk.
Sets bit ranges [begin_bit, end_bit) of the given bitmasks to the specified valid states. The bitmask bit ranges must be non-overlapping and non-aliasing; i.e., attempting to concurrently set bits within the same physical word across bitmasks will result in undefined behavior. This utility is optimized for bulk operation on 16 or more bitmasks sized 2^24 bits or less.
- Parameters:
bitmasks – Pointers to bitmasks (e.g. returned by column_view::null_mask())
begin_bits – Indices of the first bits to set (inclusive)
end_bits – Indices of the last bits to set (exclusive)
valids – Booleans indicating if the corresponding bitmasks should be set to valid or null
stream – CUDA stream used for device memory operations and kernel launches
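The non-aliasing requirement is stricter than non-overlap: two ranges may share no bit and still fall in the same physical word. A caller could check a pair of requests with something like the sketch below, which assumes 32-bit mask words. The bit_range struct and touches_same_word are hypothetical helpers, not part of the library.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

using bitmask_word = std::uint32_t;  // assumption: 32-bit mask words

// Hypothetical description of one (bitmask, begin_bit, end_bit) entry.
struct bit_range {
  bitmask_word const* bitmask;
  std::int32_t begin_bit;
  std::int32_t end_bit;
};

// True if the two ranges would write to at least one common word.
bool touches_same_word(bit_range const& a, bit_range const& b)
{
  assert(0 <= a.begin_bit && a.begin_bit <= a.end_bit);
  assert(0 <= b.begin_bit && b.begin_bit <= b.end_bit);
  if (a.begin_bit == a.end_bit || b.begin_bit == b.end_bit) { return false; }  // empty range
  auto first_word = [](bit_range const& r) { return r.bitmask + r.begin_bit / 32; };
  auto last_word  = [](bit_range const& r) { return r.bitmask + (r.end_bit - 1) / 32; };
  std::less<bitmask_word const*> before{};  // total order over pointers
  // The word intervals intersect unless one ends before the other starts.
  return !(before(last_word(a), first_word(b)) || before(last_word(b), first_word(a)));
}
```

For example, [0, 16) and [16, 32) on the same buffer do not overlap, yet both live in word 0, so they would violate the non-aliasing precondition.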
-
rmm::device_buffer copy_bitmask(bitmask_type const *mask, size_type begin_bit, size_type end_bit, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Creates a device_buffer from a slice of a bitmask defined by the range of indices [begin_bit, end_bit).
Returns an empty device_buffer if mask == nullptr.
- Throws:
cudf::logic_error – if begin_bit > end_bit
cudf::logic_error – if begin_bit < 0
- Parameters:
mask – Bitmask residing in device memory whose bits will be copied
begin_bit – Index of the first bit to be copied (inclusive)
end_bit – Index of the last bit to be copied (exclusive)
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned device_buffer
- Returns:
A device_buffer containing the bits [begin_bit, end_bit) from mask.
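A host-side model of the slice copy is sketched below. It assumes 32-bit mask words stored least-significant bit first and assumes that the copied slice is re-based so that begin_bit becomes bit 0 of the result; neither detail is stated on this page. The name copy_bits is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using bitmask_word = std::uint32_t;  // assumption: 32-bit mask words

// Hypothetical model: copy bits [begin_bit, end_bit) into a new mask whose
// bit 0 corresponds to begin_bit of the source.
std::vector<bitmask_word> copy_bits(std::vector<bitmask_word> const& mask,
                                    std::int32_t begin_bit,
                                    std::int32_t end_bit)
{
  assert(0 <= begin_bit && begin_bit <= end_bit);
  assert(static_cast<std::size_t>(end_bit) <= mask.size() * 32);
  std::int32_t const num_bits = end_bit - begin_bit;
  std::vector<bitmask_word> out((num_bits + 31) / 32, 0u);
  for (std::int32_t i = 0; i < num_bits; ++i) {
    std::int32_t const src = begin_bit + i;
    if ((mask[src / 32] >> (src % 32)) & 1u) { out[i / 32] |= bitmask_word{1} << (i % 32); }
  }
  return out;
}
```

An empty range (begin_bit == end_bit) produces an empty result in this model.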
-
rmm::device_buffer copy_bitmask(column_view const &view, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Copies view’s bitmask from the bits [view.offset(), view.offset() + view.size()) into a device_buffer.
Returns an empty device_buffer if the column is not nullable.
- Parameters:
view – Column view whose bitmask needs to be copied
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned device_buffer
- Returns:
A device_buffer containing the bits [view.offset(), view.offset() + view.size()) from view’s bitmask.
-
std::pair<rmm::device_buffer, size_type> bitmask_and(table_view const &view, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Performs bitwise AND of the bitmasks of columns of a table. Returns a pair of resulting mask and count of unset bits.
If any of the columns isn’t nullable, it is considered all valid. If no column in the table is nullable, an empty bitmask is returned.
- Parameters:
view – The table of columns
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned device_buffer
- Returns:
A pair of resulting bitmask and count of unset bits
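The combination rule can be modelled on the host as follows. The sketch represents each column's mask as a vector of 32-bit words (an assumed width) and uses an empty vector to stand for a non-nullable column; the name and_masks and this representation are hypothetical, chosen only to mirror the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using bitmask_word = std::uint32_t;  // assumption: 32-bit mask words

// Hypothetical model: AND the masks of all columns and count unset bits
// among the first `num_rows` bits. An empty vector models a non-nullable column.
std::pair<std::vector<bitmask_word>, std::int32_t> and_masks(
  std::vector<std::vector<bitmask_word>> const& masks, std::int32_t num_rows)
{
  assert(num_rows >= 0);
  std::size_t const num_words = (static_cast<std::size_t>(num_rows) + 31) / 32;
  std::vector<bitmask_word> result(num_words, ~bitmask_word{0});  // start all valid
  bool any_nullable = false;
  for (auto const& m : masks) {
    if (m.empty()) { continue; }  // non-nullable column: treated as all valid
    assert(m.size() >= num_words);
    any_nullable = true;
    for (std::size_t w = 0; w < num_words; ++w) { result[w] &= m[w]; }
  }
  if (!any_nullable) { return {std::vector<bitmask_word>{}, 0}; }  // empty mask, no nulls
  std::int32_t unset = 0;
  for (std::int32_t row = 0; row < num_rows; ++row) {
    if (((result[row / 32] >> (row % 32)) & 1u) == 0u) { ++unset; }
  }
  return {result, unset};
}
```

A row is valid in the result only if it is valid in every nullable column, so the unset-bit count is the number of rows that are null in at least one column.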
-
std::pair<std::vector<std::unique_ptr<rmm::device_buffer>>, std::vector<size_type>> segmented_bitmask_and(host_span<column_view const> colviews, host_span<size_type const> segment_offsets, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Performs segmented bitwise AND operations on the null masks of the input columns based on defined segments. For each segment, it computes the bitwise AND of the bitmasks of all columns within that segment. Returns a pair containing (i) a vector of unique pointers to device buffers, with each buffer containing the resulting bitmask for a segment, and (ii) a vector of integers representing the count of null (unset) bits for each segment.
The function assumes all the input columns passed are nullable.
- Parameters:
colviews – A span containing column views whose bitmasks will be ANDed within their respective segments
segment_offsets – A span containing the starting positions of each segment
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned device_buffer
- Returns:
A pair of vectors containing resulting bitmask and count of unset bits for each segment
-
std::pair<rmm::device_buffer, size_type> bitmask_or(table_view const &view, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Performs bitwise OR of the bitmasks of columns of a table. Returns a pair of resulting mask and count of unset bits.
If any of the columns isn’t nullable, it is considered all valid. If no column in the table is nullable, an empty bitmask is returned.
- Parameters:
view – The table of columns
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned device_buffer
- Returns:
A pair of resulting bitmask and count of unset bits
-
cudf::size_type null_count(bitmask_type const *bitmask, size_type start, size_type stop, rmm::cuda_stream_view stream = cudf::get_default_stream())
Given a validity bitmask, counts the number of null elements (unset bits) in the range [start, stop).
If bitmask == nullptr, all elements are assumed to be valid and the function returns 0.
- Throws:
cudf::logic_error – if start > stop
cudf::logic_error – if start < 0
- Parameters:
bitmask – Validity bitmask residing in device memory.
start – Index of the first bit to count (inclusive).
stop – Index of the last bit to count (exclusive).
stream – CUDA stream used for device memory operations and kernel launches
- Returns:
The number of null elements in the specified range.
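The counting rule can be sketched on the host. The model assumes 32-bit mask words stored least-significant bit first and reads host memory rather than device memory; count_unset is a hypothetical name and a simple bit-by-bit loop, not the library's kernel.

```cpp
#include <cassert>
#include <cstdint>

using bitmask_word = std::uint32_t;  // assumption: 32-bit mask words

// Hypothetical model: number of unset (null) bits in [start, stop).
std::int32_t count_unset(bitmask_word const* bitmask, std::int32_t start, std::int32_t stop)
{
  assert(0 <= start && start <= stop);
  if (bitmask == nullptr) { return 0; }  // no mask: all elements are valid
  std::int32_t nulls = 0;
  for (std::int32_t bit = start; bit < stop; ++bit) {
    if (((bitmask[bit / 32] >> (bit % 32)) & 1u) == 0u) { ++nulls; }
  }
  return nulls;
}
```

As with the other range-based functions here, stop is exclusive, so an empty range (start == stop) always counts zero nulls.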