Column
- class pylibcudf.column.Column(DataType data_type, size_type size, gpumemoryview data, gpumemoryview mask, size_type null_count, size_type offset, list children)
A container of nullable device data as a column of elements.
This class is an implementation of the Arrow columnar data specification for data stored on GPUs. It relies on Python memoryview-like semantics to maintain shared ownership of the data it is constructed with, so any input data may also be co-owned by other data structures. The Column is designed to be operated on using algorithms backed by libcudf.
- Parameters:
- data_typeDataType
The type of data in the column.
- sizesize_type
The number of rows in the column.
- datagpumemoryview
The data the column will refer to.
- maskgpumemoryview
The null mask for the column.
- null_countint
The number of null rows in the column.
- offsetint
The offset into the data buffer where the column’s data begins.
- childrenlist
The children of this column if it is a compound column type.
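To make the relationship between mask, null_count, and offset concrete, here is a minimal host-side sketch of counting nulls in an Arrow validity bitmap. It assumes the Arrow convention that bit i of the bitmap (least-significant bit first within each byte) is 1 when row i is valid, and that offset is counted in rows and applies to the mask as well as the data. count_nulls is a hypothetical helper written for illustration; it is not part of pylibcudf and works on host bytes, not device memory.

```python
def count_nulls(mask: bytes, offset: int, size: int) -> int:
    """Count null rows in [offset, offset + size) of an Arrow validity bitmap.

    Hypothetical helper for illustration only; not a pylibcudf function.
    """
    nulls = 0
    for row in range(offset, offset + size):
        byte_index, bit_index = divmod(row, 8)
        # A set bit means "valid"; an unset bit means "null".
        if not (mask[byte_index] >> bit_index) & 1:
            nulls += 1
    return nulls


# Bits, least-significant first: row 0 valid, row 1 null, rows 2-3 valid, rows 4-7 null.
mask = bytes([0b00001101])
print(count_nulls(mask, offset=0, size=4))  # 1 (row 1 is null)
print(count_nulls(mask, offset=2, size=2))  # 0 (rows 2 and 3 are valid)
```

The second call shows why null_count depends on offset and size: the same mask buffer describes a different number of nulls for a different row range.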
Methods
all_null_like(Column like, size_type size): Create an all-null column from a template.
child(self, size_type index): Get a child column of this column.
children(self): The children of the column.
copy(self): Create a copy of the column.
data(self): The data buffer of the column.
device_buffer_size(self): The total size of the device buffers used by the Column.
from_array(cls, obj): Create a Column from any object which supports the NumPy or CUDA array interface.
from_array_interface(cls, obj): Create a Column from an object implementing the NumPy Array Interface.
from_arrow(obj, DataType dtype): Create a Column from an Arrow-like object using the Arrow C Data Interface.
from_cuda_array_interface(cls, obj): Create a Column from an object implementing the CUDA Array Interface.
from_iterable_of_py(obj, DataType dtype): Create a Column from a Python iterable of scalar values or nested iterables.
from_rmm_buffer(DeviceBuffer buff, ...): Create a Column from an RMM DeviceBuffer.
from_scalar(Scalar slr, size_type size): Create a Column from a Scalar.
list_view(self): Accessor for methods of a Column that are specific to lists.
null_count(self): The number of null elements in the column.
null_mask(self): The null mask of the column.
num_children(self): The number of children of this column.
offset(self): The offset of the column.
size(self): The number of elements in the column.
struct_from_children(cls, children): Create a struct Column from a list of child columns.
to_scalar(self): Return the first value of a 1-element column as a Scalar.
type(self): The type of data in the column.
with_mask(self, gpumemoryview mask, ...): Augment this column with a new null mask.
- static all_null_like(Column like, size_type size)
Create an all-null column from a template.
- Parameters:
- likeColumn
Column whose type the result should mimic.
- sizeint
Number of rows in the resulting column.
- Returns:
- Column
An all-null column of size rows and type matching like.
- child(self, size_type index) -> Column
Get a child column of this column.
- Parameters:
- indexsize_type
The index of the child column to get.
- Returns:
- Column
The child column.
- data(self) -> gpumemoryview
The data buffer of the column.
- device_buffer_size(self) -> uint64_t
The total size of the device buffers used by the Column.
- Returns:
- Number of bytes.
Notes
Since Columns rely on Python memoryview-like semantics to maintain shared ownership of the data, the device buffers underlying this column might be shared with other data structures, including other columns.
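The sharing described in this note can be pictured with Python's built-in memoryview, which is the model the Column semantics are said to follow. The sketch below is a host-side analogy only: it uses a bytearray instead of device memory and no pylibcudf objects.

```python
# Host-side analogy: two memoryviews over one bytearray share the same bytes.
buf = bytearray(b"\x01\x02\x03\x04")
view_a = memoryview(buf)   # a view over the whole buffer
view_b = view_a[1:3]       # slicing creates another view, not a copy

view_b[0] = 0xFF           # write through the second view
print(buf[1], view_a[1])   # 255 255: the owner and the first view see the change
print(view_a.nbytes, view_b.nbytes)  # 4 2
```

By the same reasoning, adding up device_buffer_size over several columns that share buffers may count the same memory more than once.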
- classmethod from_array(cls, obj)
Create a Column from any object which supports the NumPy or CUDA array interface.
- Parameters:
- objobject
The input array to be converted into a pylibcudf.Column.
- Returns:
- Column
- Raises:
- TypeError
If the input does not implement a supported array interface.
Notes
Only C-contiguous host and device ndarrays are supported. For device arrays, the data is not copied.
Examples
>>> import pylibcudf as plc
>>> import cupy as cp
>>> cp_arr = cp.array([[1, 2], [3, 4]])
>>> col = plc.Column.from_array(cp_arr)
- classmethod from_array_interface(cls, obj)
Create a Column from an object implementing the NumPy Array Interface.
If the object provides a raw memory pointer via the “data” field, we use that pointer directly and avoid copying. Otherwise, a ValueError is raised.
- Parameters:
- objAny
Must implement the __array_interface__ protocol.
- Returns:
- Column
A Column containing the data from the array interface.
- Raises:
- TypeError
If the object does not implement __array_interface__.
- ValueError
If the array is not 1D or 2D, or is not C-contiguous. If the number of rows exceeds the size_type limit. If the ‘data’ field is invalid.
- NotImplementedError
If the object has a mask.
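As a rough picture of what these checks look at, the sketch below inspects an __array_interface__ dictionary in the way the Raises section describes. check_array_interface and FakeArray are hypothetical names made up for this example, the order of the checks is an assumption, and the limit assumes size_type is a signed 32-bit integer as in libcudf. The contiguity check is left out to keep the sketch short.

```python
SIZE_TYPE_MAX = 2**31 - 1  # assumption: size_type is a signed 32-bit integer


def check_array_interface(obj) -> tuple:
    """Hypothetical pre-flight check mirroring the documented error cases."""
    iface = getattr(obj, "__array_interface__", None)
    if iface is None:
        raise TypeError("object does not implement __array_interface__")
    if iface.get("mask") is not None:
        raise NotImplementedError("masked arrays are not supported")
    shape = iface["shape"]
    if len(shape) not in (1, 2):
        raise ValueError("array must be 1D or 2D")
    if shape[0] > SIZE_TYPE_MAX:
        raise ValueError("number of rows exceeds the size_type limit")
    data = iface.get("data")
    if not (isinstance(data, tuple) and len(data) == 2):
        raise ValueError("'data' must be a (pointer, read_only) tuple")
    return shape


class FakeArray:
    """Stand-in that exposes the protocol without requiring NumPy."""

    def __init__(self, shape):
        self.__array_interface__ = {
            "shape": shape,
            "typestr": "<i8",
            "data": (0x1000, False),  # made-up pointer; never dereferenced here
            "strides": None,          # None means C-contiguous
            "version": 3,
        }


print(check_array_interface(FakeArray((4,))))  # (4,)
```

A real NumPy ndarray exposes the same dictionary through its __array_interface__ attribute, with "data" holding the actual host pointer.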
- static from_arrow(obj: ArrowLike, DataType dtype: DataType | None = None) -> Column
Create a Column from an Arrow-like object using the Arrow C Data Interface.
This method accepts host and device Arrow arrays and host Arrow streams; device streams are recognized but not currently supported. It detects the type of Arrow object provided and constructs a pylibcudf.Column accordingly using the appropriate Arrow C pointer-based interface.
- Parameters:
- objArrow-like
An object implementing one of the following:
  - __arrow_c_array__ (host Arrow array)
  - __arrow_c_device_array__ (device Arrow array)
  - __arrow_c_stream__ (host Arrow stream)
  - __arrow_c_device_stream__ (device Arrow stream)
- dtypeDataType | None
The pylibcudf data type. Only None is currently accepted; any other value raises NotImplementedError.
- Returns:
- Column
A pylibcudf.Column representing the Arrow data.
- Raises:
- NotImplementedError
If the Arrow-like object is a device stream (__arrow_c_device_stream__). If the dtype argument is not None.
- ValueError
If the object does not implement a known Arrow C interface.
Notes
This method supports zero-copy construction for device arrays.
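The protocol detection described above can be sketched as a chain of attribute checks. classify_arrow_like is a hypothetical function written for this example, and the order in which the protocols are tested is an assumption; the actual order used by from_arrow is not stated here. The stand-in classes only carry the method names and do not export real Arrow data.

```python
def classify_arrow_like(obj) -> str:
    """Hypothetical dispatch on the Arrow C interface methods an object exposes."""
    if hasattr(obj, "__arrow_c_device_array__"):
        return "device array"
    if hasattr(obj, "__arrow_c_array__"):
        return "host array"
    if hasattr(obj, "__arrow_c_device_stream__"):
        # Documented as unsupported.
        raise NotImplementedError("device streams are not supported")
    if hasattr(obj, "__arrow_c_stream__"):
        return "host stream"
    raise ValueError("object does not implement a known Arrow C interface")


class HostArray:
    def __arrow_c_array__(self, requested_schema=None):
        raise RuntimeError("stand-in only; no real Arrow export")


class HostStream:
    def __arrow_c_stream__(self, requested_schema=None):
        raise RuntimeError("stand-in only; no real Arrow export")


print(classify_arrow_like(HostArray()))   # host array
print(classify_arrow_like(HostStream()))  # host stream
```

Checking by method name rather than by concrete class is what lets any library that implements the Arrow C Data Interface act as an input.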
- classmethod from_cuda_array_interface(cls, obj)
Create a Column from an object implementing the CUDA Array Interface.
- Parameters:
- objAny
Must implement the __cuda_array_interface__ protocol.
- Returns:
- Column
A Column containing the data from the CUDA array interface.
- Raises:
- TypeError
If the object does not support __cuda_array_interface__.
- ValueError
If the object is not 1D or 2D, or is not C-contiguous. If the number of rows exceeds the size_type limit.
- NotImplementedError
If the object has a mask.
- static from_iterable_of_py(obj: Iterable, DataType dtype: DataType | None = None) -> Column
Create a Column from a Python iterable of scalar values or nested iterables.
- Parameters:
- objIterable
An iterable of Python scalar values (int, float, bool, str) or nested lists.
- dtypeDataType | None
The type of the leaf elements. If not specified, the type is inferred.
- Returns:
- Column
A Column containing the data from the input iterable.
- Raises:
- TypeError
If the input contains unsupported scalar types.
- ValueError
If the iterable is empty and dtype is not provided.
Notes
Only scalar types int, float, bool, and str are supported.
Nested iterables must be materialized as lists.
Jagged nested lists are not supported. Inner lists must have the same shape.
Nulls (None) are not currently supported in input values.
dtype must match the inferred or actual type of the scalar values.
Large strings are supported, meaning the combined length of all strings (in bytes) can exceed the maximum 32-bit integer value. In that case, the offsets column is automatically promoted to use 64-bit integers.
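The rule that inner lists must share one shape can be illustrated with a small recursive check. nested_shape is a hypothetical helper for this example, not a pylibcudf function, and the choice of ValueError for jagged input is an assumption rather than documented behavior.

```python
def nested_shape(obj) -> tuple:
    """Return the shape of a rectangular nested list; raise if it is jagged.

    Hypothetical helper for illustration only.
    """
    if not isinstance(obj, list):
        return ()  # a scalar leaf has an empty shape
    if not obj:
        return (0,)
    inner_shapes = [nested_shape(item) for item in obj]
    # Every element at this level must have the same shape as the first one.
    if any(shape != inner_shapes[0] for shape in inner_shapes):
        raise ValueError("jagged nested lists are not supported")
    return (len(obj),) + inner_shapes[0]


print(nested_shape([1, 2, 3]))         # (3,)
print(nested_shape([[1, 2], [3, 4]]))  # (2, 2)

try:
    nested_shape([[1, 2], [3]])
except ValueError as err:
    print(err)  # jagged nested lists are not supported
```

Running a check like this before calling from_iterable_of_py is one way to reject unsupported input early.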
- static from_rmm_buffer(DeviceBuffer buff, DataType dtype, size_type size, list children)
Create a Column from an RMM DeviceBuffer.
- Parameters:
- buffDeviceBuffer
The data rmm.DeviceBuffer.
- sizesize_type
The number of rows in the column.
- dtypeDataType
The type of the data in the buffer.
- childrenlist
List of child columns.
Notes
To provide a mask and null count, use Column.with_mask after this method.
- static from_scalar(Scalar slr, size_type size)
Create a Column from a Scalar.
- Parameters:
- slrScalar
The scalar to create a column from.
- sizesize_type
The number of elements in the column.
- Returns:
- Column
A Column containing the scalar repeated size times.
- list_view(self) -> ListColumnView
Accessor for methods of a Column that are specific to lists.
- null_count(self) -> size_type
The number of null elements in the column.
- null_mask(self) -> gpumemoryview
The null mask of the column.
- num_children(self) -> size_type
The number of children of this column.
- offset(self) -> size_type
The offset of the column.
- size(self) -> size_type
The number of elements in the column.
- classmethod struct_from_children(cls, children: Iterable[Column])
Create a struct Column from a list of child columns.
- Parameters:
- childrenIterable[Column]
A list of child columns.
- Returns:
- Column
A struct Column with the provided child columns.
Notes
The null count and null mask are taken from the first child column. Use Column.with_mask on the result of struct_from_children to reset the null count and mask.
- to_scalar(self) -> Scalar
Return the first value of a 1-element column as a Scalar.
- Returns:
- Scalar
A Scalar representing the only value in the column, including nulls.
- Raises:
- ValueError
If the column has more than one row.
- with_mask(self, gpumemoryview mask, size_type null_count) -> Column
Augment this column with a new null mask.
- Parameters:
- maskgpumemoryview
New mask (or None to unset the mask).
- null_countint
New null count. The caller is responsible for making this match the mask; an incorrect value can lead to wrong results.
- Returns:
- New Column object sharing data with self (except for the mask which is new).
- class pylibcudf.column.ListColumnView(Column col)
Accessor for methods of a Column that are specific to lists.
Methods
child(self): The data column of the underlying list column.
offsets(self): The offsets column of the underlying list column.
- child(self)
The data column of the underlying list column.
- offsets(self)
The offsets column of the underlying list column.
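How the offsets and child columns fit together follows the Arrow list layout: row i of the list column is the slice of the child between offsets[i] and offsets[i + 1]. The sketch below shows this with plain Python lists standing in for the two columns; the values are invented for illustration.

```python
# Plain lists standing in for the offsets column and the child (data) column.
offsets = [0, 2, 2, 5]
child = [10, 20, 30, 40, 50]

# Row i spans child[offsets[i]:offsets[i + 1]]; equal offsets give an empty list.
rows = [child[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]
print(rows)  # [[10, 20], [], [30, 40, 50]]
```

Note that a list column with n rows has n + 1 offsets, and the last offset equals the number of child elements in use.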
- pylibcudf.column.is_c_contiguous(shape: Sequence[int], strides: None | Sequence[int], int itemsize: int) -> bool
Determine if shape and strides are C-contiguous.
- Parameters:
- shapeSequence[int]
Number of elements in each dimension.
- stridesNone | Sequence[int]
The stride of each dimension in bytes. If None, the memory layout is C-contiguous.
- itemsizeint
Size of an element in bytes.
- Returns:
- bool
True if the layout described by shape and strides is C-contiguous, False otherwise.