Column

class pylibcudf.column.Column(DataType data_type, size_type size, gpumemoryview data, gpumemoryview mask, size_type null_count, size_type offset, list children)

A container of nullable device data as a column of elements.

This class is an implementation of the Arrow columnar data specification for data stored on GPUs. It relies on Python memoryview-like semantics to maintain shared ownership of the data it is constructed with, so any input data may also be co-owned by other data structures. The Column is designed to be operated on using algorithms backed by libcudf.

Parameters:
data_type : DataType

The type of data in the column.

size : size_type

The number of rows in the column.

data : gpumemoryview

The data the column will refer to.

mask : gpumemoryview

The null mask for the column.

null_count : int

The number of null rows in the column.

offset : int

The offset into the data buffer where the column’s data begins.

children : list

The children of this column if it is a compound column type.
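
The relationship between data, size, and offset can be pictured with a small pure-Python model. This is only an illustrative sketch: ColumnSketch is a hypothetical stand-in that is not part of pylibcudf, it uses a host list in place of device memory, and it assumes that offset is counted in elements (as in the Arrow format) rather than in bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSketch:
    """Hypothetical host-side stand-in for a column; not a pylibcudf class."""

    data: list   # shared buffer, playing the role of the gpumemoryview
    size: int    # number of rows visible through this column
    offset: int  # index of the first visible row (assumed to be in elements)

    def rows(self) -> list:
        # The logical rows are the window [offset, offset + size) of the buffer.
        return self.data[self.offset : self.offset + self.size]


buffer = [10, 20, 30, 40, 50]
whole = ColumnSketch(data=buffer, size=5, offset=0)
tail = ColumnSketch(data=buffer, size=2, offset=3)

print(whole.rows())             # [10, 20, 30, 40, 50]
print(tail.rows())              # [40, 50]
print(whole.data is tail.data)  # True: both columns refer to one buffer
```

Both objects refer to the same underlying buffer, which mirrors the shared-ownership behaviour described above: constructing a column does not imply exclusive ownership of its data.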

Methods

all_null_like(Column like, size_type size)

Create an all null column from a template.

child(self, size_type index)

Get a child column of this column.

children(self)

The children of the column.

copy(self)

Create a copy of the column.

data(self)

The data buffer of the column.

device_buffer_size(self)

The total size of the device buffers used by the Column.

from_array(cls, obj)

Create a Column from any object which supports the NumPy or CUDA array interface.

from_array_interface(cls, obj)

Create a Column from an object implementing the NumPy Array Interface.

from_arrow(obj, DataType dtype)

Create a Column from an Arrow-like object using the Arrow C Data Interface.

from_cuda_array_interface(cls, obj)

Create a Column from an object implementing the CUDA Array Interface.

from_iterable_of_py(obj, DataType dtype)

Create a Column from a Python iterable of scalar values or nested iterables.

from_rmm_buffer(DeviceBuffer buff, ...)

Create a Column from an RMM DeviceBuffer.

from_scalar(Scalar slr, size_type size)

Create a Column from a Scalar.

list_view(self)

Accessor for methods of a Column that are specific to lists.

null_count(self)

The number of null elements in the column.

null_mask(self)

The null mask of the column.

num_children(self)

The number of children of this column.

offset(self)

The offset of the column.

size(self)

The number of elements in the column.

struct_from_children(cls, children)

Create a struct Column from a list of child columns.

to_scalar(self)

Return the only value of a 1-element column as a Scalar.

type(self)

The type of data in the column.

with_mask(self, gpumemoryview mask, ...)

Augment this column with a new null mask.

static all_null_like(Column like, size_type size)

Create an all null column from a template.

Parameters:
like : Column

Column whose type we should mimic.

size : int

Number of rows in the resulting column.

Returns:
Column

An all-null column of size rows and type matching like.

child(self, size_type index) → Column

Get a child column of this column.

Parameters:
index : size_type

The index of the child column to get.

Returns:
Column

The child column.

children(self) → list

The children of the column.

copy(self) → Column

Create a copy of the column.

data(self) → gpumemoryview

The data buffer of the column.

device_buffer_size(self) → uint64_t

The total size of the device buffers used by the Column.

Returns:
Number of bytes.

Notes

Because Columns rely on Python memoryview-like semantics to maintain shared ownership of the data, the device buffers underlying this column may be shared with other data structures, including other columns. The reported size therefore does not necessarily correspond to memory owned exclusively by this column.

classmethod from_array(cls, obj)

Create a Column from any object which supports the NumPy or CUDA array interface.

Parameters:
obj : object

The input array to be converted into a pylibcudf.Column.

Returns:
Column
Raises:
TypeError

If the input does not implement a supported array interface.

Notes

  • Only C-contiguous host and device ndarrays are supported. For device arrays, the data is not copied.

Examples

>>> import pylibcudf as plc
>>> import cupy as cp
>>> cp_arr = cp.array([[1,2],[3,4]])
>>> col = plc.Column.from_array(cp_arr)

classmethod from_array_interface(cls, obj)

Create a Column from an object implementing the NumPy Array Interface.

If the object provides a raw memory pointer via the “data” field, we use that pointer directly and avoid copying. Otherwise, a ValueError is raised.

Parameters:
obj : Any

Must implement the __array_interface__ protocol.

Returns:
Column

A Column containing the data from the array interface.

Raises:
TypeError

If the object does not implement __array_interface__.

ValueError

If the array is not 1D or 2D, or is not C-contiguous. If the number of rows exceeds the size_type limit. If the ‘data’ field is invalid.

NotImplementedError

If the object has a mask.
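
Several of the documented error conditions can be checked on the host before calling from_array_interface. The following is a minimal sketch of such a pre-check, not the pylibcudf implementation: precheck_array_interface and FakeArray are hypothetical names, the pointer value is made up, and the C-contiguity check is left out for brevity. It assumes that the size_type limit is the maximum of a signed 32-bit integer.

```python
SIZE_TYPE_MAX = 2**31 - 1  # assumption: size_type is a signed 32-bit integer


def precheck_array_interface(obj) -> tuple:
    """Hypothetical pre-check; returns the array shape if no check fails."""
    iface = getattr(obj, "__array_interface__", None)
    if iface is None:
        raise TypeError("object does not implement __array_interface__")
    if iface.get("mask") is not None:
        raise NotImplementedError("masked arrays are not supported")

    shape = tuple(iface["shape"])
    if len(shape) not in (1, 2):
        raise ValueError("array must be 1D or 2D")
    if shape[0] > SIZE_TYPE_MAX:
        raise ValueError("number of rows exceeds the size_type limit")

    # The protocol can expose the buffer address as a (pointer, read_only) tuple.
    data = iface.get("data")
    if not (isinstance(data, tuple) and len(data) == 2 and isinstance(data[0], int)):
        raise ValueError("'data' must be a (pointer, read_only) tuple")
    return shape


class FakeArray:
    """Dummy producer with a hand-written interface dict (pointer is made up)."""

    def __init__(self, shape, mask=None):
        self.__array_interface__ = {
            "shape": shape,
            "typestr": "<i8",
            "data": (0x1000, False),
            "strides": None,
            "mask": mask,
            "version": 3,
        }


print(precheck_array_interface(FakeArray((4, 2))))  # (4, 2)
```

The exception types follow the Raises section above; the messages are this sketch's own wording.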

static from_arrow(obj: ArrowLike, DataType dtype: DataType | None = None) → Column

Create a Column from an Arrow-like object using the Arrow C Data Interface.

This method supports host and device Arrow arrays or streams. It detects the type of Arrow object provided and constructs a pylibcudf.Column accordingly using the appropriate Arrow C pointer-based interface.

Parameters:
obj : Arrow-like

An object implementing one of the following:

  • __arrow_c_array__ (host Arrow array)

  • __arrow_c_device_array__ (device Arrow array)

  • __arrow_c_stream__ (host Arrow stream)

  • __arrow_c_device_stream__ (device Arrow stream)

dtype : DataType | None

The pylibcudf data type. Passing a value other than None is not currently supported.

Returns:
Column

A pylibcudf.Column representing the Arrow data.

Raises:
NotImplementedError

If the Arrow-like object is a device stream (__arrow_c_device_stream__). If the dtype argument is not None.

ValueError

If the object does not implement a known Arrow C interface.

Notes

  • This method supports zero-copy construction for device arrays.
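
The protocol detection described above amounts to checking which Arrow C interface method an object exposes. A minimal sketch of that idea follows; detect_arrow_protocol and HostArrayLike are hypothetical names, and the order in which the methods are checked is an assumption rather than the order pylibcudf uses.

```python
def detect_arrow_protocol(obj) -> str:
    """Return the name of the first Arrow C interface method obj implements.

    Hypothetical helper; the checking order is an assumption.
    """
    for name in (
        "__arrow_c_device_array__",
        "__arrow_c_array__",
        "__arrow_c_device_stream__",
        "__arrow_c_stream__",
    ):
        if hasattr(obj, name):
            return name
    raise ValueError("object does not implement a known Arrow C interface")


class HostArrayLike:
    """Dummy object exposing only the host array protocol."""

    def __arrow_c_array__(self, requested_schema=None):
        raise NotImplementedError("placeholder; a real producer returns capsules")


print(detect_arrow_protocol(HostArrayLike()))  # __arrow_c_array__
```

An object that exposes none of the four methods falls through to the ValueError, matching the Raises section above.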

classmethod from_cuda_array_interface(cls, obj)

Create a Column from an object implementing the CUDA Array Interface.

Parameters:
obj : Any

Must implement the __cuda_array_interface__ protocol.

Returns:
Column

A Column containing the data from the CUDA array interface.

Raises:
TypeError

If the object does not support __cuda_array_interface__.

ValueError

If the object is not 1D or 2D, or is not C-contiguous. If the number of rows exceeds the size_type limit.

NotImplementedError

If the object has a mask.

static from_iterable_of_py(obj: Iterable, DataType dtype: DataType | None = None) → Column

Create a Column from a Python iterable of scalar values or nested iterables.

Parameters:
obj : Iterable

An iterable of Python scalar values (int, float, bool, str) or nested lists.

dtype : DataType | None

The type of the leaf elements. If not specified, the type is inferred.

Returns:
Column

A Column containing the data from the input iterable.

Raises:
TypeError

If the input contains unsupported scalar types.

ValueError

If the iterable is empty and dtype is not provided.

Notes

  • Only scalar types int, float, bool, and str are supported.

  • Nested iterables must be materialized as lists.

  • Jagged nested lists are not supported. Inner lists must have the same shape.

  • Nulls (None) are not currently supported in input values.

  • dtype must match the inferred or actual type of the scalar values.

  • Large strings are supported, meaning the combined length of all strings (in bytes) can exceed the maximum 32-bit integer value. In that case, the offsets column is automatically promoted to use 64-bit integers.
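
The input restrictions in these notes can be checked on the host before building a column. The sketch below is one way to do that under stated assumptions: check_rows is a hypothetical helper that is not part of pylibcudf, and the exception types it raises for None values and jagged lists are this sketch's own choice, since the notes do not say which errors pylibcudf raises in those cases.

```python
SUPPORTED_SCALARS = (int, float, bool, str)


def check_rows(values: list) -> tuple:
    """Return the shape of a (possibly nested) list, or raise on bad input.

    Hypothetical helper, not part of pylibcudf.
    """
    if any(v is None for v in values):
        raise ValueError("None (null) values are not supported")
    if values and all(isinstance(v, list) for v in values):
        # Every inner list must report the same shape; otherwise it is jagged.
        inner_shapes = {check_rows(v) for v in values}
        if len(inner_shapes) != 1:
            raise ValueError("jagged nested lists are not supported")
        return (len(values),) + inner_shapes.pop()
    for v in values:
        if not isinstance(v, SUPPORTED_SCALARS):
            raise TypeError(f"unsupported scalar type: {type(v).__name__}")
    return (len(values),)


print(check_rows([1, 2, 3]))         # (3,)
print(check_rows([[1, 2], [3, 4]]))  # (2, 2)
```

A jagged input such as [[1], [2, 3]] raises ValueError in this sketch because its inner lists report different shapes.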

static from_rmm_buffer(DeviceBuffer buff, DataType dtype, size_type size, list children)

Create a Column from an RMM DeviceBuffer.

Parameters:
buff : DeviceBuffer

The rmm.DeviceBuffer holding the column data.

dtype : DataType

The type of the data in the buffer.

size : size_type

The number of rows in the column.

children : list

List of child columns.

Notes

To provide a mask and null count, use Column.with_mask after this method.

static from_scalar(Scalar slr, size_type size)

Create a Column from a Scalar.

Parameters:
slr : Scalar

The scalar to create a column from.

size : size_type

The number of elements in the column.

Returns:
Column

A Column containing the scalar repeated size times.

list_view(self) → ListColumnView

Accessor for methods of a Column that are specific to lists.

null_count(self) → size_type

The number of null elements in the column.

null_mask(self) → gpumemoryview

The null mask of the column.

num_children(self) → size_type

The number of children of this column.

offset(self) → size_type

The offset of the column.

size(self) → size_type

The number of elements in the column.

classmethod struct_from_children(cls, children: Iterable[Column])

Create a struct Column from a list of child columns.

Parameters:
children : Iterable[Column]

A list of child columns.

Returns:
Column

A struct Column with the provided child columns.

Notes

The null count and null mask are taken from the first child column. Use Column.with_mask on the result of struct_from_children to reset the null count and mask.

to_scalar(self) → Scalar

Return the only value of a 1-element column as a Scalar.

Returns:
Scalar

A Scalar representing the only value in the column, including nulls.

Raises:
ValueError

If the column has more than one row.

type(self) → DataType

The type of data in the column.

with_mask(self, gpumemoryview mask, size_type null_count) → Column

Augment this column with a new null mask.

Parameters:
mask : gpumemoryview

New mask (or None to unset the mask).

null_count : int

New null count. The value must match the number of null rows described by mask; an incorrect count can lead to incorrect results.

Returns:
New Column object sharing data with self (except for the mask which is new).
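
To see why null_count has to agree with mask, it helps to look at how an Arrow-style validity bitmap encodes nulls: one bit per row, set to 1 when the row is valid, with least-significant-bit numbering inside each byte. The sketch below builds such a bitmap on the host in pure Python. pack_validity is a hypothetical helper, not a pylibcudf function; a real mask passed to with_mask must live in device memory behind a gpumemoryview and may need a padded allocation, neither of which is shown here.

```python
def pack_validity(valid: list) -> tuple:
    """Pack per-row validity flags into Arrow-style bitmap bytes.

    Returns (bitmap, null_count). Bit i of the bitmap is 1 when row i is
    valid, using least-significant-bit numbering within each byte.
    """
    bitmap = bytearray((len(valid) + 7) // 8)
    for row, is_valid in enumerate(valid):
        if is_valid:
            bitmap[row // 8] |= 1 << (row % 8)
    null_count = sum(1 for is_valid in valid if not is_valid)
    return bytes(bitmap), null_count


mask_bytes, null_count = pack_validity([True, False, True, True, False])
print(f"{mask_bytes[0]:08b}", null_count)  # 00001101 2
```

Deriving the null count from the same flags that produce the bitmap keeps the two values consistent by construction.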

class pylibcudf.column.ListColumnView(Column col)

Accessor for methods of a Column that are specific to lists.

Methods

child(self)

The data column of the underlying list column.

offsets(self)

The offsets column of the underlying list column.

child(self)

The data column of the underlying list column.

offsets(self)

The offsets column of the underlying list column.

pylibcudf.column.is_c_contiguous(shape: Sequence[int], strides: None | Sequence[int], int itemsize: int) → bool

Determine if shape and strides are C-contiguous.

Parameters:
shape : Sequence[int]

Number of elements in each dimension.

strides : None | Sequence[int]

The stride of each dimension in bytes. If None, the memory layout is C-contiguous.

itemsize : int

Size of an element in bytes.

Returns:
bool

True if shape and strides describe a C-contiguous layout, False otherwise.
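
The check can be understood as comparing each stride with the stride a row-major layout would produce. The following pure-Python sketch illustrates that idea; it is not the pylibcudf source. is_c_contiguous_sketch is a hypothetical name, and its handling of empty arrays and of length-1 dimensions follows NumPy's conventions as an assumption.

```python
from typing import Optional, Sequence


def is_c_contiguous_sketch(
    shape: Sequence[int], strides: Optional[Sequence[int]], itemsize: int
) -> bool:
    """Illustrative re-implementation; not the pylibcudf source."""
    if strides is None:
        return True  # documented: no strides means C-contiguous
    if any(dim == 0 for dim in shape):
        return True  # assumption: an empty array is treated as contiguous
    expected = itemsize
    # Walk from the innermost (last) dimension outward.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        # Assumption: strides of length-1 dimensions are ignored, as in NumPy.
        if dim > 1 and stride != expected:
            return False
        expected *= dim
    return True


print(is_c_contiguous_sketch((2, 3), (24, 8), 8))  # True: row-major, 8-byte items
print(is_c_contiguous_sketch((2, 3), (8, 16), 8))  # False: column-major
print(is_c_contiguous_sketch((2, 3), None, 8))     # True
```

In a row-major layout the last dimension has a stride of itemsize bytes, and each earlier dimension has a stride equal to the product of itemsize and all later dimension lengths, which is what the running expected value tracks.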