partitioning

pylibcudf.partitioning.hash_partition(input, keys, num_partitions, hash_function, seed, stream, mr)

Partitions rows from the input table into multiple output tables.

For details, see the libcudf C++ function cudf::hash_partition().

Parameters:
input : Table

The table to partition

keys : Table | list[int]

Table providing the keys to hash, or a list of indices of the columns of input to hash.

num_partitions : int

The number of partitions to use

hash_function : HashId

Hashing function to apply to the key columns.

seed : int

Seed for the hash function.

stream : Stream | None

CUDA stream on which to perform the operation.

mr : DeviceMemoryResource | None

Device memory resource used to allocate the returned table’s device memory.

Returns:
tuple[Table, list[int]]

The partitioned table and a list of num_partitions + 1 row offsets, where partition i contains rows in the range [offsets[i], offsets[i+1]).
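The offsets convention can be illustrated without a GPU. What follows is a minimal pure-Python sketch, not pylibcudf code: rows are plain tuples, hash_partition_model is a hypothetical helper, and zlib.crc32 is only a stand-in for the function selected by hash_function, so the partition assignments will not match what pylibcudf produces. It only shows that rows with equal keys land in the same partition and how the offsets list delimits each partition.

```python
import zlib


def hash_partition_model(rows, key_indices, num_partitions, seed=0):
    """Pure-Python model of hash partitioning (illustration only).

    zlib.crc32 is a stand-in hash; it is NOT the hash pylibcudf uses.
    """
    buckets = [[] for _ in range(num_partitions)]
    for row in rows:
        # Hash only the key columns, starting from the seed value.
        key_bytes = repr(tuple(row[i] for i in key_indices)).encode()
        buckets[zlib.crc32(key_bytes, seed) % num_partitions].append(row)

    # Concatenate the buckets and record where each partition starts.
    output, offsets = [], [0]
    for bucket in buckets:
        output.extend(bucket)
        offsets.append(len(output))
    return output, offsets


rows = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e")]
output, offsets = hash_partition_model(rows, key_indices=[0], num_partitions=3)

# Partition i is the slice output[offsets[i]:offsets[i + 1]].
partitions = [output[offsets[i]:offsets[i + 1]] for i in range(3)]
```

Some partitions may be empty, in which case two consecutive offsets are equal.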

pylibcudf.partitioning.partition(Table t, Column partition_map, int num_partitions, Stream stream=None, DeviceMemoryResource mr=None) -> tuple

Partitions rows of t according to the mapping specified by partition_map.

For details, see the libcudf C++ function cudf::partition().

Parameters:
t : Table

The table to partition

partition_map : Column

Non-nullable column of integer values that maps each row in t to its partition.

num_partitions : int

The total number of partitions

stream : Stream | None

CUDA stream on which to perform the operation.

mr : DeviceMemoryResource | None

Device memory resource used to allocate the returned table’s device memory.

Returns:
tuple[Table, list[int]]

The partitioned table and a list of num_partitions + 1 row offsets, where partition i contains rows in the range [offsets[i], offsets[i+1]).
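The relationship between partition_map and the returned offsets may be easier to see on a small example. This is a pure-Python sketch using lists in place of a Table and a Column. partition_model is a hypothetical helper, not part of pylibcudf, and it keeps rows in input order within each partition, which is an assumption of the sketch rather than a documented guarantee.

```python
def partition_model(rows, partition_map, num_partitions):
    """Pure-Python model of map-based partitioning (illustration only)."""
    if len(rows) != len(partition_map):
        raise ValueError("partition_map needs one entry per row")

    buckets = [[] for _ in range(num_partitions)]
    for row, target in zip(rows, partition_map):
        # Assumption: every map value lies in [0, num_partitions).
        buckets[target].append(row)

    # Concatenate the buckets and record where each partition starts.
    output, offsets = [], [0]
    for bucket in buckets:
        output.extend(bucket)
        offsets.append(len(output))
    return output, offsets


rows = ["a", "b", "c", "d", "e"]
partition_map = [1, 0, 1, 2, 0]
output, offsets = partition_model(rows, partition_map, num_partitions=3)
print(output)   # ['b', 'e', 'a', 'c', 'd']
print(offsets)  # [0, 2, 4, 5]
```

Here partition 1 is output[2:4], that is, the rows "a" and "c" whose map value was 1.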

pylibcudf.partitioning.round_robin_partition(Table input, int num_partitions, int start_partition=0, Stream stream=None, DeviceMemoryResource mr=None) -> tuple

Partitions rows of the input table in round-robin fashion.

For details, see the libcudf C++ function cudf::round_robin_partition().

Parameters:
input : Table

The input table to be round-robin partitioned

num_partitions : int

Number of partitions for the table

start_partition : int, default 0

Index of the first partition

stream : Stream | None

CUDA stream on which to perform the operation.

mr : DeviceMemoryResource | None

Device memory resource used to allocate the returned table’s device memory.

Returns:
tuple[Table, list[int]]

The partitioned table and a list of num_partitions + 1 partition offsets where partition i contains rows in the range [offsets[i], offsets[i+1]).
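A small pure-Python model can make the effect of start_partition concrete. This is a sketch, not pylibcudf code: round_robin_model is a hypothetical helper operating on a list, and it assumes row i is assigned to partition (start_partition + i) % num_partitions, which is how round-robin assignment is usually defined.

```python
def round_robin_model(rows, num_partitions, start_partition=0):
    """Pure-Python model of round-robin partitioning (illustration only)."""
    buckets = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        # Assumption: assignment cycles through partitions, beginning
        # with start_partition for row 0.
        buckets[(start_partition + i) % num_partitions].append(row)

    # Concatenate the buckets and record where each partition starts.
    output, offsets = [], [0]
    for bucket in buckets:
        output.extend(bucket)
        offsets.append(len(output))
    return output, offsets


rows = list("abcdefgh")

output, offsets = round_robin_model(rows, num_partitions=3)
print(output)   # ['a', 'd', 'g', 'b', 'e', 'h', 'c', 'f']
print(offsets)  # [0, 3, 6, 8]

shifted, shifted_offsets = round_robin_model(rows, 3, start_partition=1)
print(shifted)          # ['c', 'f', 'a', 'd', 'g', 'b', 'e', 'h']
print(shifted_offsets)  # [0, 2, 5, 8]
```

With start_partition=1, row "a" goes to partition 1 instead of partition 0, so partition 0 starts with "c" and holds one row fewer.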