partitioning

pylibcudf.partitioning.hash_partition(signatures, args, kwargs, defaults, _fused_sigindex={})

Partitions rows from the input table into multiple output tables.

For details, see the libcudf C++ function hash_partition().

Parameters:

input (Table): The table to partition.
keys (Table | list[int]): Table providing the keys to hash, or a list of indices of the columns of input to hash.
num_partitions (int): The number of partitions to use.
hash_function (HashId): Hashing function to apply to the key columns.
seed (int): Seed for the hash function.
stream (Stream | None): CUDA stream on which to perform the operation.
mr (DeviceMemoryResource | None): Device memory resource used to allocate the returned table’s device memory.

Returns:

tuple[Table, list[int]]: An output table and a list of num_partitions + 1 row offsets where partition i contains rows in the range [offsets[i], offsets[i+1])
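
The return contract above can be illustrated with a plain-Python model. This is only a sketch of the semantics, not the pylibcudf implementation: it runs on lists of tuples rather than device tables, `hash_partition_model` is a hypothetical name, and `zlib.crc32` stands in for whichever hash `hash_function` and `seed` select, so the partition a given key lands in will not match the library's.

```python
import zlib


def hash_partition_model(rows, key_indices, num_partitions, seed=0):
    """Plain-Python model of hash partitioning (illustrative only).

    rows is a list of tuples, key_indices selects the key columns.
    zlib.crc32 is a stand-in hash, not the one pylibcudf uses.
    """
    buckets = [[] for _ in range(num_partitions)]
    for row in rows:
        # Hash only the key columns; rows with equal keys share a partition.
        key = repr(tuple(row[i] for i in key_indices)).encode()
        buckets[zlib.crc32(key, seed) % num_partitions].append(row)

    # Concatenate the buckets and record where each partition starts,
    # plus one final offset equal to the total row count.
    out, offsets = [], [0]
    for bucket in buckets:
        out.extend(bucket)
        offsets.append(len(out))
    return out, offsets


rows = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e")]
out, offsets = hash_partition_model(rows, key_indices=[0], num_partitions=3)

# Partition i is the half-open slice out[offsets[i]:offsets[i + 1]].
partitions = [out[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]
```

Because the list has num_partitions + 1 entries, every partition, including an empty one, can be recovered with the same slice expression and no special case for the last partition.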

pylibcudf.partitioning.partition(Table t, Column partition_map, int num_partitions, stream=None, DeviceMemoryResource mr=None) → tuple

Partitions rows of t according to the mapping specified by partition_map.

For details, see the libcudf C++ function partition().

Parameters:

t (Table): The table to partition.
partition_map (Column): Non-nullable column of integer values that map each row in t to its partition.
num_partitions (int): The total number of partitions.
stream (Stream | None): CUDA stream on which to perform the operation.
mr (DeviceMemoryResource | None): Device memory resource used to allocate the returned table’s device memory.

Returns:

tuple[Table, list[int]]: An output table and a list of num_partitions + 1 row offsets where partition i contains rows in the range [offsets[i], offsets[i+1])
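
As a rough illustration of how partition_map drives the result, the following plain-Python sketch reorders a list of rows so that rows mapped to the same partition are contiguous. The name `partition_model` is hypothetical, and the model keeps rows in their original relative order inside each partition purely for simplicity; that ordering is an assumption of the sketch and should not be relied on for the real function.

```python
def partition_model(rows, partition_map, num_partitions):
    """Plain-Python model of map-based partitioning (illustrative only)."""
    if len(rows) != len(partition_map):
        raise ValueError("partition_map needs one entry per row")

    # Count the rows assigned to each partition.
    counts = [0] * num_partitions
    for p in partition_map:
        if not 0 <= p < num_partitions:
            raise ValueError(f"partition index {p} out of range")
        counts[p] += 1

    # A running sum of the counts gives num_partitions + 1 offsets.
    offsets = [0]
    for c in counts:
        offsets.append(offsets[-1] + c)

    # Scatter each row to the next free slot of its partition.
    cursor = offsets[:-1].copy()
    out = [None] * len(rows)
    for row, p in zip(rows, partition_map):
        out[cursor[p]] = row
        cursor[p] += 1
    return out, offsets


rows = ["a", "b", "c", "d", "e"]
partition_map = [2, 0, 2, 0, 0]
out, offsets = partition_model(rows, partition_map, num_partitions=3)
# out     -> ["b", "d", "e", "a", "c"]
# offsets -> [0, 3, 3, 5]   (partition 1 is empty: offsets[1] == offsets[2])
```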

pylibcudf.partitioning.round_robin_partition(Table input, int num_partitions, int start_partition=0, stream=None, DeviceMemoryResource mr=None) → tuple

Partitions the rows of input in round-robin fashion, assigning each row to a partition based on its row index.

For details, see the libcudf C++ function round_robin_partition().

Parameters:

input (Table): The input table to be round-robin partitioned.
num_partitions (int): Number of partitions for the table.
start_partition (int, default 0): Index of the first partition.
stream (Stream | None): CUDA stream on which to perform the operation.
mr (DeviceMemoryResource | None): Device memory resource used to allocate the returned table’s device memory.

Returns:

tuple[Table, list[int]]: The partitioned table and a list of num_partitions + 1 partition offsets where partition i contains rows in the range [offsets[i], offsets[i+1]).
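
The effect of start_partition is easiest to see on a small example. The sketch below is a plain-Python model, not the library code: `round_robin_model` is a hypothetical name, and the assignment rule (row i goes to partition (i + start_partition) % num_partitions) is an assumption based on the description of the underlying libcudf function.

```python
def round_robin_model(rows, num_partitions, start_partition=0):
    """Plain-Python model of round-robin partitioning (illustrative only)."""
    buckets = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        # Assumed rule: the first row goes to start_partition, the next row
        # to the following partition, wrapping around at num_partitions.
        buckets[(i + start_partition) % num_partitions].append(row)

    # Concatenate partitions in order and record num_partitions + 1 offsets.
    out, offsets = [], [0]
    for bucket in buckets:
        out.extend(bucket)
        offsets.append(len(out))
    return out, offsets


rows = list(range(13))

out0, offsets0 = round_robin_model(rows, num_partitions=3, start_partition=0)
# out0     -> [0, 3, 6, 9, 12, 1, 4, 7, 10, 2, 5, 8, 11]
# offsets0 -> [0, 5, 9, 13]

out1, offsets1 = round_robin_model(rows, num_partitions=3, start_partition=1)
# out1     -> [2, 5, 8, 11, 0, 3, 6, 9, 12, 1, 4, 7, 10]
# offsets1 -> [0, 4, 9, 13]
```

With start_partition=1, row 0 lands in partition 1, so partition 0 begins with row 2 and holds one row fewer than in the start_partition=0 case.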