Column Sort

group column_sort

Enums

enum class rank_method : int32_t

Tie-breaker method to use for ranking the column.

See also

cudf::make_rank_aggregation for more details.

Values:

enumerator FIRST

stable sort order ranking (no ties)

enumerator AVERAGE

mean of the FIRST ranks within the group of tied values

enumerator MIN

minimum of the FIRST ranks within the group of tied values

enumerator MAX

maximum of the FIRST ranks within the group of tied values

enumerator DENSE

rank always increases by 1 between groups

Functions

std::unique_ptr<column> sorted_order(table_view const &input, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Computes the row indices that would produce input in a lexicographical sorted order.

Parameters:
  • input – The table to sort

  • column_order – The desired sort order for each column. Size must be equal to input.num_columns() or empty. If empty, all columns will be sorted in ascending order.

  • null_precedence – The desired order of null compared to other elements for each column. Size must be equal to input.num_columns() or empty. If empty, all columns will be sorted in null_order::BEFORE.

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

A non-nullable column of elements containing the permuted row indices of input if it were sorted
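
To make the returned index column concrete, the following is a minimal host-side sketch of the same idea. It is not libcudf code: `Column`, `Order`, and `sorted_order_reference` are hypothetical names, every column is assumed to hold `int`, and nulls are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-side stand-ins; these are NOT libcudf types.
enum class Order { ASCENDING, DESCENDING };
using Column = std::vector<int>;

// Returns the row indices that would put `columns` (a column-major table)
// into lexicographic order. An empty `column_order` means all ascending.
std::vector<int> sorted_order_reference(std::vector<Column> const& columns,
                                        std::vector<Order> const& column_order = {})
{
  assert(column_order.empty() || column_order.size() == columns.size());
  std::size_t const num_rows = columns.empty() ? 0 : columns.front().size();
  std::vector<int> indices(num_rows);
  std::iota(indices.begin(), indices.end(), 0);

  auto row_less = [&](int lhs, int rhs) {
    for (std::size_t c = 0; c < columns.size(); ++c) {
      int const a = columns[c][lhs];
      int const b = columns[c][rhs];
      if (a == b) { continue; }  // tie on this column: compare the next one
      bool const descending =
        !column_order.empty() && column_order[c] == Order::DESCENDING;
      return descending ? (a > b) : (a < b);
    }
    return false;  // the two rows are equivalent
  };
  std::sort(indices.begin(), indices.end(), row_less);
  return indices;
}
```

In this model, a single column `{3, 1, 2}` yields the indices `{1, 2, 0}`: gathering the rows in that order produces the sorted table.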

std::unique_ptr<column> stable_sorted_order(table_view const &input, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Computes the row indices that would produce input in a stable lexicographical sorted order.

The order of equivalent elements is guaranteed to be preserved.

Parameters:
  • input – The table to sort

  • column_order – The desired sort order for each column. Size must be equal to input.num_columns() or empty. If empty, all columns will be sorted in ascending order.

  • null_precedence – The desired order of null compared to other elements for each column. Size must be equal to input.num_columns() or empty. If empty, all columns will be sorted in null_order::BEFORE.

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

A non-nullable column of elements containing the permuted row indices of input if it were sorted
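
The stability guarantee can be illustrated with a small host-side sketch. This is an assumption-laden model rather than libcudf code: `stable_sorted_order_reference` is a hypothetical name, and it handles one ascending `int` key column without nulls.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical host-side model (not libcudf): stable argsort of one key column.
std::vector<int> stable_sorted_order_reference(std::vector<int> const& keys)
{
  std::vector<int> indices(keys.size());
  std::iota(indices.begin(), indices.end(), 0);
  // std::stable_sort keeps equivalent elements in their original relative order.
  std::stable_sort(indices.begin(), indices.end(),
                   [&](int lhs, int rhs) { return keys[lhs] < keys[rhs]; });
  return indices;
}
```

For keys `{2, 1, 2, 1}` this model returns `{1, 3, 0, 2}`: among the two rows holding 1, and among the two rows holding 2, the lower original index always comes first.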

bool is_sorted(cudf::table_view const &table, std::vector<order> const &column_order, std::vector<null_order> const &null_precedence, rmm::cuda_stream_view stream = cudf::get_default_stream())

Checks whether the rows of a table are sorted in a lexicographical order.

Parameters:
  • table – Table whose rows need to be compared for ordering

  • column_order – The expected sort order for each column. Size must be equal to table.num_columns() or empty. If empty, all columns are expected to be in ascending order.

  • null_precedence – The desired order of null compared to other elements for each column. Size must be equal to table.num_columns() or empty. If empty, null_order::BEFORE is assumed for all columns.

  • stream – CUDA stream used for device memory operations and kernel launches

Returns:

true if sorted as expected, false if not
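
As a rough host-side illustration of the check, the sketch below compares each row with its predecessor. It is not libcudf code: `Column` and `is_sorted_reference` are hypothetical names, and the model assumes ascending order on every column and no nulls.

```cpp
#include <cstddef>
#include <vector>

using Column = std::vector<int>;  // hypothetical stand-in, not a libcudf type

// Hypothetical host-side model (not libcudf): returns true when the rows of a
// column-major table are in ascending lexicographic order.
bool is_sorted_reference(std::vector<Column> const& columns)
{
  std::size_t const num_rows = columns.empty() ? 0 : columns.front().size();
  auto row_less = [&](std::size_t lhs, std::size_t rhs) {
    for (auto const& col : columns) {
      if (col[lhs] != col[rhs]) { return col[lhs] < col[rhs]; }
    }
    return false;  // equivalent rows
  };
  // The table is unsorted as soon as some row orders strictly before its predecessor.
  for (std::size_t r = 1; r < num_rows; ++r) {
    if (row_less(r, r - 1)) { return false; }
  }
  return true;
}
```

Equivalent adjacent rows do not count as a violation in this model, so a table of identical rows is reported as sorted.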

std::unique_ptr<table> sort(table_view const &input, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Performs a lexicographic sort of the rows of a table.

Parameters:
  • input – The table to sort

  • column_order – The desired order for each column. Size must be equal to input.num_columns() or empty. If empty, all columns are sorted in ascending order.

  • null_precedence – The desired order of a null element compared to other elements for each column in input. Size must be equal to input.num_columns() or empty. If empty, all columns will be sorted with null_order::BEFORE.

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table’s device memory

Returns:

New table containing the desired sorted order of input

std::unique_ptr<table> stable_sort(table_view const &input, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Performs a stable lexicographic sort of the rows of a table.

Parameters:
  • input – The table to sort

  • column_order – The desired order for each column. Size must be equal to input.num_columns() or empty. If empty, all columns are sorted in ascending order.

  • null_precedence – The desired order of a null element compared to other elements for each column in input. Size must be equal to input.num_columns() or empty. If empty, all columns will be sorted with null_order::BEFORE.

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table’s device memory

Returns:

New table containing the desired sorted order of input

std::unique_ptr<table> sort_by_key(table_view const &values, table_view const &keys, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Performs a key-value sort.

Creates a new table that reorders the rows of values according to the lexicographic ordering of the rows of keys.

Throws:

cudf::logic_error – if values.num_rows() != keys.num_rows().

Parameters:
  • values – The table to reorder

  • keys – The table that determines the ordering

  • column_order – The desired order for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns are sorted in ascending order.

  • null_precedence – The desired order of a null element compared to other elements for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns will be sorted with null_order::BEFORE.

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table’s device memory

Returns:

The reordering of values determined by the lexicographic order of the rows of keys.
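
Conceptually, a key-value sort is a sorted-order computation on keys followed by a gather on values. The host-side sketch below shows those two steps under simplifying assumptions: it is not libcudf code, `sort_by_key_reference` is a hypothetical name, and it handles one ascending `int` key column and one `char` value column without nulls.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical host-side model (not libcudf): reorders `values` by the
// ascending order of `keys`.
std::vector<char> sort_by_key_reference(std::vector<char> const& values,
                                        std::vector<int> const& keys)
{
  // The documented API reports a row-count mismatch as cudf::logic_error;
  // this sketch uses a plain assert instead.
  assert(values.size() == keys.size());

  // Step 1: compute the indices that sort the keys.
  std::vector<int> order(keys.size());
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(),
            [&](int lhs, int rhs) { return keys[lhs] < keys[rhs]; });

  // Step 2: gather the values through those indices.
  std::vector<char> result;
  result.reserve(values.size());
  for (int idx : order) { result.push_back(values[idx]); }
  return result;
}
```

With values `{'a', 'b', 'c'}` and keys `{3, 1, 2}`, the model returns `{'b', 'c', 'a'}`. Note the argument order: values first, keys second, matching the signature above.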

std::unique_ptr<table> stable_sort_by_key(table_view const &values, table_view const &keys, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Performs a key-value stable sort.

Creates a new table that reorders the rows of values according to the lexicographic ordering of the rows of keys. Rows of values whose keys are equivalent keep their original relative order.

Throws:

cudf::logic_error – if values.num_rows() != keys.num_rows().

Parameters:
  • values – The table to reorder

  • keys – The table that determines the ordering

  • column_order – The desired order for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns are sorted in ascending order.

  • null_precedence – The desired order of a null element compared to other elements for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns will be sorted with null_order::BEFORE.

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table’s device memory

Returns:

The reordering of values determined by the lexicographic order of the rows of keys.

std::unique_ptr<column> rank(column_view const &input, rank_method method, order column_order, null_policy null_handling, null_order null_precedence, bool percentage, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Computes the rank of each element of the input column in sorted order.

The rank indicates the position of each element in the sorted column; rank values start from 1.

input = { 3, 4, 5, 4, 1, 2}
The results for each rank_method are:
FIRST    = {3, 4, 6, 5, 1, 2}
AVERAGE  = {3, 4.5, 6, 4.5, 1, 2}
MIN      = {3, 4, 6, 4, 1, 2}
MAX      = {3, 5, 6, 5, 1, 2}
DENSE    = {3, 4, 5, 4, 1, 2}

Parameters:
  • input – The column to rank

  • method – The ranking method used for tie breaking (same values)

  • column_order – The desired sort order for ranking

  • null_handling – Flag to include nulls during ranking. If nulls are not included, the corresponding rank will be null.

  • null_precedence – The desired order of null compared to other elements for the column

  • percentage – Flag to convert ranks to percentages in the range (0, 1]

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

A column containing the rank of each element of input. The output column type is size_type by default, or double when method is rank_method::AVERAGE or percentage is true.
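
The tie-breaking rules can be modeled on the host in a few lines. The sketch below is not libcudf code: `RankMethod` and `rank_reference` are hypothetical names that mirror rank_method, the model ranks in ascending order only, ignores nulls and the percentage flag, and always returns `double` for simplicity.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical mirror of rank_method; NOT the libcudf enum.
enum class RankMethod { FIRST, AVERAGE, MIN, MAX, DENSE };

// Hypothetical host-side model (not libcudf): 1-based ascending ranks.
std::vector<double> rank_reference(std::vector<int> const& input, RankMethod method)
{
  std::size_t const n = input.size();
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});
  // Stable sort so that FIRST ranks ties by original position.
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) { return input[a] < input[b]; });

  std::vector<double> ranks(n);
  std::size_t dense = 0;  // number of distinct groups seen so far
  for (std::size_t begin = 0; begin < n;) {
    // [begin, end) is one group of tied values in sorted order.
    std::size_t end = begin + 1;
    while (end < n && input[order[end]] == input[order[begin]]) { ++end; }
    ++dense;
    for (std::size_t pos = begin; pos < end; ++pos) {
      double rank = 0.0;
      switch (method) {
        case RankMethod::FIRST: rank = static_cast<double>(pos + 1); break;
        case RankMethod::AVERAGE: rank = static_cast<double>(begin + 1 + end) / 2.0; break;
        case RankMethod::MIN: rank = static_cast<double>(begin + 1); break;
        case RankMethod::MAX: rank = static_cast<double>(end); break;
        case RankMethod::DENSE: rank = static_cast<double>(dense); break;
      }
      ranks[order[pos]] = rank;
    }
    begin = end;
  }
  return ranks;
}
```

Calling `rank_reference({3, 4, 5, 4, 1, 2}, method)` with each method reproduces the five result rows listed in the example for this function.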

std::unique_ptr<column> segmented_sorted_order(table_view const &keys, column_view const &segment_offsets, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Returns sorted order after sorting each segment in the table.

If segment_offsets contains values larger than the number of rows, the behavior is undefined.

Example:
keys = { {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} }
offsets = {0, 3, 7, 10}
result = cudf::segmented_sorted_order(keys, offsets);
result is { 2,1,0, 6,5,4,3, 9,8,7 }

If segment_offsets is empty or contains a single index, no values are sorted and the result is a sequence of integers from 0 to keys.num_rows()-1.

The segment_offsets are not required to include all indices. Any indices outside the specified segments will not be sorted.

Example: (offsets do not cover all indices)
keys = { {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} }
offsets = {3, 7}
result = cudf::segmented_sorted_order(keys, offsets);
result is { 0,1,2, 6,5,4,3, 7,8,9 }

Throws:

cudf::logic_error – if segment_offsets is not a size_type column.

Parameters:
  • keys – The table that determines the ordering of elements in each segment

  • segment_offsets – The column of size_type type containing start offset index for each contiguous segment.

  • column_order – The desired order for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns are sorted in ascending order.

  • null_precedence – The desired order of a null element compared to other elements for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns will be sorted with null_order::BEFORE.

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource to allocate any returned objects

Returns:

sorted order of the segment sorted table
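
The segment semantics described above (consecutive offsets delimit a segment, rows outside every segment keep their position) can be sketched on the host as follows. This is not libcudf code: `segmented_sorted_order_reference` is a hypothetical name, and the model handles one ascending `int` key column without nulls.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-side model (not libcudf): argsort within each segment.
std::vector<int> segmented_sorted_order_reference(std::vector<int> const& keys,
                                                  std::vector<int> const& offsets)
{
  // Start from the identity permutation; rows outside every segment keep it.
  std::vector<int> indices(keys.size());
  std::iota(indices.begin(), indices.end(), 0);

  // Consecutive offsets [offsets[i], offsets[i + 1]) delimit one segment.
  // With fewer than two offsets there is no segment, so nothing is sorted.
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    std::sort(indices.begin() + offsets[i], indices.begin() + offsets[i + 1],
              [&](int lhs, int rhs) { return keys[lhs] < keys[rhs]; });
  }
  return indices;
}
```

With the keys from the examples above, offsets `{0, 3, 7, 10}` give `{2,1,0, 6,5,4,3, 9,8,7}` and offsets `{3, 7}` give `{0,1,2, 6,5,4,3, 7,8,9}`, matching the documented results.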

std::unique_ptr<column> stable_segmented_sorted_order(table_view const &keys, column_view const &segment_offsets, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Returns sorted order after stably sorting each segment in the table.

If segment_offsets contains values larger than the number of rows, the behavior is undefined.

Example:
keys = { {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} }
offsets = {0, 3, 7, 10}
result = cudf::segmented_sorted_order(keys, offsets);
result is { 2,1,0, 6,5,4,3, 9,8,7 }

If segment_offsets is empty or contains a single index, no values are sorted and the result is a sequence of integers from 0 to keys.num_rows()-1.

The segment_offsets are not required to include all indices. Any indices outside the specified segments will not be sorted.

Example: (offsets do not cover all indices)
keys = { {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} }
offsets = {3, 7}
result = cudf::segmented_sorted_order(keys, offsets);
result is { 0,1,2, 6,5,4,3, 7,8,9 }

Throws:

cudf::logic_error – if segment_offsets is not a size_type column.

Parameters:
  • keys – The table that determines the ordering of elements in each segment

  • segment_offsets – The column of size_type type containing start offset index for each contiguous segment.

  • column_order – The desired order for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns are sorted in ascending order.

  • null_precedence – The desired order of a null element compared to other elements for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns will be sorted with null_order::BEFORE.

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource to allocate any returned objects

Returns:

sorted order of the segment sorted table

std::unique_ptr<table> segmented_sort_by_key(table_view const &values, table_view const &keys, column_view const &segment_offsets, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Performs a lexicographic segmented sort of a table.

If segment_offsets contains values larger than the number of rows, the behavior is undefined.

Example:
keys = { {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} }
values = { {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'} }
offsets = {0, 3, 7, 10}
result = cudf::segmented_sort_by_key(values, keys, offsets);
result is { 'c','b','a', 'g','f','e','d', 'j','i','h' }

If segment_offsets is empty or contains a single index, no values are sorted and the result is a copy of the values.

The segment_offsets are not required to include all indices. Any indices outside the specified segments will not be sorted.

Example: (offsets do not cover all indices)
keys = { {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} }
values = { {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'} }
offsets = {3, 7}
result = cudf::segmented_sort_by_key(values, keys, offsets);
result is { 'a','b','c', 'g','f','e','d', 'h','i','j' }
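
Throws:
  • cudf::logic_error – if segment_offsets is not a size_type column.

  • cudf::logic_error – if values.num_rows() != keys.num_rows().
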
Parameters:
  • values – The table to reorder

  • keys – The table that determines the ordering of elements in each segment

  • segment_offsets – The column of size_type type containing start offset index for each contiguous segment.

  • column_order – The desired order for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns are sorted in ascending order.

  • null_precedence – The desired order of a null element compared to other elements for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns will be sorted with null_order::BEFORE.

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource to allocate any returned objects

Returns:

table with elements in each segment sorted
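
Combining the two ideas, a segmented key-value sort orders the keys inside each segment and then gathers the values. The host-side sketch below is a simplified model, not libcudf code: `segmented_sort_by_key_reference` is a hypothetical name, and it handles one ascending `int` key column and one `char` value column without nulls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-side model (not libcudf). Arguments follow the documented
// order: values first, then keys, then segment offsets.
std::vector<char> segmented_sort_by_key_reference(std::vector<char> const& values,
                                                  std::vector<int> const& keys,
                                                  std::vector<int> const& offsets)
{
  assert(values.size() == keys.size());

  // Per-segment sorted order of the keys; rows outside segments stay in place.
  std::vector<int> order(keys.size());
  std::iota(order.begin(), order.end(), 0);
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    std::sort(order.begin() + offsets[i], order.begin() + offsets[i + 1],
              [&](int lhs, int rhs) { return keys[lhs] < keys[rhs]; });
  }

  // Gather the values through the per-segment order.
  std::vector<char> result(values.size());
  for (std::size_t i = 0; i < order.size(); ++i) { result[i] = values[order[i]]; }
  return result;
}
```

With the keys and values from the examples above, offsets `{0, 3, 7, 10}` give `{'c','b','a', 'g','f','e','d', 'j','i','h'}`, and an empty offsets vector returns a copy of the values.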

std::unique_ptr<table> stable_segmented_sort_by_key(table_view const &values, table_view const &keys, column_view const &segment_offsets, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Performs a stable lexicographic segmented sort of a table.

If segment_offsets contains values larger than the number of rows, the behavior is undefined.

Example:
keys = { {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} }
values = { {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'} }
offsets = {0, 3, 7, 10}
result = cudf::segmented_sort_by_key(values, keys, offsets);
result is { 'c','b','a', 'g','f','e','d', 'j','i','h' }

If segment_offsets is empty or contains a single index, no values are sorted and the result is a copy of the values.

The segment_offsets are not required to include all indices. Any indices outside the specified segments will not be sorted.

Example: (offsets do not cover all indices)
keys = { {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} }
values = { {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'} }
offsets = {3, 7}
result = cudf::segmented_sort_by_key(values, keys, offsets);
result is { 'a','b','c', 'g','f','e','d', 'h','i','j' }
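
Throws:
  • cudf::logic_error – if segment_offsets is not a size_type column.

  • cudf::logic_error – if values.num_rows() != keys.num_rows().
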
Parameters:
  • values – The table to reorder

  • keys – The table that determines the ordering of elements in each segment

  • segment_offsets – The column of size_type type containing start offset index for each contiguous segment.

  • column_order – The desired order for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns are sorted in ascending order.

  • null_precedence – The desired order of a null element compared to other elements for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns will be sorted with null_order::BEFORE.

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource to allocate any returned objects

Returns:

table with elements in each segment sorted