Column Sort
- group column_sort
Enums
-
enum class rank_method : int32_t
Tie-breaker method to use for ranking the column.
See also
cudf::make_rank_aggregation for more details.
Values:
-
enumerator FIRST
stable sort order ranking (no ties)
-
enumerator AVERAGE
mean of the FIRST ranks within each group of tied values
-
enumerator MIN
minimum of the FIRST ranks within each group of tied values
-
enumerator MAX
maximum of the FIRST ranks within each group of tied values
-
enumerator DENSE
rank always increases by 1 between groups
Functions
-
std::unique_ptr<column> sorted_order(table_view const &input, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Computes the row indices that would produce input in a lexicographical sorted order.
- Parameters:
input – The table to sort
column_order – The desired sort order for each column. Size must be equal to input.num_columns() or empty. If empty, all columns will be sorted in ascending order.
null_precedence – The desired order of null compared to other elements for each column. Size must be equal to input.num_columns() or empty. If empty, all columns will be sorted in null_order::BEFORE.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
A non-nullable column of elements containing the permuted row indices of input if it were sorted
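To make the returned permutation concrete, the following is a minimal host-side sketch in plain standard C++. It is not libcudf code and does not run on the GPU; `host_order`, `host_table`, and `host_sorted_order` are hypothetical names introduced only for this illustration, and null handling is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-side stand-ins; these are not libcudf types.
enum class host_order { ASCENDING, DESCENDING };
using host_table = std::vector<std::vector<int>>;  // one inner vector per column

// Returns the row indices that would put the rows of `columns` in lexicographic order.
std::vector<int> host_sorted_order(host_table const& columns,
                                   std::vector<host_order> const& column_order = {})
{
  // Mirrors the documented rule: one entry per column, or empty for all-ascending.
  assert(column_order.empty() || column_order.size() == columns.size());
  int const num_rows = columns.empty() ? 0 : static_cast<int>(columns.front().size());
  std::vector<int> indices(num_rows);
  std::iota(indices.begin(), indices.end(), 0);  // 0, 1, ..., num_rows - 1

  std::sort(indices.begin(), indices.end(), [&](int lhs, int rhs) {
    for (std::size_t c = 0; c < columns.size(); ++c) {
      int const a = columns[c][lhs];
      int const b = columns[c][rhs];
      if (a == b) { continue; }  // tie on this column: the next column decides
      bool const descending =
        !column_order.empty() && column_order[c] == host_order::DESCENDING;
      return descending ? a > b : a < b;
    }
    return false;  // rows are equivalent
  });
  return indices;
}
```

For a single column `{3, 1, 2}` with the default ordering, this sketch returns `{1, 2, 0}`: row 1 holds the smallest value, then row 2, then row 0.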
-
std::unique_ptr<column> stable_sorted_order(table_view const &input, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Computes the row indices that would produce input in a stable lexicographical sorted order.
The order of equivalent elements is guaranteed to be preserved.
- Parameters:
input – The table to sort
column_order – The desired sort order for each column. Size must be equal to input.num_columns() or empty. If empty, all columns will be sorted in ascending order.
null_precedence – The desired order of null compared to other elements for each column. Size must be equal to input.num_columns() or empty. If empty, all columns will be sorted in null_order::BEFORE.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
A non-nullable column of elements containing the permuted row indices of input if it were sorted
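The stability guarantee can be illustrated on the host with `std::stable_sort`. This is only a sketch of the semantics for a single key column, not the libcudf implementation; `host_stable_sorted_order` is a hypothetical name.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical host-side helper: indices of `keys` in stable ascending order.
std::vector<int> host_stable_sorted_order(std::vector<int> const& keys)
{
  std::vector<int> indices(keys.size());
  std::iota(indices.begin(), indices.end(), 0);
  // std::stable_sort keeps equal keys in their original relative order.
  std::stable_sort(indices.begin(), indices.end(),
                   [&](int lhs, int rhs) { return keys[lhs] < keys[rhs]; });
  return indices;
}
```

With keys `{2, 1, 2, 1}` the sketch returns `{1, 3, 0, 2}`: the two rows holding `1` keep their input order (1 before 3), as do the two rows holding `2` (0 before 2).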
-
bool is_sorted(cudf::table_view const &table, std::vector<order> const &column_order, std::vector<null_order> const &null_precedence, rmm::cuda_stream_view stream = cudf::get_default_stream())
Checks whether the rows of a table are sorted in a lexicographical order.
- Parameters:
table – Table whose rows need to be compared for ordering
column_order – The expected sort order for each column. Size must be equal to table.num_columns() or empty. If empty, it is expected all columns are in ascending order.
null_precedence – The desired order of null compared to other elements for each column. Size must be equal to table.num_columns() or empty. If empty, null_order::BEFORE is assumed for all columns.
stream – CUDA stream used for device memory operations and kernel launches
- Returns:
true if sorted as expected, false if not
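The check amounts to comparing each row with its predecessor, column by column. A minimal host-side sketch of that idea follows, assuming integer columns without nulls; `host_is_sorted` and its `descending` flags (standing in for `column_order`) are hypothetical and not part of libcudf.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side check; descending[c] plays the role of column_order[c].
bool host_is_sorted(std::vector<std::vector<int>> const& columns,
                    std::vector<bool> const& descending = {})
{
  assert(descending.empty() || descending.size() == columns.size());
  if (columns.empty()) { return true; }
  std::size_t const num_rows = columns.front().size();
  for (std::size_t row = 1; row < num_rows; ++row) {
    for (std::size_t c = 0; c < columns.size(); ++c) {
      int const prev = columns[c][row - 1];
      int const curr = columns[c][row];
      if (prev == curr) { continue; }  // tie: look at the next column
      bool const want_descending = !descending.empty() && descending[c];
      bool const in_order = want_descending ? prev > curr : prev < curr;
      if (!in_order) { return false; }
      break;  // the first differing column decides this pair of rows
    }
  }
  return true;
}
```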
-
std::unique_ptr<table> sort(table_view const &input, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Performs a lexicographic sort of the rows of a table.
- Parameters:
input – The table to sort
column_order – The desired order for each column. Size must be equal to input.num_columns() or empty. If empty, all columns are sorted in ascending order.
null_precedence – The desired order of a null element compared to other elements for each column in input. Size must be equal to input.num_columns() or empty. If empty, all columns will be sorted with null_order::BEFORE.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned table’s device memory
- Returns:
New table containing the desired sorted order of input
-
std::unique_ptr<table> stable_sort(table_view const &input, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Performs a stable lexicographic sort of the rows of a table.
- Parameters:
input – The table to sort
column_order – The desired order for each column. Size must be equal to input.num_columns() or empty. If empty, all columns are sorted in ascending order.
null_precedence – The desired order of a null element compared to other elements for each column in input. Size must be equal to input.num_columns() or empty. If empty, all columns will be sorted with null_order::BEFORE.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned table’s device memory
- Returns:
New table containing the desired sorted order of input
-
std::unique_ptr<table> sort_by_key(table_view const &values, table_view const &keys, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Performs a key-value sort.
Creates a new table that reorders the rows of values according to the lexicographic ordering of the rows of keys.
- Throws:
cudf::logic_error – if values.num_rows() != keys.num_rows().
- Parameters:
values – The table to reorder
keys – The table that determines the ordering
column_order – The desired order for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns are sorted in ascending order.
null_precedence – The desired order of a null element compared to other elements for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns will be sorted with null_order::BEFORE.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned table’s device memory
- Returns:
The reordering of values determined by the lexicographic order of the rows of keys.
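Conceptually, a key-value sort computes the sorted order of the keys and then gathers the values in that order. The host-side sketch below shows this for one key column and one value column. It is an illustration only: `host_sort_by_key` is a hypothetical name, and `std::invalid_argument` merely stands in for the documented cudf::logic_error.

```cpp
#include <algorithm>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical host-side helper: reorder `values` by ascending `keys`.
std::vector<char> host_sort_by_key(std::vector<char> const& values,
                                   std::vector<int> const& keys)
{
  // Stand-in for the documented row-count check.
  if (values.size() != keys.size()) { throw std::invalid_argument("row count mismatch"); }

  // Step 1: sorted order of the keys.
  std::vector<int> order(keys.size());
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(),
            [&](int lhs, int rhs) { return keys[lhs] < keys[rhs]; });

  // Step 2: gather the values in that order.
  std::vector<char> result;
  result.reserve(values.size());
  for (int idx : order) { result.push_back(values[idx]); }
  return result;
}
```

With values `{'a', 'b', 'c'}` and keys `{3, 1, 2}`, the sketch returns `{'b', 'c', 'a'}`.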
-
std::unique_ptr<table> stable_sort_by_key(table_view const &values, table_view const &keys, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Performs a key-value stable sort.
Creates a new table that reorders the rows of values according to the lexicographic ordering of the rows of keys. Rows with equivalent keys keep their relative order.
- Throws:
cudf::logic_error – if values.num_rows() != keys.num_rows().
- Parameters:
values – The table to reorder
keys – The table that determines the ordering
column_order – The desired order for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns are sorted in ascending order.
null_precedence – The desired order of a null element compared to other elements for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns will be sorted with null_order::BEFORE.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned table’s device memory
- Returns:
The reordering of values determined by the lexicographic order of the rows of keys.
-
std::unique_ptr<column> rank(column_view const &input, rank_method method, order column_order, null_policy null_handling, null_order null_precedence, bool percentage, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Computes the ranks of the input column in sorted order.
The rank indicates the position of each element in the sorted column. Rank values start from 1.
input = {3, 4, 5, 4, 1, 2}
Results for the different rank_method values are:
FIRST = {3, 4, 6, 5, 1, 2}
AVERAGE = {3, 4.5, 6, 4.5, 1, 2}
MIN = {3, 4, 6, 4, 1, 2}
MAX = {3, 5, 6, 5, 1, 2}
DENSE = {3, 4, 5, 4, 1, 2}
- Parameters:
input – The column to rank
method – The ranking method used for tie breaking (same values)
column_order – The desired sort order for ranking
null_handling – Flag to include nulls during ranking. If nulls are not included, the corresponding rank will be null.
null_precedence – The desired order of null compared to other elements for the column
percentage – Flag to convert ranks to percentage in range (0,1]
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
A column containing the rank of each element of input. The output column type is size_type by default, or double when method=rank_method::AVERAGE or percentage=true
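The five tie-breaking methods can be reproduced on the host by walking the sorted order one tie group at a time. The sketch below is an illustration in plain standard C++ rather than the libcudf implementation. It assumes ascending order, no nulls, and no percentage conversion, and always returns `double` for simplicity. `host_rank_method` and `host_rank` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-side mirror of the tie-breaking methods.
enum class host_rank_method { FIRST, AVERAGE, MIN, MAX, DENSE };

std::vector<double> host_rank(std::vector<int> const& input, host_rank_method method)
{
  std::size_t const n = input.size();
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});
  // A stable sort gives the FIRST ranks: ties are ordered by input position.
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t l, std::size_t r) { return input[l] < input[r]; });

  std::vector<double> ranks(n);
  std::size_t dense = 0;
  for (std::size_t begin = 0; begin < n;) {
    // [begin, end) is one group of tied values; its FIRST ranks are begin + 1 .. end.
    std::size_t end = begin;
    while (end < n && input[order[end]] == input[order[begin]]) { ++end; }
    ++dense;
    for (std::size_t pos = begin; pos < end; ++pos) {
      double r = 0.0;
      switch (method) {
        case host_rank_method::FIRST: r = static_cast<double>(pos + 1); break;
        case host_rank_method::AVERAGE: r = static_cast<double>(begin + 1 + end) / 2.0; break;
        case host_rank_method::MIN: r = static_cast<double>(begin + 1); break;
        case host_rank_method::MAX: r = static_cast<double>(end); break;
        case host_rank_method::DENSE: r = static_cast<double>(dense); break;
      }
      ranks[order[pos]] = r;
    }
    begin = end;
  }
  return ranks;
}
```

For the input `{3, 4, 5, 4, 1, 2}` the sketch yields the same values as the example in the description, for instance `{3, 4.5, 6, 4.5, 1, 2}` for AVERAGE.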
-
std::unique_ptr<column> segmented_sorted_order(table_view const &keys, column_view const &segment_offsets, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns sorted order after sorting each segment in the table.
If segment_offsets contains values larger than the number of rows, the behavior is undefined.
Example:
keys = { {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} }
offsets = {0, 3, 7, 10}
result = cudf::segmented_sorted_order(keys, offsets);
result is { 2,1,0, 6,5,4,3, 9,8,7 }
If segment_offsets is empty or contains a single index, no values are sorted and the result is a sequence of integers from 0 to keys.num_rows()-1.
The segment_offsets are not required to include all indices. Any indices outside the specified segments will not be sorted.
Example: (offsets do not cover all indices)
keys = { {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} }
offsets = {3, 7}
result = cudf::segmented_sorted_order(keys, offsets);
result is { 0,1,2, 6,5,4,3, 7,8,9 }
- Throws:
cudf::logic_error – if segment_offsets is not a size_type column.
- Parameters:
keys – The table that determines the ordering of elements in each segment
segment_offsets – The column of size_type type containing the start offset index for each contiguous segment.
column_order – The desired order for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns are sorted in ascending order.
null_precedence – The desired order of a null element compared to other elements for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns will be sorted with null_order::BEFORE.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource to allocate any returned objects
- Returns:
sorted order of the segment sorted table
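The offset semantics can be sketched on the host: each adjacent pair of offsets delimits one half-open segment, and only index ranges inside a segment are sorted. This is a plain C++ illustration for a single key column, not the libcudf implementation; `host_segmented_sorted_order` is a hypothetical name, and offsets are assumed to be non-decreasing and within bounds.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-side helper: sorted order computed independently per segment.
std::vector<int> host_segmented_sorted_order(std::vector<int> const& keys,
                                             std::vector<int> const& offsets)
{
  std::vector<int> indices(keys.size());
  std::iota(indices.begin(), indices.end(), 0);  // rows outside any segment stay in place

  // Segment i covers rows [offsets[i], offsets[i + 1]).
  // With zero or one offset the loop body never runs.
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    std::sort(indices.begin() + offsets[i], indices.begin() + offsets[i + 1],
              [&](int lhs, int rhs) { return keys[lhs] < keys[rhs]; });
  }
  return indices;
}
```

With keys `{9, 8, 7, 6, 5, 4, 3, 2, 1, 0}`, offsets `{0, 3, 7, 10}` give `{2, 1, 0, 6, 5, 4, 3, 9, 8, 7}` and offsets `{3, 7}` give `{0, 1, 2, 6, 5, 4, 3, 7, 8, 9}`, matching the two examples in the description.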
-
std::unique_ptr<column> stable_segmented_sorted_order(table_view const &keys, column_view const &segment_offsets, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Returns sorted order after stably sorting each segment in the table.
If segment_offsets contains values larger than the number of rows, the behavior is undefined.
Example:
keys = { {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} }
offsets = {0, 3, 7, 10}
result = cudf::stable_segmented_sorted_order(keys, offsets);
result is { 2,1,0, 6,5,4,3, 9,8,7 }
If segment_offsets is empty or contains a single index, no values are sorted and the result is a sequence of integers from 0 to keys.num_rows()-1.
The segment_offsets are not required to include all indices. Any indices outside the specified segments will not be sorted.
Example: (offsets do not cover all indices)
keys = { {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} }
offsets = {3, 7}
result = cudf::stable_segmented_sorted_order(keys, offsets);
result is { 0,1,2, 6,5,4,3, 7,8,9 }
- Throws:
cudf::logic_error – if segment_offsets is not a size_type column.
- Parameters:
keys – The table that determines the ordering of elements in each segment
segment_offsets – The column of size_type type containing the start offset index for each contiguous segment.
column_order – The desired order for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns are sorted in ascending order.
null_precedence – The desired order of a null element compared to other elements for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns will be sorted with null_order::BEFORE.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource to allocate any returned objects
- Returns:
sorted order of the segment sorted table
-
std::unique_ptr<table> segmented_sort_by_key(table_view const &values, table_view const &keys, column_view const &segment_offsets, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Performs a lexicographic segmented sort of a table.
If segment_offsets contains values larger than the number of rows, the behavior is undefined.
Example:
keys = { {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} }
values = { {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'} }
offsets = {0, 3, 7, 10}
result = cudf::segmented_sort_by_key(values, keys, offsets);
result is { 'c','b','a', 'g','f','e','d', 'j','i','h' }
If segment_offsets is empty or contains a single index, no values are sorted and the result is a copy of the values.
The segment_offsets are not required to include all indices. Any indices outside the specified segments will not be sorted.
Example: (offsets do not cover all indices)
keys = { {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} }
values = { {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'} }
offsets = {3, 7}
result = cudf::segmented_sort_by_key(values, keys, offsets);
result is { 'a','b','c', 'g','f','e','d', 'h','i','j' }
- Throws:
cudf::logic_error – if values.num_rows() != keys.num_rows().
cudf::logic_error – if segment_offsets is not a size_type column.
- Parameters:
values – The table to reorder
keys – The table that determines the ordering of elements in each segment
segment_offsets – The column of size_type type containing the start offset index for each contiguous segment.
column_order – The desired order for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns are sorted in ascending order.
null_precedence – The desired order of a null element compared to other elements for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns will be sorted with null_order::BEFORE.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource to allocate any returned objects
- Returns:
table with elements in each segment sorted
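A segmented key-value sort can be pictured as a per-segment sorted order of the keys followed by a gather of the values. The host-side sketch below uses a `std::string` as the value column to keep it short. It is an illustration in standard C++, not the libcudf implementation. `host_segmented_sort_by_key` is a hypothetical name, and `std::invalid_argument` stands in for the documented cudf::logic_error.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical host-side helper: reorder `values` by `keys` within each segment.
std::string host_segmented_sort_by_key(std::string const& values,
                                       std::vector<int> const& keys,
                                       std::vector<int> const& offsets)
{
  // Stand-in for the documented row-count check.
  if (values.size() != keys.size()) { throw std::invalid_argument("row count mismatch"); }

  std::vector<int> order(keys.size());
  std::iota(order.begin(), order.end(), 0);
  // Sort the index range of each segment [offsets[i], offsets[i + 1]) by key.
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    std::sort(order.begin() + offsets[i], order.begin() + offsets[i + 1],
              [&](int lhs, int rhs) { return keys[lhs] < keys[rhs]; });
  }

  // Gather the values; rows outside every segment are copied unchanged.
  std::string result;
  result.reserve(values.size());
  for (int idx : order) { result.push_back(values[idx]); }
  return result;
}
```

With values `"abcdefghij"` and keys `{9, 8, 7, 6, 5, 4, 3, 2, 1, 0}`, offsets `{0, 3, 7, 10}` produce `"cbagfedjih"` and offsets `{3, 7}` produce `"abcgfedhij"`.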
-
std::unique_ptr<table> stable_segmented_sort_by_key(table_view const &values, table_view const &keys, column_view const &segment_offsets, std::vector<order> const &column_order = {}, std::vector<null_order> const &null_precedence = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Performs a stable lexicographic segmented sort of a table.
If segment_offsets contains values larger than the number of rows, the behavior is undefined.
Example:
keys = { {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} }
values = { {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'} }
offsets = {0, 3, 7, 10}
result = cudf::stable_segmented_sort_by_key(values, keys, offsets);
result is { 'c','b','a', 'g','f','e','d', 'j','i','h' }
If segment_offsets is empty or contains a single index, no values are sorted and the result is a copy of the values.
The segment_offsets are not required to include all indices. Any indices outside the specified segments will not be sorted.
Example: (offsets do not cover all indices)
keys = { {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} }
values = { {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'} }
offsets = {3, 7}
result = cudf::stable_segmented_sort_by_key(values, keys, offsets);
result is { 'a','b','c', 'g','f','e','d', 'h','i','j' }
- Throws:
cudf::logic_error – if values.num_rows() != keys.num_rows().
cudf::logic_error – if segment_offsets is not a size_type column.
- Parameters:
values – The table to reorder
keys – The table that determines the ordering of elements in each segment
segment_offsets – The column of size_type type containing the start offset index for each contiguous segment.
column_order – The desired order for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns are sorted in ascending order.
null_precedence – The desired order of a null element compared to other elements for each column in keys. Size must be equal to keys.num_columns() or empty. If empty, all columns will be sorted with null_order::BEFORE.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource to allocate any returned objects
- Returns:
table with elements in each segment sorted