Transformation Fill

group transformation_fill

Functions

void fill_in_place(mutable_column_view &destination, size_type begin, size_type end, scalar const &value, rmm::cuda_stream_view stream = cudf::get_default_stream())

Fills a range of elements in-place in a column with a scalar value.

Fills N elements of destination starting at begin with value, where N = (end - begin).

Overwrites the range of elements in destination indicated by the indices [begin, end) with value. Use the out-of-place fill function returning std::unique_ptr<column> for use cases requiring memory reallocation.

Throws:
  • cudf::logic_error – if memory reallocation is required (e.g. for variable width types).

  • cudf::logic_error – for invalid range (if begin < 0, begin > end, or end > destination.size()).

  • cudf::logic_error – if destination and value have different types.

  • cudf::logic_error – if value is invalid (null) but destination is not nullable.

Parameters:
  • destination – The preallocated column to fill into

  • begin – The starting index of the fill range (inclusive)

  • end – The ending index of the fill range (exclusive)

  • value – The scalar value to fill

  • stream – CUDA stream used for device memory operations and kernel launches
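The range semantics above can be illustrated with a small host-side sketch. This is not libcudf code: `fill_in_place_model` is a hypothetical name, a `std::vector` stands in for a preallocated fixed-width column, and `std::logic_error` stands in for the documented exception.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Host-side model of an in-place range fill (hypothetical helper, not libcudf).
// std::vector stands in for a preallocated fixed-width column.
template <typename T>
void fill_in_place_model(std::vector<T>& destination, int begin, int end, T const& value)
{
  // Mirror the documented range check: 0 <= begin <= end <= destination.size().
  if (begin < 0 || begin > end || end > static_cast<int>(destination.size())) {
    throw std::logic_error("invalid fill range");
  }
  // Overwrite exactly N = end - begin elements; all other elements are untouched.
  for (int i = begin; i < end; ++i) {
    destination[static_cast<std::size_t>(i)] = value;
  }
}
```

In this model, filling `[1, 4)` of a five-element vector overwrites three elements, and a range with `begin == end` is valid and changes nothing.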

std::unique_ptr<column> fill(column_view const &input, size_type begin, size_type end, scalar const &value, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Fills a range of elements in a column out-of-place with a scalar value.

Creates a new column as if an in-place fill had been performed on a copy of input: the elements at indices [begin, end) are overwritten with value, and input itself is left unchanged.

Throws:
  • cudf::logic_error – for invalid range (if begin < 0, begin > end, or end > input.size()).

  • cudf::logic_error – if input and value have different types.

Parameters:
  • input – The input column used to create a new column. The new column is created by replacing the values of input in the specified range with value.

  • begin – The starting index of the fill range (inclusive)

  • end – The ending index of the fill range (exclusive)

  • value – The scalar value to fill

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

The resulting column
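A minimal host-side sketch of the copy-then-fill behavior described for this overload. It is an illustration only: `fill_model` is a hypothetical name and a `std::vector` stands in for the input column.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// Host-side model of an out-of-place fill (hypothetical helper, not libcudf).
template <typename T>
std::vector<T> fill_model(std::vector<T> const& input, int begin, int end, T const& value)
{
  // Same range check as the in-place form, applied to the input's size.
  if (begin < 0 || begin > end || end > static_cast<int>(input.size())) {
    throw std::logic_error("invalid fill range");
  }
  // Copy first, then overwrite [begin, end) in the copy; input is never modified.
  std::vector<T> output(input);
  std::fill(output.begin() + begin, output.begin() + end, value);
  return output;
}
```

The caller keeps the original data and receives a separate result, which is the trade-off against the in-place form: one extra allocation in exchange for an untouched input.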

std::unique_ptr<table> repeat(table_view const &input_table, column_view const &count, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Repeat rows of a Table.

Creates a new table by repeating the rows of input_table. The number of repetitions of each row is given by the value at the corresponding index of count. Example:

in = [4,5,6]
count = [1,2,3]
return = [4,5,5,6,6,6]

count must not contain null or negative values, and the sum of its elements must not overflow size_type. The behavior of this function is undefined if count has negative values or the sum overflows.

Throws:
Parameters:
  • input_table – Input table

  • count – Non-nullable column of an integral type

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table’s device memory

Returns:

The result table containing the repetitions
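The per-row repetition can be sketched on the host as follows. This is an illustrative model, not libcudf code: `table_model` and `repeat_rows_by_counts` are hypothetical names, and the table is represented as a vector of integer columns. Unlike the documented function, where negative counts are undefined behavior, this sketch rejects them, and it also assumes that count has one entry per row.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// A table modeled as a list of equally sized integer columns (hypothetical type).
using table_model = std::vector<std::vector<int>>;

// Host-side model of repeating row r of every column count[r] times.
table_model repeat_rows_by_counts(table_model const& input, std::vector<int> const& count)
{
  table_model output(input.size());
  for (std::size_t c = 0; c < input.size(); ++c) {
    // Assumption of this sketch: one count per row.
    if (input[c].size() != count.size()) {
      throw std::logic_error("count must have one entry per row");
    }
    for (std::size_t r = 0; r < count.size(); ++r) {
      // Choice of this sketch: fail loudly instead of leaving the result undefined.
      if (count[r] < 0) { throw std::logic_error("count must be non-negative"); }
      output[c].insert(output[c].end(), static_cast<std::size_t>(count[r]), input[c][r]);
    }
  }
  return output;
}
```

Because the same count is applied to every column, rows stay aligned across columns in the result; a count of 0 drops the row.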

std::unique_ptr<table> repeat(table_view const &input_table, size_type count, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Repeat rows of a Table.

Creates a new table by repeating each row of input_table count times. Example:

in = [4,5,6]
count = 2
return = [4,4,5,5,6,6]

Throws:
  • cudf::logic_error – if count is negative.

  • std::overflow_error – if input_table.num_rows() * count overflows size_type.

Parameters:
  • input_table – Input table

  • count – Number of repetitions

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table’s device memory

Returns:

The result table containing the repetitions
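The two documented error conditions can be illustrated with a host-side sketch for a single column. It is not libcudf code: `repeat_rows_n_times` is a hypothetical name, and the sketch assumes size_type is a 32-bit signed integer, so the product is checked in 64-bit arithmetic before anything is allocated.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Host-side model of repeating every row `count` times (hypothetical helper).
std::vector<int> repeat_rows_n_times(std::vector<int> const& column, std::int32_t count)
{
  if (count < 0) { throw std::logic_error("count must be non-negative"); }
  // Compute num_rows * count in 64 bits so the overflow itself can be detected.
  // Assumption: the row-count type is a 32-bit signed integer.
  std::int64_t const total = static_cast<std::int64_t>(column.size()) * count;
  if (total > std::numeric_limits<std::int32_t>::max()) {
    throw std::overflow_error("num_rows * count overflows the row-count type");
  }
  std::vector<int> output;
  output.reserve(static_cast<std::size_t>(total));
  for (int v : column) {
    output.insert(output.end(), static_cast<std::size_t>(count), v);
  }
  return output;
}
```

Checking the product before allocating means an oversized request fails immediately rather than after a large allocation attempt.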

std::unique_ptr<column> sequence(size_type size, scalar const &init, scalar const &step, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Fills a column with a sequence of values specified by an initial value and a step.

Creates a new column and fills it with size values starting at init and incrementing by step, generating the sequence [init, init+step, init+2*step, …, init + (size - 1)*step].

size = 3
init = 0
step = 2
return = [0, 2, 4]

Throws:
Parameters:
  • size – Size of the output column

  • init – First value in the sequence

  • step – Increment value

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

The result column containing the generated sequence
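A short host-side sketch of the formula output[i] = init + i*step. It is illustrative only: `sequence_model` is a hypothetical name, and rejecting a negative size is a choice of this sketch rather than documented behavior.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Host-side model of an arithmetic sequence (hypothetical helper, not libcudf).
template <typename T>
std::vector<T> sequence_model(int size, T init, T step)
{
  // Choice of this sketch: a negative size is treated as an error.
  if (size < 0) { throw std::logic_error("size must be non-negative"); }
  std::vector<T> output;
  output.reserve(static_cast<std::size_t>(size));
  for (int i = 0; i < size; ++i) {
    // Each element is computed directly from init, not from the previous element.
    output.push_back(init + static_cast<T>(i) * step);
  }
  return output;
}
```

Computing each element from init rather than accumulating keeps floating-point rounding from compounding across the sequence; a negative step produces a decreasing sequence.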

std::unique_ptr<column> sequence(size_type size, scalar const &init, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Fills a column with a sequence of values specified by an initial value and a step of 1.

Creates a new column and fills it with size values starting at init and incrementing by 1, generating the sequence [init, init+1, init+2, …, init + (size - 1)].

size = 3
init = 0
return = [0, 1, 2]

Throws:
Parameters:
  • size – Size of the output column

  • init – First value in the sequence

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

The result column containing the generated sequence

std::unique_ptr<cudf::column> calendrical_month_sequence(size_type size, scalar const &init, size_type months, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())

Generate a sequence of timestamps beginning at init and incrementing by months for each successive element, i.e., output[i] = init + i * months for i in [0, size).

If a given date is invalid, the date is scaled back to the last available day of that month.

Example:

size = 3
init = 2020-01-31 08:00:00
months = 1
return = [2020-01-31 08:00:00, 2020-02-29 08:00:00, 2020-03-31 08:00:00]

Throws:
  • cudf::logic_error – if the data type of init is not a TIMESTAMP

Parameters:
  • size – Number of timestamps to generate

  • init – The initial timestamp

  • months – Months to increment

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

Column of timestamps forming the month sequence
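The end-of-month clamping can be sketched on the host with plain calendar arithmetic. This is an illustrative model, not libcudf code: `date_model` and `calendrical_month_sequence_model` are hypothetical names, only the date part is modeled (the time of day is assumed to carry over unchanged), and the proleptic Gregorian leap-year rule is assumed.

```cpp
#include <algorithm>
#include <vector>

// Calendar date with month in 1..12 (hypothetical type for this sketch).
struct date_model { int year; int month; int day; };

bool is_leap_year(int y) { return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0; }

int days_in_month(int y, int m)
{
  static const int days[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
  return (m == 2 && is_leap_year(y)) ? 29 : days[m - 1];
}

// Host-side model: element i is init shifted by i * months calendar months.
std::vector<date_model> calendrical_month_sequence_model(int size, date_model init, int months)
{
  std::vector<date_model> output;
  for (int i = 0; i < size; ++i) {
    // Work in "months since year 0" so the year carry is a single division.
    int total = init.year * 12 + (init.month - 1) + i * months;
    int year = total / 12;
    int month0 = total % 12;
    if (month0 < 0) { month0 += 12; --year; }  // floor division for negative totals
    int const month = month0 + 1;
    // Clamp an invalid day (e.g. Feb 31) to the last day of the target month.
    int const day = std::min(init.day, days_in_month(year, month));
    output.push_back(date_model{year, month, day});
  }
  return output;
}
```

Each element is derived from init rather than from the previous element, which is why the sequence starting at 2020-01-31 returns to the 31st in March instead of staying on the 29th after the February clamp.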