Interop Arrow

group interop_arrow

Functions

std::shared_ptr<arrow::Table> to_arrow(table_view input, std::vector<column_metadata> const &metadata = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), arrow::MemoryPool *ar_mr = arrow::default_memory_pool())

Create arrow::Table from cudf table input.

Converts the cudf::table_view to an arrow::Table, taking the names of its columns (and of any nested children) from metadata.

Note

For decimals, libcudf does not store a precision, so the value is converted to an Arrow decimal128 with the widest precision the cudf decimal type supports. For example, numeric::decimal32 is converted to an Arrow decimal128 with precision 9, the maximum precision for 32-bit types. Similarly, numeric::decimal128 is converted to an Arrow decimal128 with precision 38.
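The precisions quoted in the note follow from the width of the signed integer that stores the decimal's unscaled value: a two's-complement integer of `bits` bits can always hold floor((bits - 1) * log10(2)) decimal digits. The sketch below is plain C++, not libcudf code; `max_decimal_precision` is a hypothetical helper used only to reproduce the numbers (the same arithmetic gives 18 for a 64-bit representation).

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of libcudf): number of decimal digits that
// always fit in a signed two's-complement integer of the given bit width.
// One bit is reserved for the sign, hence (bits - 1).
inline int max_decimal_precision(int bits)
{
  return static_cast<int>(std::floor((bits - 1) * std::log10(2.0)));
}

// Cross-check against the standard library for the widths it covers:
// digits10 is the number of base-10 digits representable without change.
static_assert(std::numeric_limits<std::int32_t>::digits10 == 9, "32-bit representation");
static_assert(std::numeric_limits<std::int64_t>::digits10 == 18, "64-bit representation");
```

Calling `max_decimal_precision(32)` and `max_decimal_precision(128)` yields 9 and 38, matching the precisions given for numeric::decimal32 and numeric::decimal128.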

Throws:

cudf::logic_error – if the number of entries in metadata does not match the number of columns in input.

Parameters:
  • input – table_view that needs to be converted to arrow Table

  • metadata – Hierarchy of names for the columns and their children

  • stream – CUDA stream used for device memory operations and kernel launches

  • ar_mr – arrow memory pool to allocate memory for arrow Table

Returns:

arrow Table generated from input

std::shared_ptr<arrow::Scalar> to_arrow(cudf::scalar const &input, column_metadata const &metadata = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), arrow::MemoryPool *ar_mr = arrow::default_memory_pool())

Create arrow::Scalar from cudf scalar input.

Converts the cudf::scalar to arrow::Scalar.

Note

For decimals, libcudf does not store a precision, so the value is converted to an Arrow decimal128 with the widest precision the cudf decimal type supports. For example, numeric::decimal32 is converted to an Arrow decimal128 with precision 9, the maximum precision for 32-bit types. Similarly, numeric::decimal128 is converted to an Arrow decimal128 with precision 38.

Parameters:
  • input – scalar that needs to be converted to arrow Scalar

  • metadata – Hierarchy of names for the column and its children

  • stream – CUDA stream used for device memory operations and kernel launches

  • ar_mr – arrow memory pool to allocate memory for arrow Scalar

Returns:

arrow Scalar generated from input

std::unique_ptr<table> from_arrow(arrow::Table const &input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())

Create cudf::table from given arrow Table input.

Parameters:
  • input – arrow::Table that needs to be converted to cudf::table

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate cudf::table

Returns:

cudf table generated from given arrow Table

std::unique_ptr<cudf::scalar> from_arrow(arrow::Scalar const &input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())

Create cudf::scalar from given arrow Scalar input.

Parameters:
  • input – arrow::Scalar that needs to be converted to cudf::scalar

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate cudf::scalar

Returns:

cudf scalar generated from given arrow Scalar

struct column_metadata
#include <interop.hpp>

Detailed metadata information for an arrow array.

Currently this holds only the column name, arranged in a hierarchy that mirrors the children of a cudf column; more fields may be added in the future as needed.

Public Functions

inline column_metadata(std::string const &_name)

Construct a new column metadata object.

Parameters:

_name – Name of the column

Public Members

std::string name

Name of the column.

std::vector<column_metadata> children_meta

Metadata of children of the column.