Files

file  interop.hpp
 

Classes

struct  cudf::column_metadata
 Detailed metadata information for arrow array.
 
struct  cudf::custom_view_deleter< ViewType >
 Functor for a custom deleter to a unique_ptr of a table_view or column_view.
 

Typedefs

using cudf::unique_schema_t = std::unique_ptr< ArrowSchema, void(*)(ArrowSchema *)>
 typedef for a unique_ptr to an ArrowSchema with custom deleter
 
using cudf::unique_device_array_t = std::unique_ptr< ArrowDeviceArray, void(*)(ArrowDeviceArray *)>
 typedef for a unique_ptr to an ArrowDeviceArray with a custom deleter
 
using cudf::owned_columns_t = std::vector< std::unique_ptr< cudf::column > >
 typedef for a vector of owning columns, used for conversion from ArrowDeviceArray
 
using cudf::unique_table_view_t = std::unique_ptr< cudf::table_view, custom_view_deleter< cudf::table_view > >
 typedef for a unique_ptr to a cudf::table_view with custom deleter
 
using cudf::unique_column_view_t = std::unique_ptr< cudf::column_view, custom_view_deleter< cudf::column_view > >
 typedef for a unique_ptr to a cudf::column_view with custom deleter
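
The unique_ptr typedefs above pair a C struct with a deleter passed as a plain function pointer. The following is a minimal sketch of that pattern, not libcudf code. DemoSchema is a hypothetical stand-in for ArrowSchema that keeps only a release callback, and the deleter behavior shown (invoke release if it is still set, then free the struct) is an assumption about what such a deleter might reasonably do.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the Arrow C struct; only the members this sketch needs.
struct DemoSchema {
  void (*release)(DemoSchema*);  // set by the producer, null once released
  void* private_data;
};

// Same shape as the typedefs above: the deleter is a plain function pointer.
using unique_demo_schema_t = std::unique_ptr<DemoSchema, void (*)(DemoSchema*)>;

int g_release_calls = 0;  // counts how often the release callback runs

void demo_release(DemoSchema* schema)
{
  ++g_release_calls;
  schema->release = nullptr;  // Arrow convention: a null callback marks a released struct
}

// Assumed deleter behavior: release if still needed, then free the struct itself.
void demo_deleter(DemoSchema* schema)
{
  if (schema->release != nullptr) { schema->release(schema); }
  delete schema;
}

unique_demo_schema_t make_demo_schema()
{
  return unique_demo_schema_t(new DemoSchema{demo_release, nullptr}, demo_deleter);
}

// Usage: the callback runs once, whether the consumer or the deleter triggers it.
void example_usage()
{
  int const before = g_release_calls;
  auto schema      = make_demo_schema();
  schema->release(schema.get());  // consumer releases explicitly
  assert(g_release_calls == before + 1);
  schema.reset();  // deleter sees release == nullptr and only frees the struct
  assert(g_release_calls == before + 1);
}
```

Because the deleter type is a function pointer rather than a stateless functor, the pointer has to be supplied whenever such a unique_ptr is constructed, which is why a factory function like make_demo_schema is convenient.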
 

Functions

std::shared_ptr< arrow::Table > cudf::to_arrow (table_view input, std::vector< column_metadata > const &metadata={}, rmm::cuda_stream_view stream=cudf::get_default_stream(), arrow::MemoryPool *ar_mr=arrow::default_memory_pool())
 Create arrow::Table from cudf table input.
 
std::shared_ptr< arrow::Scalar > cudf::to_arrow (cudf::scalar const &input, column_metadata const &metadata={}, rmm::cuda_stream_view stream=cudf::get_default_stream(), arrow::MemoryPool *ar_mr=arrow::default_memory_pool())
 Create arrow::Scalar from cudf scalar input.
 
unique_schema_t cudf::to_arrow_schema (cudf::table_view const &input, cudf::host_span< column_metadata const > metadata)
 Create ArrowSchema from cudf table and metadata.
 
unique_device_array_t cudf::to_arrow_device (cudf::table &&table, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
 Create ArrowDeviceArray from cudf table.
 
unique_device_array_t cudf::to_arrow_device (cudf::column &&col, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
 Create ArrowDeviceArray from cudf column.
 
unique_device_array_t cudf::to_arrow_device (cudf::table_view const &table, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
 Create ArrowDeviceArray from a table view.
 
unique_device_array_t cudf::to_arrow_device (cudf::column_view const &col, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
 Create ArrowDeviceArray from a column view.
 
std::unique_ptr< table > cudf::from_arrow (arrow::Table const &input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
 Create cudf::table from given arrow Table input.
 
std::unique_ptr< cudf::scalar > cudf::from_arrow (arrow::Scalar const &input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
 Create cudf::scalar from given arrow Scalar input.
 
unique_table_view_t cudf::from_arrow_device (ArrowSchema const *schema, ArrowDeviceArray const *input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Create cudf::table_view from given ArrowDeviceArray and ArrowSchema.
 
unique_column_view_t cudf::from_arrow_device_column (ArrowSchema const *schema, ArrowDeviceArray const *input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Create cudf::column_view from given ArrowDeviceArray and ArrowSchema.
 

Function Documentation

◆ from_arrow() [1/2]

std::unique_ptr<cudf::scalar> cudf::from_arrow ( arrow::Scalar const &  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = rmm::mr::get_current_device_resource() 
)

Create cudf::scalar from given arrow Scalar input.

Parameters
input  arrow::Scalar that needs to be converted to cudf::scalar
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate cudf::scalar
Returns
cudf scalar generated from given arrow Scalar

◆ from_arrow() [2/2]

std::unique_ptr<table> cudf::from_arrow ( arrow::Table const &  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = rmm::mr::get_current_device_resource() 
)

Create cudf::table from given arrow Table input.

Parameters
input  arrow::Table that needs to be converted to cudf::table
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate cudf::table
Returns
cudf table generated from given arrow Table

◆ from_arrow_device()

unique_table_view_t cudf::from_arrow_device ( ArrowSchema const *  schema,
ArrowDeviceArray const *  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Create cudf::table_view from given ArrowDeviceArray and ArrowSchema.

Constructs a non-owning cudf::table_view from an ArrowDeviceArray and an ArrowSchema. The data must be accessible to the CUDA device. Because the resulting cudf::table_view does not own the data, the ArrowDeviceArray must be kept alive for the lifetime of the result. Callers are responsible for calling the release callback on the ArrowDeviceArray once it is no longer needed, and for ensuring that the cudf::table_view is not accessed after that.

Exceptions
cudf::logic_error  if device_type is not ARROW_DEVICE_CUDA, ARROW_DEVICE_CUDA_HOST or ARROW_DEVICE_CUDA_MANAGED
cudf::data_type_error  if the input array is not a struct array; non-struct arrays should be passed to from_arrow_device_column instead
cudf::data_type_error  if the input arrow data type is not supported

Each child of the input struct becomes a column of the resulting table_view.

Note
The custom deleter used for the unique_ptr to the table_view maintains ownership over any memory that is allocated during conversion, for example when boolean columns are converted from the bitmap used by Arrow to the 1 byte per value used by cudf.
If the input ArrowDeviceArray contains a non-null sync_event, it is assumed to be a cudaEvent_t*, and cudaStreamWaitEvent is called on the passed-in stream with that event. This function does not, however, explicitly synchronize on the stream.
Parameters
schema  ArrowSchema pointer to object describing the type of the device array
input  ArrowDeviceArray pointer to object owning the Arrow data
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to perform any allocations
Returns
cudf::table_view generated from given Arrow data
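
The note above mentions converting boolean columns from Arrow's bitmap to cudf's 1 byte per value. As a rough host-side illustration of that layout change (libcudf performs the equivalent work on the GPU; unpack_bitmap is a hypothetical name, and the least-significant-bit-first ordering is the one defined by the Arrow columnar format):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand an Arrow-style bitmap into one byte per value.
// Value i lives in byte i / 8 at bit position i % 8 (least-significant bit first).
std::vector<std::uint8_t> unpack_bitmap(std::vector<std::uint8_t> const& bitmap,
                                        std::size_t num_values)
{
  assert(bitmap.size() * 8 >= num_values);  // the bitmap must cover every value
  std::vector<std::uint8_t> bytes(num_values);
  for (std::size_t i = 0; i < num_values; ++i) {
    bytes[i] = static_cast<std::uint8_t>((bitmap[i / 8] >> (i % 8)) & 1u);
  }
  return bytes;
}
```

The output is eight times larger than the input, which is why this conversion needs newly allocated memory that something has to own.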

◆ from_arrow_device_column()

unique_column_view_t cudf::from_arrow_device_column ( ArrowSchema const *  schema,
ArrowDeviceArray const *  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Create cudf::column_view from given ArrowDeviceArray and ArrowSchema.

Constructs a non-owning cudf::column_view from an ArrowDeviceArray and an ArrowSchema. The data must be accessible to the CUDA device. Because the resulting cudf::column_view does not own the data, the ArrowDeviceArray must be kept alive for the lifetime of the result. Callers are responsible for calling the release callback on the ArrowDeviceArray once it is no longer needed, and for ensuring that the cudf::column_view is not accessed after that.

Exceptions
cudf::logic_error  if device_type is not ARROW_DEVICE_CUDA, ARROW_DEVICE_CUDA_HOST or ARROW_DEVICE_CUDA_MANAGED
cudf::data_type_error  if the input arrow data type is not supported
Note
The custom deleter used for the unique_ptr to the column_view maintains ownership over any memory that is allocated during conversion, for example when boolean columns are converted from the bitmap used by Arrow to the 1 byte per value used by cudf.
If the input ArrowDeviceArray contains a non-null sync_event, it is assumed to be a cudaEvent_t*, and cudaStreamWaitEvent is called on the passed-in stream with that event. This function does not, however, explicitly synchronize on the stream.
Parameters
schema  ArrowSchema pointer to object describing the type of the device array
input  ArrowDeviceArray pointer to object owning the Arrow data
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to perform any allocations
Returns
cudf::column_view generated from given Arrow data
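
The note above says the custom deleter keeps ownership of memory allocated during conversion. One way to picture this is a deleter functor that carries the extra storage along with the view. The sketch below is an illustration under stated assumptions, not the libcudf implementation: DemoView, owning_view_deleter and make_view are hypothetical names, and a std::vector<int> stands in for the device buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical non-owning view: a pointer and a length, nothing more.
struct DemoView {
  int const* data;
  std::size_t size;
};

// Deleter that also carries storage created during conversion, so that storage
// lives exactly as long as the unique_ptr holding the view.
template <typename ViewType>
struct owning_view_deleter {
  std::vector<int> owned;  // stands in for buffers allocated while converting
  void operator()(ViewType* view) const { delete view; }
};

using unique_demo_view_t = std::unique_ptr<DemoView, owning_view_deleter<DemoView>>;

unique_demo_view_t make_view(std::vector<int> converted)
{
  owning_view_deleter<DemoView> deleter{std::move(converted)};
  // Moving the deleter below transfers the vector's heap buffer rather than
  // copying it, so the pointer captured here stays valid.
  auto* view = new DemoView{deleter.owned.data(), deleter.owned.size()};
  return unique_demo_view_t(view, std::move(deleter));
}

// Usage: the view points into storage held by its own deleter.
void example_usage()
{
  auto view = make_view({4, 5, 6});
  assert(view->size == 3);
  assert(view->data == view.get_deleter().owned.data());
}
```

With this arrangement the caller handles a single object, and destroying it frees both the view and whatever was allocated to back it.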

◆ to_arrow() [1/2]

std::shared_ptr<arrow::Scalar> cudf::to_arrow ( cudf::scalar const &  input,
column_metadata const &  metadata = {},
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
arrow::MemoryPool *  ar_mr = arrow::default_memory_pool() 
)

Create arrow::Scalar from cudf scalar input.

Converts the cudf::scalar to arrow::Scalar.

Parameters
input  scalar that needs to be converted to arrow Scalar
metadata  Contains hierarchy of names of columns and children
stream  CUDA stream used for device memory operations and kernel launches
ar_mr  arrow memory pool to allocate memory for arrow Scalar
Returns
arrow Scalar generated from input
Note
Because libcudf does not store the precision of decimals, they are converted to an Arrow decimal128 with the widest precision the cudf decimal type supports. For example, numeric::decimal32 is converted to Arrow decimal128 with precision 9, the maximum precision for 32-bit types. Similarly, numeric::decimal128 is converted to Arrow decimal128 with precision 38.
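
The precisions quoted in the note follow from how many decimal digits always fit in the signed storage type. A small sketch of that arithmetic, assuming two's-complement storage; max_decimal_precision is a hypothetical helper, not a libcudf function:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the largest N such that every N-digit decimal number
// fits in a signed two's-complement integer of `bytes` bytes.
constexpr int max_decimal_precision(std::size_t bytes)
{
  // (bits - 1) value bits, each worth log10(2) ~= 0.30103 decimal digits, rounded down.
  return static_cast<int>((bytes * 8 - 1) * 30103 / 100000);
}

// Checks the two values stated in the note above.
void check_documented_precisions()
{
  assert(max_decimal_precision(4) == 9);    // 32-bit storage, numeric::decimal32
  assert(max_decimal_precision(16) == 38);  // 128-bit storage, numeric::decimal128
}
```

By the same arithmetic, 64-bit storage gives 18 digits.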

◆ to_arrow() [2/2]

std::shared_ptr<arrow::Table> cudf::to_arrow ( table_view  input,
std::vector< column_metadata > const &  metadata = {},
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
arrow::MemoryPool *  ar_mr = arrow::default_memory_pool() 
)

Create arrow::Table from cudf table input.

Converts the cudf::table_view to arrow::Table, taking column names from the provided metadata.

Exceptions
cudf::logic_error  if the size of metadata doesn't match the number of columns.
Parameters
input  table_view that needs to be converted to arrow Table
metadata  Contains hierarchy of names of columns and children
stream  CUDA stream used for device memory operations and kernel launches
ar_mr  arrow memory pool to allocate memory for arrow Table
Returns
arrow Table generated from input
Note
Because libcudf does not store the precision of decimals, they are converted to an Arrow decimal128 with the widest precision the cudf decimal type supports. For example, numeric::decimal32 is converted to Arrow decimal128 with precision 9, the maximum precision for 32-bit types. Similarly, numeric::decimal128 is converted to Arrow decimal128 with precision 38.
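
The metadata parameter is described as a hierarchy of names for columns and their children, with a cudf::logic_error when its size does not match the number of columns. The sketch below models that shape and check with standard-library types only. demo_metadata and check_metadata are hypothetical stand-ins, and std::logic_error stands in for cudf::logic_error; the real cudf::column_metadata may differ in its member names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a metadata node: a name plus metadata for children.
struct demo_metadata {
  std::string name;
  std::vector<demo_metadata> children;
};

// Mirrors the documented check: one top-level entry per column, else an error.
void check_metadata(std::size_t num_columns, std::vector<demo_metadata> const& metadata)
{
  if (metadata.size() != num_columns) {
    throw std::logic_error("metadata size does not match number of columns");
  }
}

// Usage: a two-column table whose second column is a struct with two children.
void example_usage()
{
  demo_metadata point{"point", {}};
  point.children.push_back(demo_metadata{"x", {}});
  point.children.push_back(demo_metadata{"y", {}});

  std::vector<demo_metadata> metadata;
  metadata.push_back(demo_metadata{"id", {}});
  metadata.push_back(point);

  check_metadata(2, metadata);  // sizes agree, so nothing is thrown
  assert(metadata[1].children.size() == 2);
}
```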

◆ to_arrow_device() [1/4]

unique_device_array_t cudf::to_arrow_device ( cudf::column &&  col,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = rmm::mr::get_current_device_resource() 
)

Create ArrowDeviceArray from cudf column.

Populates the C struct ArrowDeviceArray without performing copies if possible. This maintains the data on the GPU device and gives ownership of the column and its buffers to the ArrowDeviceArray struct.

After calling this function, the release callback on the returned ArrowDeviceArray must be called to clean up the memory.

Note
Because libcudf does not store the precision of decimals, they are converted to an Arrow decimal128 with the widest precision the cudf decimal type supports. For example, numeric::decimal32 is converted to Arrow decimal128 with precision 9, the maximum precision for 32-bit types. Similarly, numeric::decimal128 is converted to Arrow decimal128 with precision 38.
Copies are performed in the cases where cudf differs from Arrow, such as in the representation of bools (Arrow uses a bitmap, cudf uses 1 byte per value).
Parameters
col  Input column, ownership of the data will be moved to the result
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used for any allocations during conversion
Returns
ArrowDeviceArray which will have ownership of the GPU data
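
The note above explains that bools force a copy because cudf stores 1 byte per value while Arrow stores a bitmap. A host-side sketch of that packing step, for illustration only: libcudf does this on the GPU, pack_bytes is a hypothetical name, and the least-significant-bit-first ordering is the one defined by the Arrow columnar format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack one-byte-per-value booleans into an Arrow-style bitmap.
// Value i is written to byte i / 8 at bit position i % 8 (least-significant bit first).
std::vector<std::uint8_t> pack_bytes(std::vector<std::uint8_t> const& bytes)
{
  std::vector<std::uint8_t> bitmap((bytes.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    if (bytes[i] != 0) { bitmap[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8)); }
  }
  return bitmap;
}

// Usage: nine values need two bitmap bytes; unused high bits stay zero.
void example_usage()
{
  std::vector<std::uint8_t> const values{1, 0, 1, 1, 0, 0, 0, 0, 1};
  auto const bitmap = pack_bytes(values);
  assert(bitmap.size() == 2);
  assert(bitmap[0] == 0x0D);  // bits 0, 2 and 3 set
  assert(bitmap[1] == 0x01);  // bit 0 set for the ninth value
}
```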

◆ to_arrow_device() [2/4]

unique_device_array_t cudf::to_arrow_device ( cudf::column_view const &  col,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = rmm::mr::get_current_device_resource() 
)

Create ArrowDeviceArray from a column view.

Populates the C struct ArrowDeviceArray, performing copies only if necessary. This wraps the data on the GPU device and gives a view of the column data to the ArrowDeviceArray struct. If the caller frees the data referenced by the column_view, using the returned object results in undefined behavior.

After calling this function, the release callback on the returned ArrowDeviceArray must be called to clean up any memory created during conversion.

Note
Because libcudf does not store the precision of decimals, they are converted to an Arrow decimal128 with the widest precision the cudf decimal type supports. For example, numeric::decimal32 is converted to Arrow decimal128 with precision 9, the maximum precision for 32-bit types. Similarly, numeric::decimal128 is converted to Arrow decimal128 with precision 38.

Copies will be performed in the cases where cudf differs from Arrow:

  • BOOL8: Arrow uses a bitmap and cudf uses 1 byte per value
  • DECIMAL32 and DECIMAL64: Converted to Arrow decimal128
  • STRING: Arrow expects a single-value int32 offsets child array for empty strings columns
Parameters
col  Input column
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used for any allocations during conversion
Returns
ArrowDeviceArray which will have ownership of any copied data

◆ to_arrow_device() [3/4]

unique_device_array_t cudf::to_arrow_device ( cudf::table &&  table,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = rmm::mr::get_current_device_resource() 
)

Create ArrowDeviceArray from cudf table.

Populates the C struct ArrowDeviceArray without performing copies if possible. This maintains the data on the GPU device and gives ownership of the table and its buffers to the ArrowDeviceArray struct.

After calling this function, the release callback on the returned ArrowDeviceArray must be called to clean up the memory.

Note
Because libcudf does not store the precision of decimals, they are converted to an Arrow decimal128 with the widest precision the cudf decimal type supports. For example, numeric::decimal32 is converted to Arrow decimal128 with precision 9, the maximum precision for 32-bit types. Similarly, numeric::decimal128 is converted to Arrow decimal128 with precision 38.
Copies are performed in the cases where cudf differs from Arrow, such as in the representation of bools (Arrow uses a bitmap, cudf uses 1 byte per value).
Parameters
table  Input table, ownership of the data will be moved to the result
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used for any allocations during conversion
Returns
ArrowDeviceArray which will have ownership of the GPU data, consumer must call release

◆ to_arrow_device() [4/4]

unique_device_array_t cudf::to_arrow_device ( cudf::table_view const &  table,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = rmm::mr::get_current_device_resource() 
)

Create ArrowDeviceArray from a table view.

Populates the C struct ArrowDeviceArray, performing copies only if necessary. This wraps the data on the GPU device and gives a view of the table data to the ArrowDeviceArray struct. If the caller frees the data referenced by the table_view, using the returned object results in undefined behavior.

After calling this function, the release callback on the returned ArrowDeviceArray must be called to clean up any memory created during conversion.

Note
Because libcudf does not store the precision of decimals, they are converted to an Arrow decimal128 with the widest precision the cudf decimal type supports. For example, numeric::decimal32 is converted to Arrow decimal128 with precision 9, the maximum precision for 32-bit types. Similarly, numeric::decimal128 is converted to Arrow decimal128 with precision 38.

Copies will be performed in the cases where cudf differs from Arrow:

  • BOOL8: Arrow uses a bitmap and cudf uses 1 byte per value
  • DECIMAL32 and DECIMAL64: Converted to Arrow decimal128
  • STRING: Arrow expects a single-value int32 offsets child array for empty strings columns
Parameters
table  Input table
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used for any allocations during conversion
Returns
ArrowDeviceArray which will have ownership of any copied data

◆ to_arrow_schema()

unique_schema_t cudf::to_arrow_schema ( cudf::table_view const &  input,
cudf::host_span< column_metadata const >  metadata 
)

Create ArrowSchema from cudf table and metadata.

Populates and returns an ArrowSchema C struct using a table and metadata.

Note
Because libcudf does not store the precision of decimals, they are described as an Arrow decimal128 with the widest precision the cudf decimal type supports. For example, numeric::decimal32 is converted to Arrow decimal128 with precision 9, the maximum precision for 32-bit types. Similarly, numeric::decimal128 is converted to Arrow decimal128 with precision 38.
Parameters
input  Table to create a schema from
metadata  Contains the hierarchy of names of columns and children
Returns
ArrowSchema generated from input