
Files

file  interop.hpp
 

Classes

struct  cudf::column_metadata
 Detailed metadata information for arrow array.
 
struct  cudf::custom_view_deleter< ViewType >
 functor for a custom deleter to a unique_ptr of table_view
 

Typedefs

using cudf::unique_schema_t = std::unique_ptr< ArrowSchema, void(*)(ArrowSchema *)>
 typedef for a unique_ptr to an ArrowSchema with custom deleter
 
using cudf::unique_device_array_t = std::unique_ptr< ArrowDeviceArray, void(*)(ArrowDeviceArray *)>
 typedef for a unique_ptr to an ArrowDeviceArray with a custom deleter
 
using cudf::owned_columns_t = std::vector< std::unique_ptr< cudf::column > >
 typedef for a vector of owning columns, used for conversion from ArrowDeviceArray
 
using cudf::unique_table_view_t = std::unique_ptr< cudf::table_view, custom_view_deleter< cudf::table_view > >
 typedef for a unique_ptr to a cudf::table_view with custom deleter
 
using cudf::unique_column_view_t = std::unique_ptr< cudf::column_view, custom_view_deleter< cudf::column_view > >
 typedef for a unique_ptr to a cudf::column_view with custom deleter
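
The typedefs above all pair a `std::unique_ptr` with a custom deleter. As a rough illustration of that pattern, the sketch below builds a pointer with the same shape as `unique_schema_t` against a locally declared `ArrowSchema` struct (field layout taken from the Arrow C data interface). The names `schema_ptr`, `demo_release`, `schema_deleter`, `make_demo_schema` and `g_release_calls` are hypothetical and are not part of libcudf; the deleter libcudf actually installs may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Field layout of ArrowSchema from the Arrow C data interface.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  std::int64_t flags;
  std::int64_t n_children;
  ArrowSchema** children;
  ArrowSchema* dictionary;
  void (*release)(ArrowSchema*);
  void* private_data;
};

// Same shape as cudf::unique_schema_t: unique_ptr with a function-pointer deleter.
using schema_ptr = std::unique_ptr<ArrowSchema, void (*)(ArrowSchema*)>;

// Hypothetical counter so the sketch can observe the release call.
static int g_release_calls = 0;

// Hypothetical producer-side release callback. By Arrow convention a released
// struct has its `release` member set to null.
void demo_release(ArrowSchema* schema) {
  assert(schema != nullptr && schema->release != nullptr);
  ++g_release_calls;
  schema->release = nullptr;
}

// Hypothetical deleter: run the release callback if it is still set, then
// free the struct itself.
void schema_deleter(ArrowSchema* schema) {
  if (schema->release != nullptr) { schema->release(schema); }
  delete schema;
}

// Builds a schema whose cleanup is handled entirely by the smart pointer.
schema_ptr make_demo_schema() {
  auto* raw = new ArrowSchema{};  // zero-initialise every field
  raw->format = "+s";             // Arrow format string for a struct type
  raw->release = &demo_release;
  return schema_ptr(raw, &schema_deleter);
}
```

When a `schema_ptr` created this way goes out of scope, the release callback runs exactly once and the struct is freed, so callers do not need to invoke `release` by hand.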
 

Functions

unique_schema_t cudf::to_arrow_schema (cudf::table_view const &input, cudf::host_span< column_metadata const > metadata)
 Create ArrowSchema from cudf table and metadata.
 
unique_device_array_t cudf::to_arrow_device (cudf::table &&table, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create ArrowDeviceArray from cudf table and metadata.
 
unique_device_array_t cudf::to_arrow_device (cudf::column &&col, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create ArrowDeviceArray from cudf column and metadata.
 
unique_device_array_t cudf::to_arrow_device (cudf::table_view const &table, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create ArrowDeviceArray from a table view.
 
unique_device_array_t cudf::to_arrow_device (cudf::column_view const &col, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create ArrowDeviceArray from a column view.
 
unique_device_array_t cudf::to_arrow_host (cudf::table_view const &table, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Copy table view data to host and create ArrowDeviceArray for it.
 
unique_device_array_t cudf::to_arrow_host (cudf::column_view const &col, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Copy column view data to host and create ArrowDeviceArray for it.
 
std::unique_ptr< cudf::table > cudf::from_arrow (ArrowSchema const *schema, ArrowArray const *input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create cudf::table from given ArrowArray and ArrowSchema input.
 
std::unique_ptr< cudf::column > cudf::from_arrow_column (ArrowSchema const *schema, ArrowArray const *input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create cudf::column from a given ArrowArray and ArrowSchema input.
 
std::unique_ptr< table > cudf::from_arrow_host (ArrowSchema const *schema, ArrowDeviceArray const *input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create cudf::table from given ArrowDeviceArray input.
 
std::unique_ptr< table > cudf::from_arrow_stream (ArrowArrayStream *input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create cudf::table from given ArrowArrayStream input.
 
std::unique_ptr< column > cudf::from_arrow_host_column (ArrowSchema const *schema, ArrowDeviceArray const *input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create cudf::column from given ArrowDeviceArray input.
 
unique_table_view_t cudf::from_arrow_device (ArrowSchema const *schema, ArrowDeviceArray const *input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create cudf::table_view from given ArrowDeviceArray and ArrowSchema.
 
unique_column_view_t cudf::from_arrow_device_column (ArrowSchema const *schema, ArrowDeviceArray const *input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create cudf::column_view from given ArrowDeviceArray and ArrowSchema.
 

Detailed Description

Function Documentation

◆ from_arrow()

std::unique_ptr<cudf::table> cudf::from_arrow ( ArrowSchema const *  schema,
ArrowArray const *  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Create cudf::table from given ArrowArray and ArrowSchema input.

Exceptions
std::invalid_argument if either schema or input is NULL
cudf::data_type_error if the input array is not a struct array

The conversion will not call release on the input Array.

Parameters
schema: ArrowSchema pointer to describe the type of the data
input: ArrowArray pointer that needs to be converted to cudf::table
stream: CUDA stream used for device memory operations and kernel launches
mr: Device memory resource used to allocate cudf::table
Returns
cudf table generated from given arrow data
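
Because `from_arrow` rejects non-struct input while `from_arrow_column` accepts a single array, a caller may want to inspect the schema before choosing between them. The sketch below is a caller-side check under that assumption; it relies on the Arrow C data interface convention that a struct type has the format string `"+s"`. `entry_point` and `choose_entry_point` are hypothetical names, not libcudf functions, and the `ArrowSchema` struct is declared locally only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Field layout of ArrowSchema from the Arrow C data interface.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  std::int64_t flags;
  std::int64_t n_children;
  ArrowSchema** children;
  ArrowSchema* dictionary;
  void (*release)(ArrowSchema*);
  void* private_data;
};

// Hypothetical classification of which documented function fits the input.
enum class entry_point { table, column, invalid };

// A struct schema ("+s") maps to the table conversion, any other type to the
// column conversion. A null schema is reported as invalid, mirroring the
// documented std::invalid_argument case.
entry_point choose_entry_point(ArrowSchema const* schema) {
  if (schema == nullptr || schema->format == nullptr) { return entry_point::invalid; }
  return std::strcmp(schema->format, "+s") == 0 ? entry_point::table
                                                : entry_point::column;
}
```

Since the conversion does not call release on its input, the caller still owns the Arrow structures after either function returns.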

◆ from_arrow_column()

std::unique_ptr<cudf::column> cudf::from_arrow_column ( ArrowSchema const *  schema,
ArrowArray const *  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Create cudf::column from a given ArrowArray and ArrowSchema input.

Exceptions
std::invalid_argument if either schema or input is NULL

The conversion will not call release on the input Array.

Parameters
schema: ArrowSchema pointer to describe the type of the data
input: ArrowArray pointer that needs to be converted to cudf::column
stream: CUDA stream used for device memory operations and kernel launches
mr: Device memory resource used to allocate cudf::column
Returns
cudf column generated from given arrow data

◆ from_arrow_device()

unique_table_view_t cudf::from_arrow_device ( ArrowSchema const *  schema,
ArrowDeviceArray const *  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Create cudf::table_view from given ArrowDeviceArray and ArrowSchema

Constructs a non-owning cudf::table_view from an ArrowDeviceArray and ArrowSchema. The data must be accessible to the CUDA device. Because the resulting cudf::table_view does not own the data, the ArrowDeviceArray must be kept alive for the lifetime of the result. Callers are responsible for calling the release callback on the ArrowDeviceArray once it is no longer needed, and for ensuring that the cudf::table_view is not accessed after that point.

Exceptions
std::invalid_argument if device_type is not ARROW_DEVICE_CUDA, ARROW_DEVICE_CUDA_HOST or ARROW_DEVICE_CUDA_MANAGED
cudf::data_type_error if the input array is not a struct array; non-struct arrays should be passed to from_arrow_device_column instead
cudf::data_type_error if the input Arrow data type is not supported

Each child of the input struct becomes a column of the resulting table_view.

Note
The custom deleter used for the unique_ptr to the table_view maintains ownership over any memory that is allocated during conversion, such as when boolean columns are converted from the bitmap used by Arrow to the 1-byte-per-value layout used by cudf.
If the input ArrowDeviceArray contains a non-null sync_event, it is assumed to be a cudaEvent_t* and cudaStreamWaitEvent is called on the passed-in stream with that event. This function does not, however, explicitly synchronize on the stream.
Parameters
schema: ArrowSchema pointer to object describing the type of the device array
input: ArrowDeviceArray pointer to object owning the Arrow data
stream: CUDA stream used for device memory operations and kernel launches
mr: Device memory resource used to perform any allocations
Returns
cudf::table_view generated from given Arrow data
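
The device-type requirement above, together with the `ARROW_DEVICE_CPU` requirement of `from_arrow_host`, suggests a simple dispatch on `device_type`. The sketch below shows one way a caller might do that. The numeric codes are those of the Arrow C device data interface, where they are defined as macros; they are redeclared here as constants only to keep the example self-contained. `route` and `choose_route` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Device type codes as listed in the Arrow C device data interface.
using ArrowDeviceType = std::int32_t;
constexpr ArrowDeviceType ARROW_DEVICE_CPU          = 1;
constexpr ArrowDeviceType ARROW_DEVICE_CUDA         = 2;
constexpr ArrowDeviceType ARROW_DEVICE_CUDA_HOST    = 3;
constexpr ArrowDeviceType ARROW_DEVICE_CUDA_MANAGED = 13;

// Hypothetical result: which family of conversion functions applies.
enum class route { device_view, host_copy, unsupported };

// CUDA-accessible memory can be wrapped as a non-owning view
// (from_arrow_device / from_arrow_device_column). CPU memory must be copied
// (from_arrow_host / from_arrow_host_column). Anything else is rejected.
route choose_route(ArrowDeviceType device_type) {
  switch (device_type) {
    case ARROW_DEVICE_CUDA:
    case ARROW_DEVICE_CUDA_HOST:
    case ARROW_DEVICE_CUDA_MANAGED: return route::device_view;
    case ARROW_DEVICE_CPU: return route::host_copy;
    default: return route::unsupported;
  }
}
```

Checking the device type up front lets a caller report an unsupported device in its own terms instead of catching `std::invalid_argument` from the conversion.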

◆ from_arrow_device_column()

unique_column_view_t cudf::from_arrow_device_column ( ArrowSchema const *  schema,
ArrowDeviceArray const *  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Create cudf::column_view from given ArrowDeviceArray and ArrowSchema

Constructs a non-owning cudf::column_view from an ArrowDeviceArray and ArrowSchema. The data must be accessible to the CUDA device. Because the resulting cudf::column_view does not own the data, the ArrowDeviceArray must be kept alive for the lifetime of the result. Callers are responsible for calling the release callback on the ArrowDeviceArray once it is no longer needed, and for ensuring that the cudf::column_view is not accessed after that point.

Exceptions
std::invalid_argument if device_type is not ARROW_DEVICE_CUDA, ARROW_DEVICE_CUDA_HOST or ARROW_DEVICE_CUDA_MANAGED
cudf::data_type_error if the input Arrow data type is not supported
Note
The custom deleter used for the unique_ptr to the column_view maintains ownership over any memory that is allocated during conversion, such as when boolean columns are converted from the bitmap used by Arrow to the 1-byte-per-value layout used by cudf.
If the input ArrowDeviceArray contains a non-null sync_event, it is assumed to be a cudaEvent_t* and cudaStreamWaitEvent is called on the passed-in stream with that event. This function does not, however, explicitly synchronize on the stream.
Parameters
schema: ArrowSchema pointer to object describing the type of the device array
input: ArrowDeviceArray pointer to object owning the Arrow data
stream: CUDA stream used for device memory operations and kernel launches
mr: Device memory resource used to perform any allocations
Returns
cudf::column_view generated from given Arrow data
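
One way to respect the lifetime rule for non-owning views is to store the owner and the view in a single object whose members are declared in the right order, since C++ destroys members in reverse declaration order. The sketch below shows that idea with stand-in types. `keep_alive`, `fake_device_array`, `fake_view` and `g_log` are hypothetical and exist only to make the destruction order observable; in real code the owner would wrap the ArrowDeviceArray and the view would be the `unique_column_view_t` or `unique_table_view_t`.

```cpp
#include <cassert>
#include <string>

// Bundles an owner with a view of its data. Members are destroyed in reverse
// declaration order, so `view` is always destroyed before `owner`.
template <typename Owner, typename View>
struct keep_alive {
  Owner owner;  // declared first, destroyed last
  View view;    // declared last, destroyed first
};

// Hypothetical log used to record the order of destruction.
static std::string g_log;

// Stand-in for an object that calls the Arrow release callback on destruction.
struct fake_device_array {
  ~fake_device_array() { g_log += "release;"; }
};

// Stand-in for the non-owning view returned by the conversion.
struct fake_view {
  ~fake_view() { g_log += "view;"; }
};
```

With this layout the view cannot outlive the data it refers to, as long as both are only reached through the bundle.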

◆ from_arrow_host()

std::unique_ptr<table> cudf::from_arrow_host ( ArrowSchema const *  schema,
ArrowDeviceArray const *  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Create cudf::table from given ArrowDeviceArray input.

Exceptions
std::invalid_argument if either schema or input is NULL
std::invalid_argument if the device_type is not ARROW_DEVICE_CPU
cudf::data_type_error if the input array is not a struct array; non-struct arrays should be passed to from_arrow_host_column instead

The conversion will not call release on the input Array.

Parameters
schema: ArrowSchema pointer to describe the type of the data
input: ArrowDeviceArray pointer to object owning the Arrow data
stream: CUDA stream used for device memory operations and kernel launches
mr: Device memory resource used to perform CUDA allocations
Returns
cudf table generated from the given Arrow data

◆ from_arrow_host_column()

std::unique_ptr<column> cudf::from_arrow_host_column ( ArrowSchema const *  schema,
ArrowDeviceArray const *  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Create cudf::column from given ArrowDeviceArray input.

Exceptions
std::invalid_argument if either schema or input is NULL
std::invalid_argument if the device_type is not ARROW_DEVICE_CPU
cudf::data_type_error if the input Arrow data type is not supported in cudf

The conversion will not call release on the input Array.

Parameters
schema: ArrowSchema pointer to describe the type of the data
input: ArrowDeviceArray pointer to object owning the Arrow data
stream: CUDA stream used for device memory operations and kernel launches
mr: Device memory resource used to perform CUDA allocations
Returns
cudf column generated from the given Arrow data

◆ from_arrow_stream()

std::unique_ptr<table> cudf::from_arrow_stream ( ArrowArrayStream *  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Create cudf::table from given ArrowArrayStream input.

Exceptions
std::invalid_argument if input is NULL

The conversion WILL release the input ArrowArrayStream and its constituent arrays and schema, since Arrow streams are not suitable for multiple reads.

Parameters
input: ArrowArrayStream pointer to object that will produce ArrowArray data
stream: CUDA stream used for device memory operations and kernel launches
mr: Device memory resource used to perform CUDA allocations
Returns
cudf table generated from the given Arrow data
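
To make the "single read, then release" behaviour concrete, the sketch below drains a stream in the way the Arrow C stream interface prescribes: call `get_next` until it yields an array whose `release` member is null, release each chunk, then release the stream. It is not libcudf's implementation. The struct layouts follow the Arrow C data and C stream interfaces, while `drain_and_release` and every `demo_*` name are hypothetical. Schema handling is omitted for brevity.

```cpp
#include <cassert>
#include <cstdint>

struct ArrowSchema;  // only referenced through a pointer in this sketch

struct ArrowArray {
  std::int64_t length;
  std::int64_t null_count;
  std::int64_t offset;
  std::int64_t n_buffers;
  std::int64_t n_children;
  const void** buffers;
  ArrowArray** children;
  ArrowArray* dictionary;
  void (*release)(ArrowArray*);
  void* private_data;
};

struct ArrowArrayStream {
  int (*get_schema)(ArrowArrayStream*, ArrowSchema* out);
  int (*get_next)(ArrowArrayStream*, ArrowArray* out);
  const char* (*get_last_error)(ArrowArrayStream*);
  void (*release)(ArrowArrayStream*);
  void* private_data;
};

// Consumer side: read every chunk, release each one, then release the stream.
// Returns the total row count, or -1 if the producer reported an error.
std::int64_t drain_and_release(ArrowArrayStream* stream) {
  assert(stream != nullptr && stream->release != nullptr);
  std::int64_t rows = 0;
  bool ok = true;
  while (ok) {
    ArrowArray chunk{};
    ok = stream->get_next(stream, &chunk) == 0;
    if (!ok || chunk.release == nullptr) { break; }  // error, or end of stream
    rows += chunk.length;
    chunk.release(&chunk);
  }
  stream->release(stream);
  return ok ? rows : -1;
}

// Hypothetical in-memory producer used only to exercise the consumer above.
struct demo_state { int chunks_left; std::int64_t rows_per_chunk; };
static int g_chunks_released = 0;

void demo_array_release(ArrowArray* a) { ++g_chunks_released; a->release = nullptr; }

int demo_get_next(ArrowArrayStream* s, ArrowArray* out) {
  auto* st = static_cast<demo_state*>(s->private_data);
  *out = ArrowArray{};  // release == nullptr marks the end of the stream
  if (st->chunks_left > 0) {
    --st->chunks_left;
    out->length = st->rows_per_chunk;
    out->release = &demo_array_release;
  }
  return 0;
}

void demo_stream_release(ArrowArrayStream* s) {
  delete static_cast<demo_state*>(s->private_data);
  s->release = nullptr;
}

ArrowArrayStream make_demo_stream(int chunks, std::int64_t rows_per_chunk) {
  ArrowArrayStream s{};
  s.get_next = &demo_get_next;
  s.release = &demo_stream_release;
  s.private_data = new demo_state{chunks, rows_per_chunk};
  return s;  // get_schema and get_last_error stay null: not exercised here
}
```

After `drain_and_release` returns, the stream's `release` member is null, which is why the same stream cannot be passed to a second conversion.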

◆ to_arrow_device() [1/4]

unique_device_array_t cudf::to_arrow_device ( cudf::column &&  col,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref()
)

Create ArrowDeviceArray from cudf column and metadata.

Populates the C struct ArrowDeviceArray without performing copies if possible. This maintains the data on the GPU device and gives ownership of the column and its buffers to the ArrowDeviceArray struct.

After calling this function, the release callback on the returned ArrowDeviceArray must be called to clean up the memory.

Note
For decimals, since the precision is not stored for them in libcudf it will be converted to an Arrow decimal128 with the widest-precision the cudf decimal type supports. For example, numeric::decimal32 will be converted to Arrow decimal128 of the precision 9 which is the maximum precision for 32-bit types. Similarly, numeric::decimal128 will be converted to Arrow decimal128 of the precision 38.
Copies will be performed in the cases where cudf differs from Arrow such as in the representation of bools (Arrow uses a bitmap, cudf uses 1 byte per value).
Parameters
col: Input column, ownership of the data will be moved to the result
stream: CUDA stream used for device memory operations and kernel launches
mr: Device memory resource used for any allocations during conversion
Returns
ArrowDeviceArray which will have ownership of the GPU data
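
The decimal note above can be made concrete with a little arithmetic. The sketch below computes the widest precision for a signed integer of a given byte width and formats the matching Arrow decimal128 format string (`d:precision,scale` in the Arrow C data interface). It assumes that cudf stores the scale as a power-of-ten exponent, so that a cudf scale of -2 corresponds to an Arrow scale of 2. `max_precision` and `arrow_decimal128_format` are hypothetical helper names, not libcudf functions.

```cpp
#include <cassert>
#include <string>

// Largest number of decimal digits that always fits in a signed integer of
// `byte_width` bytes: floor((bits - 1) * log10(2)), with log10(2) ~ 0.30103.
constexpr int max_precision(int byte_width) {
  return (byte_width * 8 - 1) * 30103 / 100000;
}

// Builds the Arrow format string for a decimal128 column.
// Assumption: cudf scale is an exponent (value = rep * 10^scale), so the
// Arrow scale (digits after the decimal point) is its negation.
std::string arrow_decimal128_format(int byte_width, int cudf_scale) {
  assert(byte_width == 4 || byte_width == 8 || byte_width == 16);
  return "d:" + std::to_string(max_precision(byte_width)) + "," +
         std::to_string(-cudf_scale);
}
```

For 4-, 8- and 16-byte representations this yields precisions of 9, 18 and 38, matching the values quoted in the note for decimal32 and decimal128.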

◆ to_arrow_device() [2/4]

unique_device_array_t cudf::to_arrow_device ( cudf::column_view const &  col,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref()
)

Create ArrowDeviceArray from a column view.

Populates the C struct ArrowDeviceArray performing copies only if necessary. This wraps the data on the GPU device and gives a view of the column data to the ArrowDeviceArray struct. If the caller frees the data referenced by the column_view, using the returned object results in undefined behavior.

After calling this function, the release callback on the returned ArrowDeviceArray must be called to clean up any memory created during conversion.

Note
For decimals, since the precision is not stored for them in libcudf it will be converted to an Arrow decimal128 with the widest-precision the cudf decimal type supports. For example, numeric::decimal32 will be converted to Arrow decimal128 of the precision 9 which is the maximum precision for 32-bit types. Similarly, numeric::decimal128 will be converted to Arrow decimal128 of the precision 38.

Copies will be performed in the cases where cudf differs from Arrow:

  • BOOL8: Arrow uses a bitmap and cudf uses 1 byte per value
  • DECIMAL32 and DECIMAL64: Converted to Arrow decimal128
  • STRING: Arrow expects a single-value int32 offsets child array for empty strings columns
Parameters
col: Input column
stream: CUDA stream used for device memory operations and kernel launches
mr: Device memory resource used for any allocations during conversion
Returns
ArrowDeviceArray which will have ownership of any copied data
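
The BOOL8 case in the list above is a change of layout: one byte per value on the cudf side, one bit per value on the Arrow side, with bits numbered from the least significant bit of each byte. The host-side sketch below illustrates that packing and its inverse. libcudf performs the equivalent work on the GPU; `pack_bools` and `unpack_bools` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs one-byte booleans into an Arrow-style bitmap (least significant bit
// first within each byte). Any non-zero input byte counts as true.
std::vector<std::uint8_t> pack_bools(std::vector<std::uint8_t> const& bytes) {
  std::vector<std::uint8_t> bitmap((bytes.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    if (bytes[i] != 0) { bitmap[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8)); }
  }
  return bitmap;
}

// Expands the first `count` bits of a bitmap back into one byte per value.
std::vector<std::uint8_t> unpack_bools(std::vector<std::uint8_t> const& bitmap,
                                       std::size_t count) {
  assert(bitmap.size() * 8 >= count);
  std::vector<std::uint8_t> bytes(count, 0);
  for (std::size_t i = 0; i < count; ++i) {
    bytes[i] = static_cast<std::uint8_t>((bitmap[i / 8] >> (i % 8)) & 1u);
  }
  return bytes;
}
```

Because the two layouts differ in size, the packed buffer is a new allocation, which is why this conversion cannot be zero-copy.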

◆ to_arrow_device() [3/4]

unique_device_array_t cudf::to_arrow_device ( cudf::table &&  table,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref()
)

Create ArrowDeviceArray from cudf table and metadata.

Populates the C struct ArrowDeviceArray without performing copies if possible. This maintains the data on the GPU device and gives ownership of the table and its buffers to the ArrowDeviceArray struct.

After calling this function, the release callback on the returned ArrowDeviceArray must be called to clean up the memory.

Note
For decimals, since the precision is not stored for them in libcudf it will be converted to an Arrow decimal128 with the widest-precision the cudf decimal type supports. For example, numeric::decimal32 will be converted to Arrow decimal128 of the precision 9 which is the maximum precision for 32-bit types. Similarly, numeric::decimal128 will be converted to Arrow decimal128 of the precision 38.
Copies will be performed in the cases where cudf differs from Arrow such as in the representation of bools (Arrow uses a bitmap, cudf uses 1-byte per value).
Parameters
table: Input table, ownership of the data will be moved to the result
stream: CUDA stream used for device memory operations and kernel launches
mr: Device memory resource used for any allocations during conversion
Returns
ArrowDeviceArray which will have ownership of the GPU data, consumer must call release

◆ to_arrow_device() [4/4]

unique_device_array_t cudf::to_arrow_device ( cudf::table_view const &  table,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref()
)

Create ArrowDeviceArray from a table view.

Populates the C struct ArrowDeviceArray performing copies only if necessary. This wraps the data on the GPU device and gives a view of the table data to the ArrowDeviceArray struct. If the caller frees the data referenced by the table_view, using the returned object results in undefined behavior.

After calling this function, the release callback on the returned ArrowDeviceArray must be called to clean up any memory created during conversion.

Note
For decimals, since the precision is not stored for them in libcudf it will be converted to an Arrow decimal128 with the widest-precision the cudf decimal type supports. For example, numeric::decimal32 will be converted to Arrow decimal128 of the precision 9 which is the maximum precision for 32-bit types. Similarly, numeric::decimal128 will be converted to Arrow decimal128 of the precision 38.

Copies will be performed in the cases where cudf differs from Arrow:

  • BOOL8: Arrow uses a bitmap and cudf uses 1 byte per value
  • DECIMAL32 and DECIMAL64: Converted to Arrow decimal128
  • STRING: Arrow expects a single-value int32 offsets child array for empty strings columns
Parameters
table: Input table
stream: CUDA stream used for device memory operations and kernel launches
mr: Device memory resource used for any allocations during conversion
Returns
ArrowDeviceArray which will have ownership of any copied data
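
The STRING case in the list above follows from how Arrow lays out variable-length data: a column of n strings carries n + 1 int32 offsets, so even an empty column needs one offset with the value 0. The host-side sketch below builds such an offsets buffer. `make_offsets` is a hypothetical helper, and the statement that an empty cudf strings column lacks this buffer is an assumption drawn from the note, not something verified here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Builds Arrow-style int32 offsets for a list of strings: offsets[i] is the
// starting byte of row i and offsets[n] is the total byte count.
std::vector<std::int32_t> make_offsets(std::vector<std::string> const& rows) {
  std::vector<std::int32_t> offsets;
  offsets.reserve(rows.size() + 1);
  std::int32_t running = 0;
  offsets.push_back(running);  // present even when there are no rows
  for (auto const& row : rows) {
    running += static_cast<std::int32_t>(row.size());
    offsets.push_back(running);
  }
  return offsets;
}
```

For an empty input the result is the single value 0, which is the buffer that has to be materialised during conversion.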

◆ to_arrow_host() [1/2]

unique_device_array_t cudf::to_arrow_host ( cudf::column_view const &  col,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref()
)

Copy column view data to host and create ArrowDeviceArray for it.

Populates the C struct ArrowDeviceArray, copying the cudf data to the host. The returned ArrowDeviceArray will have a device_type of CPU and will have no ties to the memory referenced by the column view passed in. The deleter for the returned unique_ptr will call the release callback on the ArrowDeviceArray automatically.

Note
For decimals, since the precision is not stored for them in libcudf, it will be converted to an Arrow decimal128 that has the widest-precision the cudf decimal type supports. For example, numeric::decimal32 will be converted to Arrow decimal128 of the precision 9 which is the maximum precision for 32-bit types. Similarly, numeric::decimal128 will be converted to Arrow decimal128 of precision 38.
Parameters
col: Input column
stream: CUDA stream used for the device memory operations and kernel launches
mr: Device memory resource used for any allocations during conversion
Returns
ArrowDeviceArray generated from input column

◆ to_arrow_host() [2/2]

unique_device_array_t cudf::to_arrow_host ( cudf::table_view const &  table,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref()
)

Copy table view data to host and create ArrowDeviceArray for it.

Populates the C struct ArrowDeviceArray, copying the cudf data to the host. The returned ArrowDeviceArray will have a device_type of CPU and will have no ties to the memory referenced by the table view passed in. The deleter for the returned unique_ptr will call the release callback on the ArrowDeviceArray automatically.

Note
For decimals, since the precision is not stored for them in libcudf, it will be converted to an Arrow decimal128 that has the widest-precision the cudf decimal type supports. For example, numeric::decimal32 will be converted to Arrow decimal128 of the precision 9 which is the maximum precision for 32-bit types. Similarly, numeric::decimal128 will be converted to Arrow decimal128 of precision 38.
Parameters
table: Input table
stream: CUDA stream used for the device memory operations and kernel launches
mr: Device memory resource used for any allocations during conversion
Returns
ArrowDeviceArray generated from input table

◆ to_arrow_schema()

unique_schema_t cudf::to_arrow_schema ( cudf::table_view const &  input,
cudf::host_span< column_metadata const >  metadata 
)

Create ArrowSchema from cudf table and metadata.

Populates and returns an ArrowSchema C struct using a table and metadata.

Note
For decimals, since the precision is not stored for them in libcudf, decimals will be converted to an Arrow decimal128 which has the widest precision that cudf decimal type supports. For example, numeric::decimal32 will be converted to Arrow decimal128 with the precision of 9 which is the maximum precision for 32-bit types. Similarly, numeric::decimal128 will be converted to Arrow decimal128 with the precision of 38.
Parameters
input: Table to create a schema from
metadata: Contains the hierarchy of names of columns and children
Returns
ArrowSchema generated from input
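
The metadata argument describes a tree of names: one entry per top-level column, each with entries for its children. The sketch below models that hierarchy with a stand-in struct and flattens it into dotted paths to show how nested names line up with nested columns. `column_metadata_sketch` is hypothetical. Its two members, `name` and `children_meta`, are assumed to mirror `cudf::column_metadata` and should be checked against the struct's own documentation. `collect` and `flatten_names` are likewise illustrative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for cudf::column_metadata (member names are an assumption).
struct column_metadata_sketch {
  std::string name;
  std::vector<column_metadata_sketch> children_meta;
};

// Appends the dotted path of `meta` and of all its descendants to `out`.
void collect(column_metadata_sketch const& meta,
             std::string const& prefix,
             std::vector<std::string>& out) {
  std::string const path = prefix.empty() ? meta.name : prefix + "." + meta.name;
  out.push_back(path);
  for (auto const& child : meta.children_meta) { collect(child, path, out); }
}

// Flattens the metadata for a whole table, one entry per column or child.
std::vector<std::string> flatten_names(std::vector<column_metadata_sketch> const& columns) {
  std::vector<std::string> out;
  for (auto const& column : columns) { collect(column, "", out); }
  return out;
}
```

A table with a plain column `id` and a struct column `point` holding `x` and `y` would need metadata with two top-level entries, the second of which has two children.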