Files | |
file | interop.hpp |
Classes | |
struct | cudf::column_metadata |
Detailed metadata information for arrow array. | |
struct | cudf::custom_view_deleter< ViewType > |
functor for a custom deleter to a unique_ptr of table_view | |
Typedefs | |
using | cudf::unique_schema_t = std::unique_ptr< ArrowSchema, void(*)(ArrowSchema *)> |
typedef for a unique_ptr to an ArrowSchema with custom deleter | |
using | cudf::unique_device_array_t = std::unique_ptr< ArrowDeviceArray, void(*)(ArrowDeviceArray *)> |
typedef for a unique_ptr to an ArrowDeviceArray with a custom deleter | |
using | cudf::owned_columns_t = std::vector< std::unique_ptr< cudf::column > > |
typedef for a vector of owning columns, used for conversion from ArrowDeviceArray | |
using | cudf::unique_table_view_t = std::unique_ptr< cudf::table_view, custom_view_deleter< cudf::table_view > > |
typedef for a unique_ptr to a cudf::table_view with custom deleter | |
using | cudf::unique_column_view_t = std::unique_ptr< cudf::column_view, custom_view_deleter< cudf::column_view > > |
typedef for a unique_ptr to a cudf::column_view with custom deleter | |
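The typedefs above pair a raw Arrow C struct with a custom deleter so that cleanup happens when the smart pointer goes out of scope. A minimal, library-free sketch of that pattern follows. The ArrowSchema layout is taken from the Arrow C data interface; `schema_ptr`, `release_schema`, `delete_schema`, `make_int32_schema`, `release_count_after_scope` and the counter are hypothetical names for illustration only, and the deleter logic is an assumption, not necessarily what cudf installs.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Layout as given by the Arrow C data interface; normally this comes from
// an Arrow header instead of being redeclared.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  ArrowSchema** children;
  ArrowSchema* dictionary;
  void (*release)(ArrowSchema*);
  void* private_data;
};

// Same shape as cudf::unique_schema_t: the deleter is a plain function pointer.
using schema_ptr = std::unique_ptr<ArrowSchema, void (*)(ArrowSchema*)>;

// Hypothetical counter so the example can observe that release ran.
static int g_release_calls = 0;

// Hypothetical release callback: there is nothing to free here, so it only
// marks the struct as released by clearing the callback.
static void release_schema(ArrowSchema* s) {
  ++g_release_calls;
  s->release = nullptr;
}

// Hypothetical deleter: call release if it is still set, then free the struct.
static void delete_schema(ArrowSchema* s) {
  if (s->release != nullptr) { s->release(s); }
  delete s;
}

// Builds a schema for one int32 column ("i" in Arrow format strings).
static schema_ptr make_int32_schema() {
  auto* s = new ArrowSchema{"i", "values", nullptr, 0, 0,
                            nullptr, nullptr, release_schema, nullptr};
  return schema_ptr{s, delete_schema};
}

// Returns how many times release ran after one schema leaves scope.
static int release_count_after_scope() {
  g_release_calls = 0;
  {
    schema_ptr schema = make_int32_schema();
    assert(schema->release != nullptr);  // still live inside the scope
  }
  return g_release_calls;
}
```

Because the deleter is part of the pointer type, callers never invoke release by hand for objects held this way.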
std::unique_ptr<cudf::table> cudf::from_arrow | ( | ArrowSchema const * | schema, |
ArrowArray const * | input, | ||
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Create cudf::table from given ArrowArray and ArrowSchema input.
std::invalid_argument | if either schema or input is NULL |
cudf::data_type_error | if the input array is not a struct array. |
The conversion will not call release on the input Array.
schema | ArrowSchema pointer to describe the type of the data |
input | ArrowArray pointer that needs to be converted to cudf::table |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate cudf::table |
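Since the conversion does not call release on the input array, the caller still owns it afterwards. The sketch below illustrates that contract without cudf: `sum_int32` is a hypothetical stand-in for any consumer that reads the array and leaves release untouched, and the other helper names are likewise invented for illustration. The ArrowArray layout follows the Arrow C data interface.

```cpp
#include <cassert>
#include <cstdint>

// Layout as given by the Arrow C data interface.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  ArrowArray** children;
  ArrowArray* dictionary;
  void (*release)(ArrowArray*);
  void* private_data;
};

// Static storage for a three-element int32 array.
static const int32_t k_values[] = {1, 2, 3};
// Buffer 0 is the validity bitmap (absent: no nulls), buffer 1 is the data.
static const void* k_buffers[] = {nullptr, k_values};

// Hypothetical release callback: the storage is static, so it only marks
// the array as released.
static void release_array(ArrowArray* a) { a->release = nullptr; }

// Hypothetical consumer: reads the data and does not call release.
static int64_t sum_int32(ArrowArray const* a) {
  auto const* data = static_cast<int32_t const*>(a->buffers[1]);
  int64_t total = 0;
  for (int64_t i = 0; i < a->length; ++i) { total += data[a->offset + i]; }
  return total;
}

// The caller converts first, then releases the input itself.
static bool caller_releases_after_conversion() {
  ArrowArray arr{3, 0, 0, 2, 0, k_buffers, nullptr, nullptr, release_array, nullptr};
  int64_t const total   = sum_int32(&arr);
  bool const still_live = (arr.release != nullptr);  // consumer left it alone
  arr.release(&arr);                                 // caller's responsibility
  return total == 6 && still_live && arr.release == nullptr;
}
```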
std::unique_ptr<cudf::column> cudf::from_arrow_column | ( | ArrowSchema const * | schema, |
ArrowArray const * | input, | ||
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Create cudf::column from a given ArrowArray and ArrowSchema input.
std::invalid_argument | if either schema or input is NULL |
The conversion will not call release on the input Array.
schema | ArrowSchema pointer to describe the type of the data |
input | ArrowArray pointer that needs to be converted to cudf::column |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate cudf::column |
unique_table_view_t cudf::from_arrow_device | ( | ArrowSchema const * | schema, |
ArrowDeviceArray const * | input, | ||
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Create cudf::table_view from given ArrowDeviceArray and ArrowSchema.
Constructs a non-owning cudf::table_view using ArrowDeviceArray and ArrowSchema. The data must be accessible to the CUDA device. Because the resulting cudf::table_view will not own the data, the ArrowDeviceArray must be kept alive for the lifetime of the result. It is the responsibility of callers to ensure they call the release callback on the ArrowDeviceArray after it is no longer needed, and that the cudf::table_view is not accessed after this happens.
std::invalid_argument | if device_type is not ARROW_DEVICE_CUDA , ARROW_DEVICE_CUDA_HOST or ARROW_DEVICE_CUDA_MANAGED |
cudf::data_type_error | if the input array is not a struct array; non-struct arrays should be passed to from_arrow_device_column instead. |
cudf::data_type_error | if the input arrow data type is not supported. |
Each child of the input struct becomes a column of the resulting table_view.
If the input ArrowDeviceArray contains a non-null sync_event, it is assumed to be a cudaEvent_t* and the passed-in stream will have cudaStreamWaitEvent called on it with the event. This function, however, will not explicitly synchronize on the stream.
schema | ArrowSchema pointer to object describing the type of the device array |
input | ArrowDeviceArray pointer to object owning the Arrow data |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to perform any allocations |
cudf::table_view generated from given Arrow data
unique_column_view_t cudf::from_arrow_device_column | ( | ArrowSchema const * | schema, |
ArrowDeviceArray const * | input, | ||
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Create cudf::column_view from given ArrowDeviceArray and ArrowSchema.
Constructs a non-owning cudf::column_view using ArrowDeviceArray and ArrowSchema. The data must be accessible to the CUDA device. Because the resulting cudf::column_view will not own the data, the ArrowDeviceArray must be kept alive for the lifetime of the result. It is the responsibility of callers to ensure they call the release callback on the ArrowDeviceArray after it is no longer needed, and that the cudf::column_view is not accessed after this happens.
std::invalid_argument | if device_type is not ARROW_DEVICE_CUDA , ARROW_DEVICE_CUDA_HOST or ARROW_DEVICE_CUDA_MANAGED |
cudf::data_type_error | if the input arrow data type is not supported. |
If the input ArrowDeviceArray contains a non-null sync_event, it is assumed to be a cudaEvent_t* and the passed-in stream will have cudaStreamWaitEvent called on it with the event. This function, however, will not explicitly synchronize on the stream.
schema | ArrowSchema pointer to object describing the type of the device array |
input | ArrowDeviceArray pointer to object owning the Arrow data |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to perform any allocations |
cudf::column_view generated from given Arrow data
std::unique_ptr<table> cudf::from_arrow_host | ( | ArrowSchema const * | schema, |
ArrowDeviceArray const * | input, | ||
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Create cudf::table from given ArrowDeviceArray input.
std::invalid_argument | if either schema or input is NULL |
std::invalid_argument | if the device_type is not ARROW_DEVICE_CPU |
cudf::data_type_error | if the input array is not a struct array; non-struct arrays should be passed to from_arrow_host_column instead. |
The conversion will not call release on the input Array.
schema | ArrowSchema pointer to describe the type of the data |
input | ArrowDeviceArray pointer to object owning the Arrow data |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to perform cuda allocation |
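The device_type requirements listed for the host and device conversion functions suggest a simple dispatch on the ArrowDeviceArray's device_type before choosing which function to call. The following is only a sketch of such a check, not cudf's implementation: `conversion_path`, `choose_path` and `rejects` are hypothetical names, and the constants are redeclared here with the numeric values from the Arrow C device data interface (where they are macros).

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Values as defined by the Arrow C device data interface.
using ArrowDeviceType = int32_t;
constexpr ArrowDeviceType ARROW_DEVICE_CPU          = 1;
constexpr ArrowDeviceType ARROW_DEVICE_CUDA         = 2;
constexpr ArrowDeviceType ARROW_DEVICE_CUDA_HOST    = 3;
constexpr ArrowDeviceType ARROW_DEVICE_CUDA_MANAGED = 13;

// Hypothetical tag for which family of functions accepts the input.
enum class conversion_path {
  host_copy,   // from_arrow_host / from_arrow_host_column
  device_view  // from_arrow_device / from_arrow_device_column
};

// Maps a device type to the conversion family documented as accepting it,
// throwing for any other device type.
static conversion_path choose_path(ArrowDeviceType device_type) {
  switch (device_type) {
    case ARROW_DEVICE_CPU: return conversion_path::host_copy;
    case ARROW_DEVICE_CUDA:
    case ARROW_DEVICE_CUDA_HOST:
    case ARROW_DEVICE_CUDA_MANAGED: return conversion_path::device_view;
    default: throw std::invalid_argument("unsupported ArrowDeviceType");
  }
}

// Helper for callers that prefer a boolean over an exception.
static bool rejects(ArrowDeviceType device_type) {
  try {
    (void)choose_path(device_type);
  } catch (std::invalid_argument const&) {
    return true;
  }
  return false;
}
```

Checking up front keeps the std::invalid_argument cases out of the hot path when the producer's device type is already known.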
std::unique_ptr<column> cudf::from_arrow_host_column | ( | ArrowSchema const * | schema, |
ArrowDeviceArray const * | input, | ||
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Create cudf::column from given ArrowDeviceArray input.
std::invalid_argument | if either schema or input is NULL |
std::invalid_argument | if the device_type is not ARROW_DEVICE_CPU |
cudf::data_type_error | if input arrow data type is not supported in cudf. |
The conversion will not call release on the input Array.
schema | ArrowSchema pointer to describe the type of the data |
input | ArrowDeviceArray pointer to object owning the Arrow data |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to perform cuda allocation |
std::unique_ptr<table> cudf::from_arrow_stream | ( | ArrowArrayStream * | input, |
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Create cudf::table from given ArrowArrayStream input.
std::invalid_argument | if input is NULL |
The conversion WILL release the input ArrowArrayStream and its constituent arrays or schema since Arrow streams are not suitable for multiple reads.
input | ArrowArrayStream pointer to object that will produce ArrowArray data |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to perform cuda allocation |
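To make the "stream is consumed" behaviour concrete, here is a library-free sketch of draining an ArrowArrayStream and releasing everything it hands out. The struct layouts follow the Arrow C data and C stream interfaces; the toy producer (`chunk_state`, `make_stream`) and the consumer `drain` are hypothetical and only illustrate the protocol, not how cudf builds its table.

```cpp
#include <cassert>
#include <cstdint>

// Layouts as given by the Arrow C data and C stream interfaces.
struct ArrowSchema {
  const char* format; const char* name; const char* metadata;
  int64_t flags; int64_t n_children;
  ArrowSchema** children; ArrowSchema* dictionary;
  void (*release)(ArrowSchema*); void* private_data;
};
struct ArrowArray {
  int64_t length; int64_t null_count; int64_t offset;
  int64_t n_buffers; int64_t n_children;
  const void** buffers; ArrowArray** children; ArrowArray* dictionary;
  void (*release)(ArrowArray*); void* private_data;
};
struct ArrowArrayStream {
  int (*get_schema)(ArrowArrayStream*, ArrowSchema* out);
  int (*get_next)(ArrowArrayStream*, ArrowArray* out);
  const char* (*get_last_error)(ArrowArrayStream*);
  void (*release)(ArrowArrayStream*);
  void* private_data;
};

// Hypothetical producer state: number of 4-row chunks still to emit.
struct chunk_state { int remaining; };
static const void* k_no_validity[1] = {nullptr};  // struct array: validity only

static void release_schema(ArrowSchema* s) { s->release = nullptr; }
static void release_chunk(ArrowArray* a) { a->release = nullptr; }

static int get_schema(ArrowArrayStream*, ArrowSchema* out) {
  // "+s" is the format string for a struct; this toy one has no children.
  *out = ArrowSchema{"+s", "", nullptr, 0, 0, nullptr, nullptr, release_schema, nullptr};
  return 0;
}
static int get_next(ArrowArrayStream* s, ArrowArray* out) {
  auto* state = static_cast<chunk_state*>(s->private_data);
  if (state->remaining == 0) {
    *out = ArrowArray{};  // a null release callback signals end of stream
    return 0;
  }
  --state->remaining;
  *out = ArrowArray{4, 0, 0, 1, 0, k_no_validity, nullptr, nullptr, release_chunk, nullptr};
  return 0;
}
static const char* get_last_error(ArrowArrayStream*) { return nullptr; }
static void release_stream(ArrowArrayStream* s) {
  delete static_cast<chunk_state*>(s->private_data);
  s->release = nullptr;
}
static ArrowArrayStream make_stream(int chunks) {
  return ArrowArrayStream{get_schema, get_next, get_last_error, release_stream,
                          new chunk_state{chunks}};
}

// Consumer: reads every chunk once, releasing each chunk, the schema and
// finally the stream. Returns the total row count, or -1 on a stream error.
static int64_t drain(ArrowArrayStream* stream) {
  ArrowSchema schema{};
  if (stream->get_schema(stream, &schema) != 0) { return -1; }
  int64_t rows = 0;
  for (;;) {
    ArrowArray chunk{};
    if (stream->get_next(stream, &chunk) != 0) { rows = -1; break; }
    if (chunk.release == nullptr) { break; }  // end of stream
    rows += chunk.length;
    chunk.release(&chunk);
  }
  schema.release(&schema);
  stream->release(stream);
  return rows;
}
```

After `drain` returns, the stream's release pointer is null, which is why the same stream cannot be read a second time.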
unique_device_array_t cudf::to_arrow_device | ( | cudf::column && | col, |
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Create ArrowDeviceArray from cudf column and metadata.
Populates the C struct ArrowDeviceArray without performing copies if possible. This maintains the data on the GPU device and gives ownership of the column and its buffers to the ArrowDeviceArray struct.
After calling this function, the release callback on the returned ArrowDeviceArray must be called to clean up the memory.
col | Input column, ownership of the data will be moved to the result |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used for any allocations during conversion |
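Transferring ownership into an Arrow struct is commonly done by parking the owning object in private_data and freeing it from the release callback. The sketch below shows that producer-side pattern with a host std::vector standing in for a cudf column; `owned_int32`, `export_int32`, `export_then_release` and the live-holder counter are hypothetical names, and nothing here is claimed to match cudf's internals.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Layout as given by the Arrow C data interface.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  ArrowArray** children;
  ArrowArray* dictionary;
  void (*release)(ArrowArray*);
  void* private_data;
};

// Hypothetical holder: keeps the moved-in storage and the buffer pointer
// table alive for as long as the ArrowArray is live.
struct owned_int32 {
  std::vector<int32_t> values;
  const void* buffers[2];
};

// Hypothetical counter so the example can observe the holder's lifetime.
static int g_live_holders = 0;

// Release callback: frees the holder and marks the array as released.
static void release_owned(ArrowArray* a) {
  delete static_cast<owned_int32*>(a->private_data);
  --g_live_holders;
  a->release = nullptr;
}

// Moves the vector into a heap-allocated holder and exposes its storage
// through the ArrowArray without copying the elements.
static ArrowArray export_int32(std::vector<int32_t>&& values) {
  auto* holder = new owned_int32{std::move(values), {nullptr, nullptr}};
  holder->buffers[1] = holder->values.data();  // buffer 0 (validity) stays null
  ++g_live_holders;
  return ArrowArray{static_cast<int64_t>(holder->values.size()), 0, 0, 2, 0,
                    holder->buffers, nullptr, nullptr, release_owned, holder};
}

// Exports a vector, reads it through the Arrow struct, then releases it.
static bool export_then_release() {
  std::vector<int32_t> src{10, 20, 30};
  ArrowArray arr   = export_int32(std::move(src));
  auto const* data = static_cast<int32_t const*>(arr.buffers[1]);
  bool const ok    = arr.length == 3 && data[2] == 30 && g_live_holders == 1;
  arr.release(&arr);  // required: this is what frees the moved-in storage
  return ok && g_live_holders == 0 && arr.release == nullptr;
}
```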
unique_device_array_t cudf::to_arrow_device | ( | cudf::column_view const & | col, |
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Create ArrowDeviceArray from a column view.
Populates the C struct ArrowDeviceArray performing copies only if necessary. This wraps the data on the GPU device and gives a view of the column data to the ArrowDeviceArray struct. If the caller frees the data referenced by the column_view, using the returned object results in undefined behavior.
After calling this function, the release callback on the returned ArrowDeviceArray must be called to clean up any memory created during conversion.
Copies will be performed in the cases where cudf differs from Arrow:
col | Input column |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used for any allocations during conversion |
unique_device_array_t cudf::to_arrow_device | ( | cudf::table && | table, |
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Create ArrowDeviceArray from cudf table and metadata.
Populates the C struct ArrowDeviceArray without performing copies if possible. This maintains the data on the GPU device and gives ownership of the table and its buffers to the ArrowDeviceArray struct.
After calling this function, the release callback on the returned ArrowDeviceArray must be called to clean up the memory.
table | Input table, ownership of the data will be moved to the result |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used for any allocations during conversion |
unique_device_array_t cudf::to_arrow_device | ( | cudf::table_view const & | table, |
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Create ArrowDeviceArray from a table view.
Populates the C struct ArrowDeviceArray performing copies only if necessary. This wraps the data on the GPU device and gives a view of the table data to the ArrowDeviceArray struct. If the caller frees the data referenced by the table_view, using the returned object results in undefined behavior.
After calling this function, the release callback on the returned ArrowDeviceArray must be called to clean up any memory created during conversion.
Copies will be performed in the cases where cudf differs from Arrow:
table | Input table |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used for any allocations during conversion |
unique_device_array_t cudf::to_arrow_host | ( | cudf::column_view const & | col, |
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Copy column view data to host and create ArrowDeviceArray for it.
Populates the C struct ArrowDeviceArray, copying the cudf data to the host. The returned ArrowDeviceArray will have a device_type of CPU and will have no ties to the memory referenced by the column view passed in. The deleter for the returned unique_ptr will call the release callback on the ArrowDeviceArray automatically.
col | Input column |
stream | CUDA stream used for the device memory operations and kernel launches |
mr | Device memory resource used for any allocations during conversion |
unique_device_array_t cudf::to_arrow_host | ( | cudf::table_view const & | table, |
rmm::cuda_stream_view | stream = cudf::get_default_stream() , |
||
rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
) |
Copy table view data to host and create ArrowDeviceArray for it.
Populates the C struct ArrowDeviceArray, copying the cudf data to the host. The returned ArrowDeviceArray will have a device_type of CPU and will have no ties to the memory referenced by the table view passed in. The deleter for the returned unique_ptr will call the release callback on the ArrowDeviceArray automatically.
table | Input table |
stream | CUDA stream used for the device memory operations and kernel launches |
mr | Device memory resource used for any allocations during conversion |
unique_schema_t cudf::to_arrow_schema | ( | cudf::table_view const & | input, |
cudf::host_span< column_metadata const > | metadata | ||
) |
Create ArrowSchema from cudf table and metadata.
Populates and returns an ArrowSchema C struct using a table and metadata.
numeric::decimal32 will be converted to Arrow decimal128 with a precision of 9, which is the maximum precision for 32-bit types. Similarly, numeric::decimal128 will be converted to Arrow decimal128 with a precision of 38.
input | Table to create a schema from |
metadata | Contains the hierarchy of names of columns and children |
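The decimal mapping described above can be pictured as building an Arrow format string of the form `d:precision,scale`, which denotes decimal128 in the Arrow C data interface. The sketch below is an illustration under stated assumptions rather than cudf's code: `max_precision` and `arrow_decimal128_format` are hypothetical helpers, the precision of 18 for 64-bit storage is assumed by analogy with the documented 9 and 38, and the sign flip assumes the cudf scale is a power-of-ten exponent while the Arrow scale counts digits after the decimal point.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Maximum decimal digits per storage width. 9 and 38 come from the text
// above; 18 for 64-bit storage is an assumption made by analogy.
static int max_precision(int storage_bits) {
  switch (storage_bits) {
    case 32: return 9;
    case 64: return 18;
    case 128: return 38;
    default: throw std::invalid_argument("unsupported decimal width");
  }
}

// Builds the Arrow decimal128 format string "d:precision,scale".
// Assumption: cudf_scale is an exponent (value = unscaled * 10^cudf_scale),
// so it is negated to obtain Arrow's count of fractional digits.
static std::string arrow_decimal128_format(int storage_bits, int32_t cudf_scale) {
  return "d:" + std::to_string(max_precision(storage_bits)) + "," +
         std::to_string(-cudf_scale);
}
```

For instance, a 32-bit decimal with a cudf scale of -2 would map to `d:9,2` under these assumptions.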
input