pylibcudf documentation
pylibcudf is a lightweight Cython interface to libcudf that provides near-zero overhead for GPU-accelerated data processing in Python.
It also integrates seamlessly with community protocols such as __cuda_array_interface__ and with common libraries such as CuPy and Numba.
Both our zero-code pandas accelerator (cudf.pandas) and our polars GPU execution engine (cudf.polars) are built on top of pylibcudf.
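For readers unfamiliar with __cuda_array_interface__, the following is a minimal sketch of what the protocol looks like from a consumer's side. It is not pylibcudf code: the FakeDeviceColumn class, the describe helper, and the pointer value are hypothetical and exist only for illustration. No GPU memory is allocated, so the sketch runs anywhere, but the fake pointer must never be passed to a real GPU library.

```python
# Illustrative only: a pure-Python stand-in for an object that exposes
# __cuda_array_interface__. Nothing here touches a GPU.


class FakeDeviceColumn:
    """Hypothetical producer; not part of pylibcudf, CuPy, or Numba."""

    def __init__(self, ptr, length, typestr):
        self._ptr = ptr          # a real producer would store a device pointer
        self._length = length
        self._typestr = typestr  # NumPy-style type string, e.g. "<i4"

    @property
    def __cuda_array_interface__(self):
        # The required keys of the protocol: shape, typestr, data, version.
        return {
            "shape": (self._length,),
            "typestr": self._typestr,
            "data": (self._ptr, False),  # (pointer, read_only)
            "version": 3,
        }


def describe(obj):
    """Summarize what a consumer learns from the protocol dictionary."""
    iface = obj.__cuda_array_interface__
    ptr, read_only = iface["data"]
    # The digits after the byte-order and kind characters give the item size.
    nbytes = int(iface["typestr"][2:])
    for dim in iface["shape"]:
        nbytes *= dim
    return {
        "ptr": ptr,
        "read_only": read_only,
        "shape": iface["shape"],
        "nbytes": nbytes,
    }


# The pointer below is a made-up number, not a valid device address.
col = FakeDeviceColumn(ptr=0x7F0000000000, length=1000, typestr="<i4")
info = describe(col)
print(info["shape"], info["nbytes"])
```

Because the protocol is just an attribute that returns a dictionary, a consumer can view a producer's device memory without copying it and without depending on the producer's package.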
Example: Reading data from a Parquet file
pylibcudf:
import pylibcudf as plc
source = plc.io.SourceInfo(["dataset.parquet"])
options = plc.io.parquet.ParquetReaderOptions.builder(source).build()
table = plc.io.parquet.read_parquet(options)
libcudf:
#include <cudf/io/parquet.hpp>

int main()
{
  auto source  = cudf::io::source_info("dataset.parquet");
  auto options = cudf::io::parquet_reader_options::builder(source).build();
  auto table   = cudf::io::read_parquet(options);
}
Contents: