cuxfilter with multi-GPU using dask_cudf#

Dask-cuDF extends Dask where necessary to allow its DataFrame partitions to be processed by cuDF GPU DataFrames as opposed to Pandas DataFrames. For instance, when you call dask_cudf.read_csv(…), your cluster’s GPUs do the work of parsing the CSV file(s) with underlying cudf.read_csv().

When to use cuDF and Dask-cuDF#

If your workflow is fast enough on a single GPU or your data comfortably fits in memory on a single GPU, you would want to use cuDF. If you want to distribute your workflow across multiple GPUs, have more data than you can fit in memory on a single GPU, or want to analyze data spread across many files at once, you would want to use Dask-cuDF.

A very useful guide to using Dask-cudf can be found here

Cuxfilter with Dask-cudf#

Using cuxfilter with Dask-cudf is a very seamless experience, and passing in a dask_cudf.DataFrame object, instead of cudf.DataFrame object should just work, without any other modifications. The dask_cudf.DataFrame should however be initialized with it’s partitions set, before passing it the the cuxfilter.DataFrame.from_dataframe function.

For more information and examples, please visit the cuxfilter repository with dask_cudf notebooks

Currently Supported Charts#
Library	Chart type
bokeh	bar, line
datashader	scatter, scatter_geo, line, stacked_lines, heatmap, graph(note: edge rendering support is limited for now)
panel_widgets	range_slider, date_range_slider, float_slider, int_slider, drop_down, multi_select, card, number
custom	view_dataframe
deckgl	choropleth(3d and 2d)