Introduction
cuML accelerates machine learning on GPUs. The library follows a few key principles, and understanding them will help you take full advantage of cuML.
1. Where possible, match the scikit-learn API
cuML estimators look and feel just like scikit-learn estimators. You initialize them with key parameters, fit them with a fit method, and then call predict or transform for inference.
import cuml

model = cuml.LinearRegression()
model.fit(X_train, y_train)
y_prediction = model.predict(X_test)
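The same pattern holds for transformers. As a minimal sketch (assuming X_train is any supported array type, as above), a transformer such as cuML’s PCA exposes the familiar fit_transform:

from cuml import PCA

# Reduce X_train to two components; the API mirrors sklearn.decomposition.PCA
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_train)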
You can find many more complete examples in the Introductory Notebook and in the cuML API documentation.
2. Accept flexible input types, return predictable output types
cuML estimators can accept NumPy arrays, cuDF DataFrames, CuPy arrays, 2D PyTorch tensors, and really any kind of standards-based Python array input you can throw at them. This relies on the __array__ and __cuda_array_interface__ standards, widely used throughout the PyData community.
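For example, the same estimator can be fit with a device-resident CuPy array or a host-resident NumPy array interchangeably. A minimal sketch, assuming CuPy is installed alongside cuML:

import cupy as cp
from cuml import LinearRegression

# Device array: passed to cuML via __cuda_array_interface__, no host copy needed
X_gpu = cp.random.rand(1000, 4, dtype=cp.float32)
y_gpu = X_gpu @ cp.array([1.0, 2.0, 3.0, 4.0], dtype=cp.float32)
LinearRegression().fit(X_gpu, y_gpu)

# Host array: the same estimator also accepts NumPy input directly
X_cpu = cp.asnumpy(X_gpu)
y_cpu = cp.asnumpy(y_gpu)
LinearRegression().fit(X_cpu, y_cpu)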
By default, outputs will mirror the data type you provided. So, if you fit a model with a NumPy array, the model.coef_ property containing the fitted coefficients will also be a NumPy array. If you fit a model using cuDF’s GPU-based DataFrame and Series objects, the model’s output properties will be cuDF objects. You can always override this behavior and select a default output type with the memory_utils.set_global_output_type function.
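A minimal sketch of this mirroring, assuming cuDF is available:

import numpy as np
import cudf
from cuml import LinearRegression

X = np.random.rand(1000, 4).astype(np.float32)
y = X @ np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)

# NumPy in -> NumPy out
model = LinearRegression().fit(X, y)
print(type(model.coef_))   # numpy.ndarray

# cuDF in -> cuDF out
model = LinearRegression().fit(cudf.DataFrame(X), cudf.Series(y))
print(type(model.coef_))   # a cuDF object

To force a single output type everywhere (for example "cupy" or "numpy"), use the set_global_output_type function mentioned above; recent releases also expose it at the top level as cuml.set_global_output_type.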
The RAPIDS Configurable Input and Output Types blog post goes into much more detail explaining this approach.
3. Be fast!
cuML’s estimators rely on highly optimized CUDA primitives and algorithms within libcuml. On a modern GPU, these can exceed the performance of CPU-based equivalents by a factor of anything from 4x (for a medium-sized linear regression) to over 1000x (for large-scale t-SNE dimensionality reduction). The cuml.benchmark module provides an easy interface to benchmark your own hardware.
To maximize performance, keep in mind that a modern GPU can have over 5,000 cores, so make sure you’re providing enough data to keep it busy. In many cases, cuML’s performance advantage grows as the dataset gets larger.
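As a quick, informal illustration (a plain wall-clock comparison rather than the cuml.benchmark module; the exact numbers depend entirely on your hardware and dataset size):

import time
import numpy as np
from sklearn.linear_model import LinearRegression as skLinearRegression
from cuml import LinearRegression as cuLinearRegression

# A reasonably large synthetic regression problem, so the GPU has enough work
X = np.random.rand(2_000_000, 50).astype(np.float32)
y = X @ np.random.rand(50).astype(np.float32)

start = time.perf_counter()
skLinearRegression().fit(X, y)
print(f"scikit-learn fit: {time.perf_counter() - start:.2f} s")

start = time.perf_counter()
cuLinearRegression().fit(X, y)
print(f"cuML fit:         {time.perf_counter() - start:.2f} s")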
Learn more
To get started learning cuML, walk through the Introductory Notebook. Then try out some of the other notebook examples in the notebooks directory of the repository. Finally, take a deeper dive with the cuML blogs.