Pickling cuML Models for Persistence

This notebook demonstrates how to pickle both single-GPU and multi-GPU cuML models for persistence.

[1]:
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

Single GPU Model Pickling

All single-GPU estimators can be pickled. The following example creates a synthetic dataset, trains a model on it, and pickles the trained model for storage. Trained single-GPU models can also be used for distributed inference on a Dask cluster, as described in the Distributed Model Pickling section below.

[2]:
from cuml.datasets import make_blobs

X, y = make_blobs(n_samples=50,
                  n_features=10,
                  centers=5,
                  cluster_std=0.4,
                  random_state=0)
[3]:
from cuml.cluster import KMeans

model = KMeans(n_clusters=5)

model.fit(X)
[3]:
KMeans(handle=<cuml.raft.common.handle.Handle object at 0x7fe104514c30>, n_clusters=5, max_iter=300, tol=0.0001, verbose=4, random_state=1, init='scalable-k-means++', n_init=1, oversampling_factor=2.0, max_samples_per_batch=32768, output_type='input')
[4]:
import pickle

# Use a context manager so the file is flushed and closed after writing.
with open("kmeans_model.pkl", "wb") as f:
    pickle.dump(model, f)
[5]:
with open("kmeans_model.pkl", "rb") as f:
    model = pickle.load(f)
[6]:
model.cluster_centers_
[6]:
array([[-5.7684636 ,  2.3276033 , -3.7457774 , -1.8541754 , -5.1695833 ,
         7.667088  ,  2.7118318 ,  8.495609  ,  1.7038484 ,  1.1884269 ],
       [ 4.647688  ,  8.37788   , -9.070581  ,  9.4593315 ,  8.450423  ,
        -1.0210547 ,  3.3920872 , -7.8629856 , -0.7527662 ,  0.48384118],
       [-2.9414437 ,  4.6401706 , -4.5027537 ,  2.2855108 ,  1.644645  ,
        -2.4937892 , -5.2241607 , -1.5499196 , -8.063638  ,  2.816936  ],
       [-4.271077  ,  5.561165  , -5.6640916 , -1.8229512 , -9.2925    ,
         0.73028314,  4.4586773 , -2.8876226 , -5.1257744 ,  9.694357  ],
       [ 5.5837417 , -4.1515303 ,  4.369667  , -3.0020504 ,  3.638897  ,
        -4.341912  , -3.318711  ,  6.503671  , -6.865036  , -1.0266498 ]],
      dtype=float32)
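
The save and load steps above can be wrapped in two small helpers. The following is a minimal sketch, not part of cuML: the names save_model and load_model are hypothetical, and a plain dictionary stands in for the fitted model so that the cell runs without a GPU. A fitted single-GPU estimator could be passed in the same way, since the helpers only rely on the object being pickleable.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize a fitted estimator (or any pickleable object) to `path`."""
    with open(path, "wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load an object written by save_model. Only unpickle trusted files."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a fitted model (assumption: used only for illustration).
stand_in = {"n_clusters": 5, "centers": [[0.0, 1.0], [2.0, 3.0]]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kmeans_model.pkl")
    save_model(stand_in, path)
    restored = load_model(path)

# The restored object is an equal but distinct copy of the original.
assert restored == stand_in
```

Unpickling executes code embedded in the file, so load_model should only be used on files from a trusted source.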

Distributed Model Pickling

The distributed estimator wrappers in cuml.dask are not intended to be pickled directly. Instead, the Dask cuML estimators provide a get_combined_model() method, which returns the trained single-GPU model for pickling. The combined model can be used for inference on a single GPU, and the ParallelPostFit wrapper from the Dask-ML library can be used to perform distributed inference on a Dask cluster.

[7]:
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster()
client = Client(cluster)
client
[7]:

Client

Cluster

  • Workers: 1
  • Cores: 1
  • Memory: 270.37 GB
[8]:
from cuml.dask.datasets import make_blobs

n_workers = len(client.scheduler_info()["workers"].keys())

X, y = make_blobs(n_samples=5000,
                  n_features=30,
                  centers=5,
                  cluster_std=0.4,
                  random_state=0,
                  n_parts=n_workers*5)

X = X.persist()
y = y.persist()
[9]:
from cuml.dask.cluster import KMeans

dist_model = KMeans(n_clusters=5)
[10]:
dist_model.fit(X)
[10]:
<cuml.dask.cluster.kmeans.KMeans at 0x7fe0fcad4b10>
[11]:
import pickle

single_gpu_model = dist_model.get_combined_model()
# Use a context manager so the file is flushed and closed after writing.
with open("kmeans_model.pkl", "wb") as f:
    pickle.dump(single_gpu_model, f)
[12]:
with open("kmeans_model.pkl", "rb") as f:
    single_gpu_model = pickle.load(f)
[13]:
single_gpu_model.cluster_centers_
[13]:
array([[ 4.809875  ,  8.422671  , -9.239022  ,  9.37914   ,  8.499881  ,
        -1.0592818 ,  3.3437855 , -7.802612  , -0.5946332 ,  0.264476  ,
         5.5073957 , -4.10698   ,  4.2890778 , -2.8172052 ,  3.6150153 ,
        -4.1613    , -3.6209643 ,  6.2185297 , -6.9460473 , -1.0828307 ,
        -5.82677   ,  2.2258763 , -3.8601217 , -1.6974076 , -5.313418  ,
         7.5795784 ,  2.9187474 ,  8.540423  ,  1.5523206 ,  1.0841804 ],
       [-2.8941853 ,  4.4741907 , -4.4475675 ,  2.3820987 ,  1.7478832 ,
        -2.5046246 , -5.208331  , -1.6937687 , -8.134755  ,  2.6468298 ,
        -4.3163624 ,  5.5655394 , -5.732198  , -1.7384952 , -9.344658  ,
         0.7084658 ,  4.4358397 , -2.9009    , -4.948638  ,  9.695302  ,
         8.366521  , -6.2474537 , -6.3494725 ,  1.9546973 ,  4.157616  ,
        -9.167902  ,  4.6070676 ,  8.788584  ,  6.864423  ,  2.2319884 ],
       [-4.665713  , -9.558958  ,  6.657229  ,  4.440131  ,  2.1730306 ,
         2.5904036 ,  0.58000994,  6.2550354 , -8.829285  , -0.4139966 ,
         9.831051  ,  7.5897346 ,  9.975543  , -5.8561754 , -1.2414306 ,
        -2.5572667 , -1.0441563 , -5.24611   , -9.311467  ,  4.636607  ,
        -0.11776031, -3.929529  ,  6.207367  , -7.399014  ,  5.6740923 ,
        -8.5403    , -7.5186524 , -5.5301213 ,  4.8341303 ,  2.569168  ],
       [-6.9581094 , -9.760796  , -6.55061   , -0.41965044,  6.0687685 ,
         3.7602885 , -3.9751325 ,  6.1493387 , -1.8729935 ,  5.025274  ,
        -6.8340993 ,  1.3383292 ,  9.0016775 , -0.98648345,  9.65402   ,
         9.790737  , -8.618677  ,  5.995579  ,  2.2099135 , -3.6309097 ,
         7.0714087 , -7.394622  , -5.2996335 , -6.9737043 , -7.908465  ,
         6.681064  , -5.575639  ,  7.1313105 ,  6.599619  , -8.309574  ],
       [ 6.2617536 ,  9.228769  ,  8.35813   ,  9.017298  ,  7.704466  ,
        -1.0047106 , -6.2457666 ,  1.3951722 , -6.976181  , -5.9480596 ,
         1.0575897 , -0.0107428 ,  2.8210258 ,  1.8389362 , -8.247101  ,
         3.0498965 , -8.483243  ,  9.72164   , -7.7502713 ,  3.4655957 ,
        -3.9312134 , -4.0965166 ,  2.6586983 ,  1.283246  ,  1.0177817 ,
         5.2571115 , -1.644438  ,  6.1383214 , -6.8840537 , -9.663093  ]],
      dtype=float32)
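
The introduction to this section mentions that the ParallelPostFit wrapper from Dask-ML can distribute inference with the combined model. The following is a hedged sketch of how that might look, assuming Dask-ML is installed. The helper name distributed_predict is hypothetical, and this pattern was not run as part of this notebook, so compatibility with a given cuML and Dask-ML version should be checked before relying on it.

```python
def distributed_predict(model, X):
    """Predict on a Dask collection `X` with a trained single-GPU model.

    Sketch only: assumes Dask-ML is installed and that `model` exposes
    a scikit-learn style predict() method.
    """
    # Imported lazily so the helper can be defined without Dask-ML present.
    from dask_ml.wrappers import ParallelPostFit

    wrapped = ParallelPostFit(estimator=model)
    # predict() is lazy: it returns a Dask collection, evaluated
    # block by block on the workers when .compute() is called.
    return wrapped.predict(X)


# Usage in this notebook (requires the cluster and objects created above):
# labels = distributed_predict(single_gpu_model, X).compute()
```

Because the wrapper sends a copy of the model to the workers, this relies on the same pickling support shown earlier for single-GPU estimators.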