UMAP
- class cuml.dask.manifold.UMAP(*, model, client=None, **kwargs)
Uniform Manifold Approximation and Projection
Finds a low dimensional embedding of the data that approximates an underlying manifold.
Important: This Dask wrapper is designed exclusively for distributed inference. You must first train a cuml.UMAP model on a single GPU, then pass the trained model to this wrapper for distributed transform operations. Distributed training is not supported.
- Parameters:
- model: cuml.UMAP, required
A fitted single-GPU UMAP model instance. The model must be trained before passing it to this wrapper.
- client: dask.distributed.Client, optional
Dask client to use
- Adapted from https://github.com/lmcinnes/umap/blob/master/umap/umap_.py
Methods
transform(X[, convert_dtype]): Transform X into the existing embedded space and return that transformed output.
Notes
The single-GPU cuml.UMAP module is heavily based on Leland McInnes’ reference UMAP package [1].
References
Examples
>>> from dask_cuda import LocalCUDACluster
>>> from dask.distributed import Client
>>> import dask.array as da
>>> from cuml.datasets import make_blobs
>>> from cuml.manifold import UMAP
>>> from cuml.dask.manifold import UMAP as MNMG_UMAP
>>> import numpy as np
>>> cluster = LocalCUDACluster(threads_per_worker=1)
>>> client = Client(cluster)
>>> X, y = make_blobs(1000, 10, centers=42, cluster_std=0.1,
...                   dtype=np.float32, random_state=10)
>>> local_model = UMAP(random_state=10, verbose=0)
>>> selection = np.random.RandomState(10).choice(1000, 100)
>>> X_train = X[selection]
>>> y_train = y[selection]
>>> local_model.fit(X_train, y=y_train)
UMAP()
>>> distributed_model = MNMG_UMAP(model=local_model)
>>> distributed_X = da.from_array(X, chunks=(500, -1))
>>> embedding = distributed_model.transform(distributed_X)
>>> result = embedding.compute()
>>> print(result)
[[  4.1684933   4.1890593 ]
 [  5.0110254  -5.2143383 ]
 [  1.7776365 -17.665699  ]
 ...
 [ -6.6378727  -0.15353012]
 [ -3.1891193  -0.83906937]
 [ -0.5042019   2.1454725 ]]
>>> client.close()
>>> cluster.close()
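The train-locally, transform-distributed workflow from the example can be packaged into a small helper. The following is a minimal sketch, not part of cuML: `pick_training_rows` and `fit_then_wrap` are hypothetical names, and the sketch assumes that a Dask client is already running and that `X` (and `y`, if given) are device arrays that accept a list of row indices. Unlike the example, which samples row indices with replacement, this sketch samples without replacement so that no training row is duplicated.

```python
import random


def pick_training_rows(n_samples, n_train, seed=0):
    """Return sorted, unique row indices for the single-GPU training subset.

    Hypothetical helper: sampling is done without replacement.
    """
    if not 0 < n_train <= n_samples:
        raise ValueError("n_train must be between 1 and n_samples")
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_samples), n_train))


def fit_then_wrap(X, y=None, n_train=100, seed=0, client=None):
    """Fit a single-GPU UMAP on a subset of X, then wrap it for Dask.

    Hypothetical helper. The cuML imports are deferred so that this
    module can be imported on a machine without a GPU.
    """
    from cuml.manifold import UMAP
    from cuml.dask.manifold import UMAP as MNMG_UMAP

    rows = pick_training_rows(X.shape[0], n_train, seed)
    local_model = UMAP(random_state=seed)
    # Training happens on one GPU only; the wrapper never trains.
    local_model.fit(X[rows], y=None if y is None else y[rows])
    # The fitted model is handed to the wrapper for distributed transform.
    return MNMG_UMAP(model=local_model, client=client)
```

With the objects from the example above, `fit_then_wrap(X, y=y, n_train=100, seed=10)` would return a wrapper whose `transform` accepts `distributed_X`.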
- transform(X, convert_dtype=True)
Transform X into the existing embedded space and return that transformed output.
Please refer to the reference UMAP implementation for information on the differences between fit_transform() and running fit() followed by transform().
Specifically, the transform() function is stochastic: https://github.com/lmcinnes/umap/issues/158
- Parameters:
- X: array-like (device or host), shape = (n_samples, n_features)
New data to be transformed. Acceptable formats: dask cuDF, dask CuPy/NumPy/Numba Array
- Returns:
- X_new: array, shape (n_samples, n_components)
Embedding of the new data in low-dimensional space.
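Because transform() is described as stochastic, it can be useful to measure how much repeated calls disagree on the same input. The following is a minimal sketch with a hypothetical helper, `transform_spread`, which is not part of cuML. It assumes that the result (after `.compute()`, when the output is a Dask collection) is an array supporting subtraction, `abs()` and `.max()`, as NumPy and CuPy arrays do. How large the spread is, if there is any, depends on the fitted model and its settings.

```python
def transform_spread(model, X, repeats=2):
    """Largest element-wise absolute difference between repeated transforms.

    Hypothetical helper: calls model.transform(X) `repeats` times and
    compares every later run against the first one.
    """
    if repeats < 2:
        raise ValueError("repeats must be at least 2")

    def run():
        out = model.transform(X)
        # Dask collections are lazy; materialize them before comparing.
        return out.compute() if hasattr(out, "compute") else out

    first = run()
    worst = 0.0
    for _ in range(repeats - 1):
        diff = abs(run() - first)
        worst = max(worst, float(diff.max()))
    return worst
```

A return value of 0.0 means every run reproduced the first embedding exactly; a larger value gives a rough bound on run-to-run variation for that input.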