simplicial_set_embedding
- cuml.manifold.umap.simplicial_set_embedding(data, graph, n_components=2, initial_alpha=1.0, a=None, b=None, gamma=1.0, negative_sample_rate=5, n_epochs=None, init='spectral', random_state=None, force_serial_epochs=None, metric='euclidean', metric_kwds=None, output_metric='euclidean', output_metric_kwds=None, convert_dtype=True, verbose=False)
Perform a fuzzy simplicial set embedding: initialize the embedding with the specified method, then minimize the fuzzy set cross entropy between the 1-skeletons of the high- and low-dimensional fuzzy simplicial sets.
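To make the objective concrete, the sketch below evaluates the fuzzy set cross entropy in plain Python for a few edges. It assumes UMAP's smooth low-dimensional membership curve 1 / (1 + a * d^(2b)), with a = b = 1 chosen only for illustration. The function names are hypothetical, and this is a reference computation of the quantity being minimized, not cuML's GPU SGD loop. Summing over listed edges only is a simplification: pairs with zero high-dimensional weight also contribute through the repulsive term, which the optimizer approximates with negative samples.

```python
import math


def low_dim_membership(d, a=1.0, b=1.0):
    """Membership strength of an embedding edge of length d: 1 / (1 + a * d**(2b))."""
    return 1.0 / (1.0 + a * d ** (2.0 * b))


def fuzzy_set_cross_entropy(high, low, eps=1e-12):
    """Fuzzy set cross entropy summed over edges present in both lists.

    high[k] is the membership strength of edge k in the high-dimensional
    graph; low[k] is the strength of the same edge in the embedding.
    """
    if len(high) != len(low):
        raise ValueError("high and low must cover the same edges")
    total = 0.0
    for w, v in zip(high, low):
        v = min(max(v, eps), 1.0 - eps)  # keep both logarithms finite
        if w > 0.0:
            # Attractive term: penalizes embedding strengths far below w.
            total += w * math.log(w / v)
        if w < 1.0:
            # Repulsive term: penalizes embedding strengths far above w.
            total += (1.0 - w) * math.log((1.0 - w) / (1.0 - v))
    return total


# A strong graph edge whose endpoints sit at distance 0.5 in the embedding.
print(fuzzy_set_cross_entropy([0.9], [low_dim_membership(0.5)]))
```

The loss is zero when every embedding strength matches its graph weight and grows as they diverge in either direction.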
- Parameters:
- data: array of shape (n_samples, n_features)
The source data to be embedded by UMAP.
- graph: sparse matrix
The 1-skeleton of the high-dimensional fuzzy simplicial set, represented as a graph whose (weighted) adjacency matrix is given as a sparse matrix.
Note: When force_serial_epochs is enabled (either explicitly or via the auto-default for init='spectral' with n_components <= 512), the internal CSR conversion requires the COO graph to be sorted by row. An unsorted graph is sorted internally, so pass a row-sorted COO matrix to avoid the extra sort.
- n_components: int
The dimensionality of the Euclidean space into which to embed the data.
- initial_alpha: float
Initial learning rate for the SGD.
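- a: float
Parameter of the differentiable approximation of the right adjoint functor: the low-dimensional membership strength of two points at embedding distance d is approximated by 1 / (1 + a * d^(2b)).
- b: float
Parameter of the same differentiable approximation; it sets the exponent in 1 / (1 + a * d^(2b)).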
- gamma: float
Weight to apply to negative samples.
- negative_sample_rate: int (optional, default 5)
The number of negative samples to select per positive sample in the optimization process. Increasing this value applies a greater repulsive force and raises the optimization cost, but gives slightly more accuracy.
- n_epochs: int (optional, default None)
The number of training epochs used to optimize the low-dimensional embedding. Larger values give more accurate embeddings. If None or 0, a value is selected based on the size of the input dataset (200 for large datasets, 500 for small).
- init: string or array-like
How to initialize the low-dimensional embedding. Options are:
'spectral': use a spectral embedding of the fuzzy 1-skeleton.
'random': assign initial embedding positions at random.
An array-like of shape (n_samples, n_components) with initial embedding positions.
Note: When init='spectral' and n_components <= 512, force_serial_epochs defaults to True because spectral initialization is more susceptible to outlier artifacts. Pass force_serial_epochs=False explicitly to disable it and use the faster parallel batch kernel.
- random_state: numpy RandomState or equivalent
A state capable of being used as a numpy random state.
- force_serial_epochs: bool or None, optional (default=None)
Controls whether optimization epochs use the sequential (reduced GPU parallelism) kernel. When None (the default), serial epochs are enabled automatically for init='spectral' with n_components <= 512, because spectral initialization is more susceptible to outlier artifacts; for n_components > 512 the auto-default falls back to False, since the serial kernel does not support that range. Pass True to force serial epochs regardless of init (only supported for n_components <= 512; otherwise a ValueError is raised), or False to disable them.
- metric: string (default='euclidean')
Distance metric to use. Supported distances are ['l1', 'cityblock', 'taxicab', 'manhattan', 'euclidean', 'l2', 'sqeuclidean', 'canberra', 'minkowski', 'chebyshev', 'linf', 'cosine', 'correlation', 'hellinger', 'hamming', 'jaccard']. Metrics that take arguments (such as 'minkowski') can have them passed via the metric_kwds dictionary. Note: the 'jaccard' distance metric is only supported for sparse inputs.
- metric_kwds: dict (optional, default=None)
Keyword arguments passed to the metric (for example, the order of the 'minkowski' distance).
- output_metric: function
Function returning the distance between two points in the embedding space and the gradient of that distance with respect to the first argument.
- output_metric_kwds: dict (optional, default=None)
Keyword arguments to be passed to the output_metric function.
- verbose: bool (optional, default False)
Whether to report information on the current progress of the algorithm.
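The graph note above asks for a COO matrix sorted by row when serial epochs are in use. As a plain-Python sketch of what "row-sorted" means, the helpers below reorder (row, col, value) triplets and check the ordering; the function names are hypothetical and this is not how cuML sorts internally. With SciPy, one common way to get row-ordered entries is coo.tocsr().tocoo(), since expanding a CSR matrix back to COO emits entries row by row (duplicate entries are summed along the way).

```python
def sort_coo_by_row(rows, cols, vals):
    """Return COO triplets reordered so that row indices are non-decreasing.

    Entries within a row are also ordered by column, which still satisfies
    a row-only ordering requirement.
    """
    if not (len(rows) == len(cols) == len(vals)):
        raise ValueError("rows, cols and vals must have the same length")
    order = sorted(range(len(rows)), key=lambda k: (rows[k], cols[k]))
    return (
        [rows[k] for k in order],
        [cols[k] for k in order],
        [vals[k] for k in order],
    )


def is_row_sorted(rows):
    """True if the row index array never decreases."""
    return all(r0 <= r1 for r0, r1 in zip(rows, rows[1:]))


# A small 3x3 graph given in arbitrary triplet order.
r, c, v = sort_coo_by_row([2, 0, 1, 0], [0, 1, 2, 0], [0.5, 0.1, 0.2, 0.3])
print(r, c, v)
```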
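The force_serial_epochs rules described in the parameter list form a small decision table. The sketch below restates them as a function so the cases are easy to check; resolve_force_serial_epochs is a hypothetical name, and the function mirrors the documented behavior rather than cuML's internal code.

```python
MAX_SERIAL_COMPONENTS = 512  # documented upper limit of the serial kernel


def resolve_force_serial_epochs(force_serial_epochs, init, n_components):
    """Return the effective serial-epochs flag according to the documented rules."""
    if force_serial_epochs is None:
        # Auto-default: serial only for spectral init within the supported range.
        # The isinstance check avoids element-wise comparison when init is an array.
        is_spectral = isinstance(init, str) and init == "spectral"
        return is_spectral and n_components <= MAX_SERIAL_COMPONENTS
    if force_serial_epochs and n_components > MAX_SERIAL_COMPONENTS:
        raise ValueError("force_serial_epochs=True requires n_components <= 512")
    return bool(force_serial_epochs)


print(resolve_force_serial_epochs(None, "spectral", 2))    # auto-enabled
print(resolve_force_serial_epochs(None, "spectral", 600))  # auto falls back to False
```

Keeping the None case separate from an explicit True is what lets the auto-default fall back silently above 512 components while an explicit request raises.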
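Metrics such as 'minkowski' take an extra argument that is supplied through metric_kwds. The sketch below shows the idea with a plain-Python Minkowski distance whose order is passed as a keyword from a dictionary. The key name p is an assumption borrowed from the usual convention (SciPy and umap-learn use it), so check which key cuML expects before relying on it.

```python
def minkowski(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length vectors."""
    if p < 1:
        raise ValueError("p must be >= 1 for a proper metric")
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)


# Hypothetical metric_kwds dictionary, unpacked as keyword arguments.
metric_kwds = {"p": 1.0}
print(minkowski((0.0, 0.0), (3.0, 4.0), **metric_kwds))  # 7.0 (Manhattan distance)
```

With p = 2 the same function gives the Euclidean distance (5.0 for these points), which is why Minkowski needs an argument while 'euclidean' does not.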
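The output_metric description calls for a function that returns both the distance and its gradient with respect to the first point. The sketch below shows that contract for plain Euclidean distance, where the gradient of ||x - y|| with respect to x is (x - y) / ||x - y||. The name euclidean_with_grad is hypothetical, and this section does not say whether cuML accepts a Python callable here rather than a name such as the default 'euclidean', so read it as an illustration of the contract only.

```python
import math


def euclidean_with_grad(x, y):
    """Return (distance, gradient of the distance with respect to x)."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist == 0.0:
        # The gradient is undefined at x == y; return zeros by convention.
        return 0.0, [0.0] * len(diff)
    return dist, [d / dist for d in diff]


dist, grad = euclidean_with_grad((3.0, 4.0), (0.0, 0.0))
print(dist, grad)
```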
- Returns:
- embedding: array of shape (n_samples, n_components)
The optimized embedding of graph into an n_components-dimensional Euclidean space.