simplicial_set_embedding
- cuml.manifold.umap.simplicial_set_embedding(data, graph, n_components=2, initial_alpha=1.0, a=None, b=None, gamma=1.0, negative_sample_rate=5, n_epochs=None, init='spectral', random_state=None, force_serial_epochs=False, metric='euclidean', metric_kwds=None, output_metric='euclidean', output_metric_kwds=None, convert_dtype=True, verbose=False)
Perform a fuzzy simplicial set embedding, using a specified initialisation method and then minimizing the fuzzy set cross entropy between the 1-skeletons of the high and low dimensional fuzzy simplicial sets.
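For intuition, the fuzzy set cross entropy between the high and low dimensional memberships can be written out edge by edge. The sketch below assumes both sets of memberships are plain lists aligned edge by edge; `fuzzy_set_cross_entropy` is a hypothetical helper, not a cuML function, and the reference UMAP optimizer works from sampled gradients rather than evaluating this sum directly.

```python
import math

def fuzzy_set_cross_entropy(w_high, w_low, eps=1e-12):
    """Cross entropy between two fuzzy sets defined over the same edges.

    w_high: membership strengths of the high dimensional 1-skeleton (graph weights).
    w_low:  membership strengths of the same edges in the low dimensional embedding.
    """
    total = 0.0
    for p, q in zip(w_high, w_low):
        # Clamp q away from 0 and 1 so both logarithms stay finite.
        q = min(max(q, eps), 1.0 - eps)
        if p > 0.0:
            total += p * math.log(p / q)  # attractive term
        if p < 1.0:
            total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))  # repulsive term
    return total

high = [1.0, 0.6, 0.1]
print(fuzzy_set_cross_entropy(high, high))             # near zero: memberships agree
print(fuzzy_set_cross_entropy(high, [0.2, 0.2, 0.9]))  # larger: memberships disagree
```

Each edge contributes a non-negative term that vanishes when the two memberships match, which is why minimizing the sum pulls the embedding toward the structure of `graph`.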
- Parameters:
- data: array of shape (n_samples, n_features)
The source data to be embedded by UMAP.
- graph: sparse matrix
The 1-skeleton of the high dimensional fuzzy simplicial set, represented as a graph whose (weighted) adjacency matrix must be supplied as a sparse matrix.
- n_components: int
The dimensionality of the euclidean space into which to embed the data.
- initial_alpha: float
Initial learning rate for stochastic gradient descent (SGD).
- a: float
Parameter of the differentiable approximation of the right adjoint functor.
- b: float
Parameter of the differentiable approximation of the right adjoint functor.
- gamma: float
Weight to apply to negative samples.
- negative_sample_rate: int (optional, default 5)
The number of negative samples to select per positive sample in the optimization process. Increasing this value applies a greater repulsive force and raises the optimization cost, but yields slightly more accurate embeddings.
- n_epochs: int (optional, default 0)
The number of training epochs to be used in optimizing the low dimensional embedding. Larger values result in more accurate embeddings. If 0 is specified, a value is selected based on the size of the input dataset (200 for large datasets, 500 for small ones).
- init: string or array-like (optional, default ‘spectral’)
How to initialize the low dimensional embedding. Options are:
  - ‘spectral’: use a spectral embedding of the fuzzy 1-skeleton.
  - ‘random’: assign initial embedding positions at random.
  - An array-like with initial embedding positions.
- random_state: numpy RandomState or equivalent
A state capable of being used as a numpy random state.
- force_serial_epochs: bool, optional (default=False)
If True, optimization epochs are executed with reduced GPU parallelism. This is only relevant when random_state is set. Enable it if you observe outliers in the resulting embeddings with random_state configured. This may slow the optimization step by more than 2x, but end-to-end runtime is typically similar since the optimization step is not the bottleneck. Use it to resolve rare edge cases where the default heuristics do not trigger.
- metric: string (default=’euclidean’)
Distance metric to use. Supported distances are [‘l1’, ‘cityblock’, ‘taxicab’, ‘manhattan’, ‘euclidean’, ‘l2’, ‘sqeuclidean’, ‘canberra’, ‘minkowski’, ‘chebyshev’, ‘linf’, ‘cosine’, ‘correlation’, ‘hellinger’, ‘hamming’, ‘jaccard’]. Metrics that take arguments (such as minkowski) can have them passed via the metric_kwds dictionary. Note: the ‘jaccard’ distance metric is only supported for sparse inputs.
- metric_kwds: dict (optional, default=None)
Keyword arguments for metrics that take them, such as minkowski.
- output_metric: function
Function returning the distance between two points in embedding space and the gradient of that distance with respect to the first argument.
- output_metric_kwds: dict
Keyword arguments to be passed to the output_metric function.
- verbose: bool (optional, default False)
Whether to report information on the current progress of the algorithm.
- Returns:
- embedding: array of shape (n_samples, n_components)
The optimized embedding of graph into an n_components-dimensional Euclidean space.
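The `graph` argument is a weighted adjacency matrix, typically symmetric. As a minimal sketch, assuming a hand-written directed k-nearest-neighbour weight matrix (the indices and weights below are made up), the fuzzy union used by UMAP, A + A^T - A * A^T with an element-wise product, can be formed with `scipy.sparse`:

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical directed kNN membership strengths: entry (i, j) is the weight
# of the edge from sample i to its neighbour j, with values in [0, 1].
rows = np.array([0, 0, 1, 1, 2, 2, 3, 3])
cols = np.array([1, 2, 0, 2, 1, 3, 2, 1])
vals = np.array([1.0, 0.4, 1.0, 0.7, 1.0, 0.3, 1.0, 0.2], dtype=np.float32)
directed = sp.coo_matrix((vals, (rows, cols)), shape=(4, 4)).tocsr()

# Fuzzy union (probabilistic t-conorm): a + b - a*b for each pair of directed edges.
graph = directed + directed.T - directed.multiply(directed.T)
graph = graph.tocoo()  # COO layout exposes row, col and data arrays for inspection
print(graph.toarray())
```

Whether cuML prefers a particular sparse layout (COO, CSR) is not stated on this page; the conversion to COO above is only for inspection.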
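The `a` and `b` parameters shape the curve that UMAP uses to turn an embedding distance d into a membership strength, 1 / (1 + a * d^(2b)). The sketch below evaluates that curve; `low_dim_membership` is a hypothetical helper, and the values of `a` and `b` are illustrative assumptions rather than defaults cuML computes (in umap-learn's UMAP estimator they are fitted from `min_dist` and `spread` when left unset).

```python
def low_dim_membership(dist, a, b):
    """Differentiable membership strength at embedding distance `dist`."""
    return 1.0 / (1.0 + a * dist ** (2.0 * b))

# Illustrative values only (assumed, not cuML defaults).
a, b = 1.6, 0.9
for d in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(d, round(low_dim_membership(d, a, b), 3))
```

The curve starts at 1 for coincident points and decays smoothly, so larger `a` makes memberships fall off faster with distance while `b` controls how sharp the transition is.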
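To show how `initial_alpha`, `gamma` and `negative_sample_rate` interact, here is a minimal CPU sketch of one SGD update: an attractive step along a graph edge followed by repulsive steps from randomly drawn negative samples. It is modelled on the reference umap-learn optimizer (gradient formulas, clipping to [-4, 4], linear learning-rate decay); cuML's GPU kernels differ in scheduling and sampling details, and `attract`, `repel` and `sgd_edge_update` are hypothetical names.

```python
import numpy as np

def clip(g, limit=4.0):
    # Element-wise gradient clipping, as in umap-learn's optimizer.
    return np.clip(g, -limit, limit)

def attract(emb, i, j, a, b, alpha):
    """Pull points i and j together along a positive (graph) edge."""
    diff = emb[i] - emb[j]
    d2 = float(diff @ diff)
    if d2 > 0.0:
        coeff = -2.0 * a * b * d2 ** (b - 1.0) / (a * d2 ** b + 1.0)
        step = clip(coeff * diff) * alpha
        emb[i] += step  # move i toward j
        emb[j] -= step  # move j toward i

def repel(emb, i, k, a, b, gamma, alpha):
    """Push point i away from negative sample k; gamma scales the force."""
    diff = emb[i] - emb[k]
    d2 = float(diff @ diff)
    if d2 > 0.0:  # coincident points are skipped in this sketch
        coeff = 2.0 * gamma * b / ((0.001 + d2) * (a * d2 ** b + 1.0))
        emb[i] += clip(coeff * diff) * alpha

def sgd_edge_update(emb, i, j, a, b, alpha, gamma, negative_sample_rate, rng):
    """One positive sample followed by `negative_sample_rate` negative samples."""
    attract(emb, i, j, a, b, alpha)
    for _ in range(negative_sample_rate):
        k = int(rng.integers(emb.shape[0]))
        if k != i:
            repel(emb, i, k, a, b, gamma, alpha)

# Usage: a few epochs on one edge with a linearly decaying learning rate.
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 2))
initial_alpha, n_epochs = 1.0, 10
for epoch in range(n_epochs):
    alpha = initial_alpha * (1.0 - epoch / n_epochs)
    sgd_edge_update(emb, 0, 1, a=1.6, b=0.9, alpha=alpha, gamma=1.0,
                    negative_sample_rate=5, rng=rng)
```

Each extra negative sample adds one more repulsive update per positive sample, which is where the added cost of a higher `negative_sample_rate` comes from.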
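When `n_epochs` is 0, the epoch count is chosen from the dataset size. The helper below sketches that rule under two assumptions: the 10,000-sample cut-off is the one umap-learn uses (the threshold cuML applies is not stated here), and `None` is treated like 0. `default_n_epochs` is a hypothetical name.

```python
def default_n_epochs(n_samples, n_epochs=0, large_threshold=10_000):
    """Resolve the number of epochs (hypothetical helper).

    large_threshold is an assumption borrowed from umap-learn; treating
    None like 0 is also an assumption.
    """
    if n_epochs is not None and n_epochs > 0:
        return n_epochs  # an explicit positive value is used as-is
    return 500 if n_samples <= large_threshold else 200

print(default_n_epochs(1_000))    # small dataset
print(default_n_epochs(50_000))   # large dataset
print(default_n_epochs(50_000, n_epochs=300))
```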
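Because `init` also accepts an array-like of initial positions, any (n_samples, n_components) array can seed the optimization. As one possible choice, the sketch below builds a PCA projection with NumPy; `pca_init` is a hypothetical helper, and rescaling to a maximum absolute coordinate of 10 is an arbitrary choice for this example.

```python
import numpy as np

def pca_init(data, n_components=2):
    """Project data onto its top principal components for use as `init`."""
    X = np.asarray(data, dtype=np.float64)
    X = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    emb = X @ vt[:n_components].T
    # Rescale so the largest absolute coordinate is 10 (arbitrary choice).
    emb = emb / np.abs(emb).max() * 10.0
    return emb.astype(np.float32)  # match the float32 data below

rng = np.random.RandomState(42)
data = rng.normal(size=(100, 5)).astype(np.float32)
init = pca_init(data)
print(init.shape)  # (100, 2)
# Passing it to cuML (not executed here; requires a GPU and a prebuilt graph):
# embedding = simplicial_set_embedding(data, graph, n_components=2, init=init)
```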
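Several names in the `metric` list are aliases of one family: the Minkowski distance with p = 1 is the l1/cityblock/taxicab/manhattan distance, p = 2 is euclidean/l2, and the p to infinity limit is chebyshev/linf. The pure-Python sketch below illustrates this; the functions are hypothetical, and the key cuML expects inside `metric_kwds` for minkowski is not stated on this page, so the commented call uses an assumed key name.

```python
def minkowski(x, y, p=2.0):
    """Minkowski distance; `p` is the kind of argument metric_kwds carries."""
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)

def chebyshev(x, y):
    """Limit of the Minkowski distance as p grows without bound."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))

x, y = (0.0, 0.0), (3.0, 4.0)
print(minkowski(x, y, p=1))  # l1 / manhattan / cityblock / taxicab: 7.0
print(minkowski(x, y, p=2))  # euclidean / l2: 5.0
print(chebyshev(x, y))       # chebyshev / linf: 4.0
# Hypothetical cuML call; the 'p' key name is an assumption:
# simplicial_set_embedding(data, graph, metric='minkowski', metric_kwds={'p': 3})
```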
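The `output_metric` description asks for a function that returns both a distance and its gradient with respect to the first argument. The sketch below shows that contract for the Euclidean case, following the convention of umap-learn's output metrics; whether cuML accepts a Python callable here is not stated (the signature default is the string 'euclidean'), so treat `euclidean_with_grad` as a hypothetical illustration of the described contract.

```python
import math

def euclidean_with_grad(x, y):
    """Return (distance, gradient of the distance with respect to x)."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    d = math.sqrt(sum(v * v for v in diff))
    # d/dx ||x - y|| = (x - y) / ||x - y||; a tiny epsilon avoids 0/0.
    grad = [v / (d + 1e-6) for v in diff]
    return d, grad

d, g = euclidean_with_grad((0.0, 0.0), (3.0, 4.0))
print(d, g)  # distance 5.0, gradient close to (-0.6, -0.8)
```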