fuzzy_simplicial_set#
- cuml.manifold.umap.fuzzy_simplicial_set(X, n_neighbors, random_state=None, metric='euclidean', metric_kwds=None, knn_indices=None, knn_dists=None, set_op_mix_ratio=1.0, local_connectivity=1.0, verbose=False)[source]#
Given a set of data X, a neighborhood size, and a measure of distance compute the fuzzy simplicial set (here represented as a fuzzy graph in the form of a sparse matrix) associated to the data. This is done by locally approximating geodesic distance at each point, creating a fuzzy simplicial set for each such point, and then combining all the local fuzzy simplicial sets into a global one via a fuzzy union.
- Parameters:
- X: array of shape (n_samples, n_features)
The data to be modelled as a fuzzy simplicial set.
- n_neighbors: int
The number of neighbors to use to approximate geodesic distance. Larger numbers induce more global estimates of the manifold that can miss finer detail, while smaller values will focus on fine manifold structure to the detriment of the larger picture.
- random_state: numpy RandomState or equivalent
A state capable being used as a numpy random state.
- metric: string (default=’euclidean’).
Distance metric to use. Supported distances are [‘l1, ‘cityblock’, ‘taxicab’, ‘manhattan’, ‘euclidean’, ‘l2’, ‘sqeuclidean’, ‘canberra’, ‘minkowski’, ‘chebyshev’, ‘linf’, ‘cosine’, ‘correlation’, ‘hellinger’, ‘hamming’, ‘jaccard’] Metrics that take arguments (such as minkowski) can have arguments passed via the metric_kwds dictionary. Note: The ‘jaccard’ distance metric is only supported for sparse inputs.
- metric_kwds: dict (optional, default=None)
Metric argument
- knn_indices: array of shape (n_samples, n_neighbors) (optional)
If the k-nearest neighbors of each point has already been calculated you can pass them in here to save computation time. This should be an array with the indices of the k-nearest neighbors as a row for each data point.
- knn_dists: array of shape (n_samples, n_neighbors) (optional)
If the k-nearest neighbors of each point has already been calculated you can pass them in here to save computation time. This should be an array with the distances of the k-nearest neighbors as a row for each data point.
- set_op_mix_ratio: float (optional, default 1.0)
Interpolate between (fuzzy) union and intersection as the set operation used to combine local fuzzy simplicial sets to obtain a global fuzzy simplicial sets. Both fuzzy set operations use the product t-norm. The value of this parameter should be between 0.0 and 1.0; a value of 1.0 will use a pure fuzzy union, while 0.0 will use a pure fuzzy intersection.
- local_connectivity: int (optional, default 1)
The local connectivity required – i.e. the number of nearest neighbors that should be assumed to be connected at a local level. The higher this value the more connected the manifold becomes locally. In practice this should be not more than the local intrinsic dimension of the manifold.
- verbose: bool (optional, default False)
Whether to report information on the current progress of the algorithm.
- Returns:
- fuzzy_simplicial_set: coo_matrix
A fuzzy simplicial set represented as a sparse matrix. The (i, j) entry of the matrix represents the membership strength of the 1-simplex between the ith and jth sample points.