ML::graph_build_params::graph_build_params Struct Reference

#include <umapparams.h>


Public Attributes

size_t overlap_factor = 2
size_t n_clusters = 1
nn_descent_params_umap nn_descent_params

Detailed Description

Parameters for knn graph building in UMAP.

[Hint1]: The ratio overlap_factor / n_clusters determines device memory usage. Approximately (overlap_factor / n_clusters) * num_rows_in_entire_data rows are placed in device memory at once. For example, comparing (overlap_factor / n_clusters) = 2/10 with 2/20, the latter uses less device memory.

[Hint2]: A larger overlap_factor gives better accuracy in the final all-neighbors knn graph. For example, (overlap_factor / n_clusters) = 4/20 and 2/10 use a similar amount of device memory, but 4/20 gives better accuracy at the cost of performance.

[Hint3]: For overlap_factor, start with 2 and increase it gradually (2 -> 3 -> 4 ...) for better accuracy.

[Hint4]: For n_clusters, start with 4 and increase it gradually (4 -> 8 -> 16 ...) for lower GPU memory usage. n_clusters can be tuned independently of overlap_factor as long as overlap_factor < n_clusters.
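The memory rule of thumb from Hint1 can be written as a small helper. This is a minimal sketch: approx_rows_on_device is a hypothetical name, not part of cuML, and it only evaluates the approximation stated above. The n_clusters == 1 case returns the full row count because, per the n_clusters documentation, the default places the entire dataset in device memory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a cuML API): approximate number of rows resident in
// device memory at once, following Hint1:
//   (overlap_factor / n_clusters) * num_rows_in_entire_data
// Integer arithmetic is used, so the result is rounded down.
std::size_t approx_rows_on_device(std::size_t overlap_factor,
                                  std::size_t n_clusters,
                                  std::size_t num_rows) {
  assert(n_clusters >= 1);
  // With the default n_clusters = 1 the whole dataset goes to the device and
  // overlap_factor is not used.
  if (n_clusters == 1) return num_rows;
  return overlap_factor * num_rows / n_clusters;
}
```

For a 1,000,000-row dataset, 2/10 gives about 200,000 rows on the device and 2/20 about 100,000, while 4/20 matches 2/10, which is the trade-off Hint2 describes.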

Member Data Documentation

◆ n_clusters

size_t ML::graph_build_params::graph_build_params::n_clusters = 1

Number of clusters to split the data into when building the knn graph. Increasing this uses less device memory at the cost of accuracy. When n_clusters > 1, the data must be on host (see the data_on_host argument of fit_transform). The default value (n_clusters = 1) places the entire dataset in device memory.
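The constraints on n_clusters can be collected into a pre-flight check. This is a sketch under stated assumptions: GraphBuildParamsSketch is a hypothetical mirror of the two integer fields documented on this page (nn_descent_params is omitted), check_graph_build_params is not a cuML function, and the data_on_host flag only stands in for the fit_transform argument mentioned above. The overlap_factor < n_clusters check reflects the range described in Hint4 and is an assumption, not a documented hard requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical mirror of the documented fields and their defaults.
struct GraphBuildParamsSketch {
  std::size_t overlap_factor = 2;
  std::size_t n_clusters = 1;
};

// Hypothetical check (not a cuML API). Returns an empty string when the
// combination looks usable, otherwise a short description of the problem.
std::string check_graph_build_params(const GraphBuildParamsSketch& p,
                                     bool data_on_host) {
  if (p.n_clusters == 0) return "n_clusters must be at least 1";
  // Documented: n_clusters > 1 requires the data to be on host.
  if (p.n_clusters > 1 && !data_on_host)
    return "n_clusters > 1 requires data_on_host";
  // Assumed from Hint4: keep overlap_factor below n_clusters when batching.
  if (p.n_clusters > 1 && p.overlap_factor >= p.n_clusters)
    return "overlap_factor should be smaller than n_clusters";
  return "";
}
```

With the defaults (n_clusters = 1) the check passes regardless of where the data lives, since overlap_factor only matters once the data is split.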

◆ nn_descent_params

nn_descent_params_umap ML::graph_build_params::graph_build_params::nn_descent_params

◆ overlap_factor

size_t ML::graph_build_params::graph_build_params::overlap_factor = 2

Number of clusters each data point is assigned to. Only valid when n_clusters > 1.
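To illustrate what "assigned to overlap_factor clusters" means, the sketch below gives each point membership in the overlap_factor clusters whose centroids are nearest to it. This page does not say how clusters are formed; assigning by nearest centroid (for example, centroids from k-means) is an assumption made for illustration, and assign_overlapping_clusters is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sketch: each point joins the overlap_factor clusters whose
// centroids are closest (squared Euclidean distance). Centroids are assumed
// to be given. Returns the point ids belonging to each cluster.
std::vector<std::vector<std::size_t>> assign_overlapping_clusters(
    const std::vector<std::vector<float>>& points,
    const std::vector<std::vector<float>>& centroids,
    std::size_t overlap_factor) {
  const std::size_t n_clusters = centroids.size();
  assert(overlap_factor >= 1 && overlap_factor <= n_clusters);
  std::vector<std::vector<std::size_t>> members(n_clusters);
  std::vector<std::size_t> order(n_clusters);
  std::vector<float> dist(n_clusters);
  for (std::size_t i = 0; i < points.size(); ++i) {
    // Distance from point i to every centroid.
    for (std::size_t c = 0; c < n_clusters; ++c) {
      float d = 0.0f;
      for (std::size_t j = 0; j < points[i].size(); ++j) {
        const float diff = points[i][j] - centroids[c][j];
        d += diff * diff;
      }
      dist[c] = d;
    }
    // Move the overlap_factor nearest centroids to the front of `order`.
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::partial_sort(order.begin(),
                      order.begin() + static_cast<std::ptrdiff_t>(overlap_factor),
                      order.end(),
                      [&](std::size_t a, std::size_t b) { return dist[a] < dist[b]; });
    for (std::size_t k = 0; k < overlap_factor; ++k) members[order[k]].push_back(i);
  }
  return members;
}
```

Every point appears in exactly overlap_factor clusters, so the clusters hold overlap_factor * num_rows memberships in total, or about (overlap_factor / n_clusters) * num_rows rows each on average. That matches the per-batch device memory estimate in Hint1 if clusters are processed one at a time.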


The documentation for this struct was generated from the following file: