#include <umapparams.h>
Public Attributes | |
size_t | overlap_factor = 2 |
size_t | n_clusters = 1 |
nn_descent_params_umap | nn_descent_params |
Parameters for knn graph building in UMAP.
[Hint1]: the ratio overlap_factor / n_clusters determines device memory usage. Approximately (overlap_factor / n_clusters) * num_rows_in_entire_data rows are placed on device memory at once. E.g. of (overlap_factor / n_clusters) = 2/10 and 2/20, the latter uses less device memory.
[Hint2]: a larger overlap_factor gives better accuracy of the final all-neighbors knn graph. E.g. while using a similar amount of device memory, (overlap_factor / n_clusters) = 4/20 has better accuracy than 2/10, at the cost of performance.
[Hint3]: for overlap_factor, start with 2 and gradually increase (2->3->4 ...) for better accuracy.
[Hint4]: for n_clusters, start with 4 and gradually increase (4->8->16 ...) for lower GPU memory usage. This is independent of overlap_factor as long as overlap_factor < n_clusters.
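The memory estimate in Hint1 can be worked through with a few lines of arithmetic. The following is a minimal sketch, not part of cuML: `estimated_rows_on_device` is a hypothetical helper, and its result is only the approximation stated in the hint, not a measurement of actual device memory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of cuML): approximate number of rows resident
// on the device at once, following the ratio described in Hint1:
//   (overlap_factor / n_clusters) * num_rows_in_entire_data
inline std::size_t estimated_rows_on_device(std::size_t overlap_factor,
                                            std::size_t n_clusters,
                                            std::size_t num_rows)
{
  assert(n_clusters > 0);
  // Multiply before dividing so integer division does not truncate the ratio to 0.
  return (overlap_factor * num_rows) / n_clusters;
}
```

For a dataset of 1,000,000 rows, this gives about 200,000 rows for 2/10 and about 100,000 rows for 2/20. The ratio 4/20 gives the same estimate as 2/10, which is the situation Hint2 describes.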
size_t ML::graph_build_params::graph_build_params::n_clusters = 1 |
Number of clusters to split the data into when building the knn graph. Increasing this will use less device memory at the cost of accuracy. When using n_clusters > 1, it is required that the data resides on host (refer to the data_on_host argument for fit_transform). The default value (n_clusters=1) will place the entire data on device memory.
nn_descent_params_umap ML::graph_build_params::graph_build_params::nn_descent_params |
size_t ML::graph_build_params::graph_build_params::overlap_factor = 2 |
Number of clusters each data point is assigned to. Only valid when n_clusters > 1.
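The constraints described for these fields can be collected into a small pre-flight check. This is a sketch under stated assumptions, not cuML code: `build_settings_sketch` is a local stand-in that mirrors only the two documented fields and their defaults, `settings_look_consistent` is a hypothetical helper, and treating overlap_factor < n_clusters as a hard condition is an assumption drawn from Hint4 rather than a documented requirement.

```cpp
#include <cstddef>

// Local stand-in mirroring the two documented fields and their defaults.
// Hypothetical: the real struct also holds nn_descent_params.
struct build_settings_sketch {
  std::size_t overlap_factor = 2;
  std::size_t n_clusters     = 1;
};

// Hypothetical check of the documented constraints:
//  - n_clusters == 1: the entire data is placed on device; overlap_factor is not used.
//  - n_clusters  > 1: the data must be on host, and (assumption, from Hint4)
//    overlap_factor should stay below n_clusters.
inline bool settings_look_consistent(const build_settings_sketch& s, bool data_on_host)
{
  if (s.n_clusters == 0) { return false; }
  if (s.n_clusters == 1) { return true; }
  return data_on_host && s.overlap_factor >= 1 && s.overlap_factor < s.n_clusters;
}
```

With the defaults the check passes regardless of where the data lives. With n_clusters = 4 it passes only when the data is on host and overlap_factor is 1, 2, or 3.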