cugraph.dask.community.leiden.leiden

cugraph.dask.community.leiden.leiden(input_graph: Graph, max_iter: int = 100, resolution: int = 1.0, random_state: int = None, theta: int = 1.0) -> Tuple[dask_cudf.DataFrame, float]

Compute the modularity-optimizing partition of the input graph using the Leiden method.

Traag, V. A., Waltman, L., & van Eck, N. J. (2019). From Louvain to Leiden: guaranteeing well-connected communities. Scientific reports, 9(1), 5233. doi: 10.1038/s41598-019-41695-z

Parameters:
input_graph: cugraph.Graph

The graph descriptor should contain the connectivity information and weights. The adjacency list will be computed if not already present. The current implementation only supports undirected graphs.

max_iter: integer, optional (default=100)

This controls the maximum number of levels/iterations of the Leiden algorithm. The algorithm terminates after no more than the specified number of iterations. No error occurs when the algorithm terminates early in this manner.

resolution: float, optional (default=1.0)

Called gamma in the modularity formula, this changes the size of the communities. Higher resolutions lead to more, smaller communities; lower resolutions lead to fewer, larger communities. Defaults to 1.
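The effect of gamma can be sketched in plain Python. This is a minimal illustration, assuming the commonly used resolution-weighted modularity Q = sum over communities c of [L_c / m - gamma * (d_c / (2m))^2], where m is the total edge weight, L_c the edge weight inside community c, and d_c the summed degree of its vertices. The helper name `modularity` and the toy graph are hypothetical and are not part of cuGraph; the score cuGraph reports is assumed, not confirmed, to follow this definition.

```python
from collections import defaultdict


def modularity(edges, membership, resolution=1.0):
    """Resolution-weighted modularity of an undirected, weighted edge list.

    Hypothetical helper for illustration only; not a cuGraph function.
    """
    total_weight = 0.0             # m: sum of all edge weights
    internal = defaultdict(float)  # L_c: edge weight inside each community
    degree = defaultdict(float)    # d_c: summed weighted degree per community
    for src, dst, weight in edges:
        total_weight += weight
        degree[membership[src]] += weight
        degree[membership[dst]] += weight
        if membership[src] == membership[dst]:
            internal[membership[src]] += weight
    two_m = 2.0 * total_weight
    return sum(
        internal[c] / total_weight - resolution * (degree[c] / two_m) ** 2
        for c in degree
    )


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
split = [0, 0, 0, 1, 1, 1]   # one community per triangle
merged = [0, 0, 0, 0, 0, 0]  # everything in a single community

for gamma in (0.2, 1.0):
    print(gamma, modularity(edges, split, gamma), modularity(edges, merged, gamma))
```

On this toy graph the two-community partition scores higher at resolution 1.0, while the single merged community scores higher at resolution 0.2, which matches the description above: lower resolutions favor fewer, larger communities.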

random_state: int, optional (default=None)

Random state to use when generating samples. If not specified, it defaults to a hash of the process id, time, and hostname.

theta: float, optional (default=1.0)

Called theta in the Leiden algorithm. It scales the modularity gain in the Leiden refinement phase when computing the probability of joining a random Leiden community.
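The role of theta can be sketched as follows. This is a minimal illustration based on the refinement rule described in Traag et al. (2019), where a candidate community is chosen with probability proportional to exp(gain / theta) and candidates with a negative gain are never chosen. The helper name `join_probabilities` is hypothetical, and the exact scaling used inside cuGraph is an assumption here, not something this page confirms.

```python
import math


def join_probabilities(gains, theta=1.0):
    """Probability of joining each candidate community, given modularity gains.

    Hypothetical helper following the rule in Traag et al. (2019);
    not a cuGraph function.
    """
    eligible = [g for g in gains if g >= 0]
    if not eligible:
        # No candidate improves modularity, so none is selected.
        return [0.0] * len(gains)
    largest = max(eligible)  # subtracted before exp() for numerical stability
    weights = [
        math.exp((g - largest) / theta) if g >= 0 else 0.0
        for g in gains
    ]
    total = sum(weights)
    return [w / total for w in weights]


gains = [0.1, 0.2, -0.3]  # illustrative modularity gains for three candidates
broad = join_probabilities(gains, theta=1.0)
sharp = join_probabilities(gains, theta=0.01)
print(broad)
print(sharp)
```

Under this rule, a smaller theta concentrates the probability on the candidate with the largest gain, while a larger theta spreads it more evenly across all non-negative candidates.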

Returns:
parts: dask_cudf.DataFrame

GPU data frame of size V containing two columns: the vertex id and the id of the partition it is assigned to.

ddf['vertex']: cudf.Series

Contains the vertex identifiers.

ddf['partition']: cudf.Series

Contains the partition assigned to each vertex.

modularity_score: float

A floating point number containing the global modularity score of the partitioning.

Examples

>>> import cugraph
>>> import cugraph.dask as dcg
>>> import dask_cudf
>>> # ... Init a DASK Cluster
>>> #    see https://docs.rapids.ai/api/cugraph/stable/dask-cugraph.html
>>> # Download dataset from https://github.com/rapidsai/cugraph/datasets/..
>>> # datasets_path is assumed to be a pathlib.Path to the downloaded datasets
>>> chunksize = dcg.get_chunksize(datasets_path / "karate.csv")
>>> ddf = dask_cudf.read_csv(datasets_path / "karate.csv",
...                          blocksize=chunksize, delimiter=" ",
...                          names=["src", "dst", "value"],
...                          dtype=["int32", "int32", "float32"])
>>> dg = cugraph.Graph()
>>> dg.from_dask_cudf_edgelist(ddf, source='src', destination='dst')
>>> parts, modularity_score = dcg.leiden(dg)