cugraph.dask.centrality.betweenness_centrality.betweenness_centrality

cugraph.dask.centrality.betweenness_centrality.betweenness_centrality(input_graph, k: int | list | Series | DataFrame = None, normalized: bool = True, weight: DataFrame = None, endpoints: bool = False, random_state: int = None) -> DataFrame

Compute the betweenness centrality for all vertices of the graph G. Betweenness centrality is a measure of the number of shortest paths that pass through a vertex. A vertex with a high betweenness centrality score has more paths passing through it and is therefore believed to be more important.

To improve performance, rather than computing all-pairs shortest paths, a sample of k starting vertices can be used.

cuGraph does not currently support the 'weight' parameter.
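The definition above can be illustrated with a small pure-Python sketch of Brandes-style accumulation on an unweighted graph. This is not cuGraph's implementation; the function name `brandes_betweenness`, the adjacency-dict input, and the `sources` argument are assumptions made for illustration only. Passing a subset of vertices as `sources` mimics the sampling idea: only those vertices start a traversal.

```python
from collections import deque


def brandes_betweenness(adj, sources=None):
    """Unnormalized betweenness for an unweighted graph (illustrative only).

    adj: dict mapping each vertex to a list of out-neighbors.
    sources: vertices to start traversals from; None means every vertex.
    """
    bc = {v: 0.0 for v in adj}
    for s in (adj if sources is None else sources):
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Undirected path 0 - 1 - 2 - 3, stored as symmetric adjacency lists.
# With symmetric lists every unordered pair is traversed in both directions,
# so the raw scores count ordered (source, target) pairs.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = brandes_betweenness(path)
print(scores)  # {0: 0.0, 1: 4.0, 2: 4.0, 3: 0.0}
```

The two interior vertices each lie on the shortest paths for two unordered pairs (four ordered pairs), while the end vertices lie on none.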

Parameters:
input_graph: cugraph.Graph

The graph can be either directed (Graph(directed=True)) or undirected. The current implementation uses a parallel variation of the Brandes Algorithm (2001) to compute exact or approximate betweenness. If weights are provided in the edgelist, they will not be used.

k: int, list or (dask)cudf object or None, optional (default=None)

If k is not None, use k node samples to estimate betweenness. Higher values give a better approximation. If k is a list, a cudf DataFrame, or a dask_cudf DataFrame, its contents are assumed to be the vertex identifiers to use for estimation. If k is None (the default), all vertices are used as sources and the result is exact. Vertices obtained through sampling or given as a list are used as sources for the traversals inside the algorithm.

normalized: bool, optional (default=True)

If True, normalize the resulting betweenness centrality values by 2 / ((n - 1) * (n - 2)) for undirected graphs and by 1 / ((n - 1) * (n - 2)) for directed graphs, where n is the number of nodes in G. Normalization ensures that values are in [0, 1]: it scales by the highest possible value, which is reached when a single node is crossed by every shortest path.
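As a worked example of the normalization factors above, the sketch below applies them to a star graph with n = 5, where the center lies on every shortest path between the four leaves. The helper name `normalization_factor` is hypothetical, and the guard for n <= 2 is an assumption added only to avoid dividing by zero. The example also assumes the raw undirected score counts each unordered pair once and the raw directed score counts each ordered pair once.

```python
def normalization_factor(n, directed):
    """Scale factor from the description above; n is the number of vertices."""
    if n <= 2:
        # Assumption: with at most two vertices no vertex can be interior
        # to a path, so there is nothing to scale.
        return 1.0
    pairs = (n - 1) * (n - 2)
    return (1.0 if directed else 2.0) / pairs


n = 5  # star graph: one center, four leaves

# Undirected: the center lies between C(4, 2) = 6 unordered leaf pairs.
undirected_center = 6 * normalization_factor(n, directed=False)

# Directed, with arcs in both directions: 4 * 3 = 12 ordered leaf pairs.
directed_center = 12 * normalization_factor(n, directed=True)
```

In both cases the center's normalized score comes out at (approximately, in floating point) 1, the top of the [0, 1] range.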

weight: (dask)cudf.DataFrame, optional (default=None)

Specifies the weights to be used for each edge. Should contain a mapping between edges and weights. (Not Supported)

endpoints: bool, optional (default=False)

If True, include the endpoints in the shortest path counts.

random_state: int, optional (default=None)

If k is specified and is an integer, use random_state to initialize the random number generator. Using None defaults to a hash of process id, time, and hostname. If k is None, a list, or a cudf object, the random_state parameter is ignored.
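The interaction between k and random_state can be sketched in plain Python. The helper `pick_sources` below is hypothetical and uses Python's `random` module rather than cuGraph's own sampler; it only illustrates why the seed matters for an integer k and is ignored otherwise.

```python
import random


def pick_sources(vertices, k=None, random_state=None):
    """Choose traversal sources following the three documented forms of k."""
    vertices = list(vertices)
    if k is None:
        # No sampling: every vertex is a source (exact result).
        return vertices
    if isinstance(k, int):
        # Integer k: draw k distinct vertices; the seed makes it repeatable.
        rng = random.Random(random_state)
        return rng.sample(vertices, k)
    # Explicit collection of vertex identifiers: used as given, seed ignored.
    return list(k)


first = pick_sources(range(10), k=3, random_state=42)
second = pick_sources(range(10), k=3, random_state=42)
explicit = pick_sources(range(10), k=[1, 5], random_state=42)
```

Here `first` and `second` are identical because they share a seed, while `explicit` is simply `[1, 5]` whatever the seed.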

Returns:
betweenness_centrality: dask_cudf.DataFrame

GPU distributed data frame containing two dask_cudf.Series of size V: the vertex identifiers and the corresponding betweenness centrality values.

ddf['vertex']: dask_cudf.Series

Contains the vertex identifiers

ddf['betweenness_centrality']: dask_cudf.Series

Contains the betweenness centrality of vertices

Examples

>>> import cugraph
>>> import cugraph.dask as dcg
>>> import dask_cudf
>>> # ... Init a DASK Cluster
>>> #    see https://docs.rapids.ai/api/cugraph/stable/dask-cugraph.html
>>> # Download dataset from https://github.com/rapidsai/cugraph/datasets/..
>>> # datasets_path is assumed to be a pathlib.Path to the downloaded datasets
>>> chunksize = dcg.get_chunksize(datasets_path / "karate.csv")
>>> ddf = dask_cudf.read_csv(datasets_path / "karate.csv",
...                          blocksize=chunksize, delimiter=" ",
...                          names=["src", "dst", "value"],
...                          dtype=["int32", "int32", "float32"])
>>> # Build a directed multi-GPU graph from the distributed edge list
>>> dg = cugraph.Graph(directed=True)
>>> dg.from_dask_cudf_edgelist(ddf, source='src', destination='dst')
>>> # Exact betweenness centrality: k=None uses every vertex as a source
>>> bc = dcg.betweenness_centrality(dg)