cugraph.pagerank

cugraph.pagerank(G, alpha=0.85, personalization=None, precomputed_vertex_out_weight=None, max_iter=100, tol=1e-05, nstart=None, weight=None, dangling=None, fail_on_nonconvergence=True)

Find the PageRank score for every vertex in a graph. cuGraph computes an approximation of the PageRank eigenvector using the power method. The number of iterations depends on the properties of the network itself; it increases when the tolerance decreases and/or alpha increases toward the limiting value of 1. The user is free to use the default values or to provide inputs for the initial guess, tolerance, and maximum number of iterations. If no edge_attr values are provided, every edge is given a value of 1.0.
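As a rough illustration of the power method described above, the following pure-Python sketch iterates until the L1 change between successive score vectors drops below tol or max_iter is reached. It is not cuGraph's implementation: the function name pagerank_power_method, the edge-list input, the uniform redistribution of rank from dangling vertices, and the L1 convergence test are all assumptions made for this example.

```python
def pagerank_power_method(edges, num_vertices, alpha=0.85, max_iter=100, tol=1e-5):
    """Illustrative power iteration on an unweighted directed edge list."""
    # Out-degree of each vertex (every edge counts with weight 1.0).
    out_degree = [0] * num_vertices
    for src, _dst in edges:
        out_degree[src] += 1

    # Initial guess: the uniform distribution.
    rank = [1.0 / num_vertices] * num_vertices
    for iteration in range(1, max_iter + 1):
        # Assumption: rank held by dangling vertices is spread uniformly.
        dangling = sum(rank[v] for v in range(num_vertices) if out_degree[v] == 0)
        base = (1.0 - alpha) / num_vertices + alpha * dangling / num_vertices
        new_rank = [base] * num_vertices
        # With probability alpha, follow an outgoing edge.
        for src, dst in edges:
            new_rank[dst] += alpha * rank[src] / out_degree[src]
        # L1 change between successive iterates decides convergence.
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            return rank, iteration, True
    return rank, max_iter, False


# Directed cycle 0 -> 1 -> 2 -> 0 plus the extra edge 0 -> 2.
edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
scores, iterations, converged = pagerank_power_method(edges, num_vertices=3)
```

Calling the same function with a smaller tol never needs fewer iterations, which mirrors the relationship between tolerance and iteration count stated above.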

Parameters:
G: cugraph.Graph or networkx.Graph

cuGraph graph descriptor. It should contain the connectivity information as an edge list. The transposed adjacency list will be computed if not already present.

alpha: float, optional (default=0.85)

The damping factor alpha is the probability of following an outgoing edge; the standard value is 0.85. Thus, 1.0-alpha is the probability of “teleporting” to a random vertex. Alpha should be greater than 0.0 and strictly less than 1.0.

personalization: cudf.DataFrame, optional (default=None)

GPU DataFrame containing the personalization information (a performance optimization).

personalization['vertex']: cudf.Series

Subset of vertices of the graph for personalization

personalization['values']: cudf.Series

Personalization values for vertices

precomputed_vertex_out_weight: cudf.DataFrame, optional (default=None)

GPU DataFrame containing the precomputed vertex out-weight information (a performance optimization).

precomputed_vertex_out_weight['vertex']: cudf.Series

Subset of vertices of the graph for precomputed_vertex_out_weight

precomputed_vertex_out_weight['sums']: cudf.Series

Corresponding precomputed sums of the vertices' outgoing weights

max_iter: int, optional (default=100)

The maximum number of iterations before an answer is returned. This can be used to limit the execution time and exit early, before the solver reaches the convergence tolerance. If this value is less than or equal to 0, cuGraph will use the default value, which is 100.

tol: float, optional (default=1e-05)

Sets the tolerance of the approximation; this parameter should be a small-magnitude value. The lower the tolerance, the better the approximation. If this value is 0.0, cuGraph will use the default value, which is 1.0E-5. Setting too small a tolerance can lead to non-convergence due to numerical roundoff. Usually values between 0.01 and 0.00001 are acceptable.

nstart: cudf.DataFrame, optional (default=None)

GPU DataFrame containing the initial guess for PageRank (a performance optimization).

nstart['vertex']: cudf.Series

Subset of vertices of the graph for the initial guess of PageRank values

nstart['values']: cudf.Series

PageRank values for vertices

weight: str, optional (default=None)

The attribute column to be used as edge weights if G is a NetworkX Graph. This parameter is here for NetworkX compatibility and is ignored when G is a cugraph.Graph.

dangling: dict, optional (default=None)

This parameter is here for NetworkX compatibility and is ignored.

fail_on_nonconvergence: bool (default=True)

If the solver does not reach convergence, raise an exception if fail_on_nonconvergence is True. If fail_on_nonconvergence is False, the return value is a tuple of (pagerank, converged) where pagerank is a cudf.DataFrame as described below, and converged is a boolean indicating if the solver converged (True) or not (False).
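To make the shape of the precomputed_vertex_out_weight parameter above more concrete, here is a minimal pure-Python sketch that derives a 'vertex' column and a matching 'sums' column from a weighted edge list. It assumes that "out weight" means the sum of the weights on a vertex's outgoing edges; the helper name vertex_out_weight_sums and the (source, destination, weight) tuple layout are hypothetical and not part of cuGraph.

```python
from collections import defaultdict


def vertex_out_weight_sums(weighted_edges):
    """Sum the weights of outgoing edges per source vertex."""
    sums = defaultdict(float)
    for src, _dst, weight in weighted_edges:
        sums[src] += weight
    # Emit the two columns in a stable, sorted vertex order.
    vertices = sorted(sums)
    return {"vertex": vertices, "sums": [sums[v] for v in vertices]}


# Hypothetical edge list: (source, destination, weight).
weighted_edges = [(0, 1, 2.0), (0, 2, 1.0), (1, 2, 0.5)]
out_weights = vertex_out_weight_sums(weighted_edges)
```

On a machine with cuDF installed, a dictionary of this shape could presumably be wrapped with cudf.DataFrame(out_weights) before being passed to pagerank.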

Returns:
The return value varies based on the value of the fail_on_nonconvergence parameter. If fail_on_nonconvergence is True:
PageRank: cudf.DataFrame

GPU DataFrame containing two cudf.Series of size V: the vertex identifiers and the corresponding PageRank values.

NOTE: if the input cugraph.Graph was created using the renumber=False option of any of the from_*_edgelist() methods, pagerank assumes that the vertices in the edgelist are contiguous and start from 0. If the actual set of vertices in the edgelist is not contiguous (has gaps) or does not start from zero, pagerank will assume the “missing” vertices are isolated vertices in the graph, and will compute and return PageRank values for each. If this is not the desired behavior, ensure the input cugraph.Graph is created from the from_*_edgelist() functions with the renumber=True option (the default).

df['vertex']: cudf.Series

Contains the vertex identifiers

df['pagerank']: cudf.Series

Contains the PageRank score
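The NOTE above about renumber=False can be illustrated with a small pure-Python sketch. It only models the documented assumption (vertex identifiers are treated as the contiguous range from 0 to the largest identifier present) and does not call cuGraph; the helper name assumed_vertex_ids is hypothetical.

```python
def assumed_vertex_ids(edges):
    """Return (all ids assumed to exist, ids that would be treated as isolated)."""
    max_id = max(max(src, dst) for src, dst in edges)
    present = {vertex for edge in edges for vertex in edge}
    # Without renumbering, every id in 0..max_id is assumed to be a vertex.
    all_ids = list(range(max_id + 1))
    isolated = [v for v in all_ids if v not in present]
    return all_ids, isolated


# Edge list with gaps: only vertices 1, 3 and 7 actually appear.
edges = [(1, 3), (3, 7)]
all_ids, isolated = assumed_vertex_ids(edges)
```

Under this model the result would contain eight rows (vertices 0 through 7) rather than three, with the five absent identifiers treated as isolated vertices.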

If fail_on_nonconvergence is False:
(PageRank, converged): tuple of (cudf.DataFrame, bool)

PageRank is the GPU DataFrame described above; converged is a bool indicating whether the solver converged (True) or not (False).
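One way to use the tuple return value is to warn instead of raising when the solver does not converge. The sketch below is hedged: pagerank_with_warning is a hypothetical wrapper, and fake_pagerank is a stand-in so that the example runs without a GPU. In real use, cugraph.pagerank would be passed as pagerank_fn.

```python
import warnings


def pagerank_with_warning(pagerank_fn, graph, **kwargs):
    """Call a pagerank-like function and warn on nonconvergence."""
    # With fail_on_nonconvergence=False the documented return value is a tuple.
    scores, converged = pagerank_fn(graph, fail_on_nonconvergence=False, **kwargs)
    if not converged:
        warnings.warn("PageRank did not converge; scores may be inaccurate.")
    return scores, converged


def fake_pagerank(graph, max_iter=100, fail_on_nonconvergence=True, **_ignored):
    # Hypothetical stand-in: pretends to converge only with >= 10 iterations.
    return {"vertex": [0, 1], "pagerank": [0.5, 0.5]}, max_iter >= 10


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    scores, converged = pagerank_with_warning(fake_pagerank, graph=None, max_iter=5)
```

Keeping the scores even when converged is False lets the caller decide whether a partially converged result is still usable, for example when max_iter was deliberately set low to bound execution time.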

Examples

>>> import cugraph
>>> from cugraph.datasets import karate
>>> G = karate.get_graph(download=True)
>>> pr = cugraph.pagerank(G, alpha=0.85, max_iter=500, tol=1.0e-05)