cugraph.dask.link_analysis.pagerank.pagerank
- cugraph.dask.link_analysis.pagerank.pagerank(input_graph, alpha=0.85, personalization=None, precomputed_vertex_out_weight=None, max_iter=100, tol=1e-05, nstart=None, fail_on_nonconvergence=True)
Find the PageRank values for each vertex in a graph using multiple GPUs. cuGraph computes an approximation of PageRank using the power method. The input graph must contain its edge list as a dask_cudf DataFrame with one partition per GPU. If edge weights (edge_attr) are not provided, every edge is given a weight of 1.0.
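To illustrate what the power method does, here is a small, dense, single-process NumPy sketch. It is not cuGraph's multi-GPU implementation: the function name `pagerank_power`, the uniform initial guess, the uniform redistribution of rank held by dangling vertices (vertices with no outgoing edges), and the L1-norm convergence test are assumptions chosen for illustration. It mirrors the roles of `alpha`, `max_iter`, and `tol` described below and returns a `converged` flag analogous to the one returned when `fail_on_nonconvergence=False`.

```python
import numpy as np


def pagerank_power(edges, n, alpha=0.85, max_iter=100, tol=1e-5):
    """Illustrative power-method PageRank (not cuGraph's implementation).

    edges: list of (src, dst) pairs with vertex IDs in range(n).
    Returns (ranks, converged).
    """
    out_deg = np.zeros(n)
    for src, _ in edges:
        out_deg[src] += 1.0
    dangling = out_deg == 0  # vertices with no outgoing edges

    ranks = np.full(n, 1.0 / n)  # assumed uniform initial guess
    for _ in range(max_iter):
        new = np.zeros(n)
        # With probability alpha, follow an outgoing edge; each vertex
        # splits its current rank evenly across its out-edges.
        for src, dst in edges:
            new[dst] += alpha * ranks[src] / out_deg[src]
        # Assumption: rank held by dangling vertices is spread uniformly.
        new += alpha * ranks[dangling].sum() / n
        # With probability 1 - alpha, "teleport" to a uniformly random vertex.
        new += (1.0 - alpha) / n
        # Assumed stopping rule: L1 change between iterations below tol.
        if np.abs(new - ranks).sum() < tol:
            return new, True
        ranks = new
    return ranks, False
```

For example, `pagerank_power([(0, 1), (1, 2), (2, 0)], 3)` returns equal scores for the three vertices of the cycle, while a star whose leaves all point at vertex 0 concentrates rank on vertex 0.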
- Parameters:
- input_graph : cugraph.Graph
cuGraph graph descriptor; should contain the connectivity information as a dask_cudf edge-list DataFrame (edge weights are not used for this algorithm).
- alpha : float, optional (default=0.85)
The damping factor alpha is the probability of following an outgoing edge; the standard value is 0.85. Thus, 1.0 - alpha is the probability of “teleporting” to a random vertex. Alpha must be greater than 0.0 and strictly less than 1.0.
- personalization : cudf.DataFrame, optional (default=None)
GPU DataFrame containing the personalization information, used to compute personalized PageRank: teleports land on the listed vertices according to their values rather than uniformly at random.
- personalization['vertex'] : cudf.Series
Subset of the graph's vertices to personalize
- personalization['values'] : cudf.Series
Personalization values for those vertices
- precomputed_vertex_out_weight : cudf.DataFrame, optional (default=None)
GPU DataFrame containing the precomputed out-weight of each vertex (a performance optimization).
- precomputed_vertex_out_weight['vertex'] : cudf.Series
Subset of the graph's vertices for which out-weights are provided
- precomputed_vertex_out_weight['sums'] : cudf.Series
Corresponding precomputed sum of the weights of each vertex's outgoing edges
- max_iter : int, optional (default=100)
The maximum number of iterations before an answer is returned. This can be used to limit the execution time and exit early, before the solver reaches the convergence tolerance. If this value is less than or equal to 0, cuGraph uses the default value, 100.
- tol : float, optional (default=1e-05)
The tolerance of the approximation; this parameter should be a small value. The lower the tolerance, the better the approximation. If this value is 0.0, cuGraph uses the default value, 1e-05. Setting too small a tolerance can lead to non-convergence due to numerical roundoff. Values between 0.01 and 0.00001 are usually acceptable.
- nstart : cudf.DataFrame, optional (default=None)
GPU DataFrame containing the initial guess for the PageRank values (a performance optimization).
- nstart['vertex'] : cudf.Series
Subset of the graph's vertices for which an initial guess is provided
- nstart['values'] : cudf.Series
Initial PageRank values for those vertices
- fail_on_nonconvergence : bool, optional (default=True)
If True, an exception is raised when the solver does not reach convergence. If False, the return value is a tuple (pagerank, converged), where pagerank is a dask_cudf.DataFrame as described below and converged is a bool indicating whether the solver converged (True) or not (False).
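The three optional DataFrame inputs above each pair a 'vertex' column with one value column. The sketch below builds them with pandas as a CPU stand-in for cudf so it runs without a GPU; the edge list and every numeric value are made up for illustration. On a GPU, each frame could be converted with `cudf.from_pandas`, and the same `groupby`/`rename` calls should also work on cudf (and, for the out-weight sums, on a dask_cudf edge list).

```python
import pandas as pd

# Hypothetical directed edge list in the src/dst/value layout used by
# the karate.csv example below.
edges = pd.DataFrame({
    "src": [0, 0, 1, 2, 2, 3],
    "dst": [1, 2, 2, 0, 3, 0],
    "value": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
})

# personalization: bias teleports toward vertices 0 and 3 (made-up weights).
personalization = pd.DataFrame({"vertex": [0, 3], "values": [0.7, 0.3]})

# precomputed_vertex_out_weight: sum of outgoing edge weights per vertex.
out_weight = (
    edges.groupby("src", as_index=False)["value"].sum()
    .rename(columns={"src": "vertex", "value": "sums"})
)

# nstart: reuse an earlier result (columns 'vertex' and 'pagerank') as the
# initial guess; the score column must be renamed to 'values'.
previous = pd.DataFrame({"vertex": [0, 1, 2, 3],
                         "pagerank": [0.35, 0.15, 0.25, 0.25]})  # made up
nstart = previous.rename(columns={"pagerank": "values"})
```

Reusing a previous result as nstart is the typical way to exploit this warm start, for example after a small update to the graph.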
- Returns:
- The return value depends on the fail_on_nonconvergence parameter. If fail_on_nonconvergence is True:
- PageRank : dask_cudf.DataFrame
GPU DataFrame containing two dask_cudf.Series of size V: the vertex identifiers and the corresponding PageRank values.
- ddf['vertex'] : dask_cudf.Series
Contains the vertex identifiers
- ddf['pagerank'] : dask_cudf.Series
Contains the PageRank score
NOTE: If the input cugraph.Graph was created with the renumber=False option of any of the from_*_edgelist() methods, pagerank assumes that the vertex IDs in the edge list are contiguous and start from 0. If the actual vertex IDs are not contiguous (have gaps) or do not start from 0, pagerank treats the “missing” vertices as isolated vertices in the graph and computes and returns a PageRank value for each of them. If this is not the desired behavior, create the input cugraph.Graph with the from_*_edgelist() methods using renumber=True (the default).
- If fail_on_nonconvergence is False:
- (PageRank, converged) : tuple of (dask_cudf.DataFrame, bool)
PageRank is the GPU dataframe described above, converged is a bool indicating if the solver converged (True) or not (False).
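With fail_on_nonconvergence=False, callers unpack a tuple instead of receiving a single DataFrame. One possible pattern, sketched below, requests the tuple and turns non-convergence into a warning instead of an exception. `run_pagerank` is a hypothetical helper, not part of cuGraph; `pagerank_fn` is assumed to be `cugraph.dask.pagerank` or anything with the same calling convention.

```python
import warnings


def run_pagerank(pagerank_fn, graph, **kwargs):
    """Hypothetical wrapper: never raises on non-convergence.

    Assumes pagerank_fn(graph, fail_on_nonconvergence=False, ...) returns
    a (result, converged) tuple, as documented above.
    """
    result, converged = pagerank_fn(graph, fail_on_nonconvergence=False, **kwargs)
    if not converged:
        # Surface the problem without discarding the partial result.
        warnings.warn(
            "PageRank did not converge; consider increasing max_iter or tol",
            RuntimeWarning,
        )
    return result, converged
```

Usage, assuming a Dask cluster and a graph `dg` built as in the example below: `pr, converged = run_pagerank(dcg.pagerank, dg, max_iter=200)`.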
Examples
>>> import cugraph
>>> import cugraph.dask as dcg
>>> import dask_cudf
>>> # ... Init a DASK Cluster
>>> #    see https://docs.rapids.ai/api/cugraph/stable/dask-cugraph.html
>>> # Download dataset from https://github.com/rapidsai/cugraph/datasets/..
>>> chunksize = dcg.get_chunksize(datasets_path / "karate.csv")
>>> ddf = dask_cudf.read_csv(datasets_path / "karate.csv",
...                          blocksize=chunksize, delimiter=" ",
...                          names=["src", "dst", "value"],
...                          dtype=["int32", "int32", "float32"])
>>> dg = cugraph.Graph(directed=True)
>>> dg.from_dask_cudf_edgelist(ddf, source='src', destination='dst')
>>> pr = dcg.pagerank(dg)
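The NOTE under Returns explains that, with renumber=False, gaps in the vertex IDs become isolated vertices that still receive scores. A quick pre-flight check is sketched below in plain Python; `missing_vertex_ids` is a hypothetical helper. For a large dask_cudf edge list, the distinct source and destination IDs would be better computed on the GPU (for example with `unique()`) before being passed in, rather than iterating over every edge in Python.

```python
def missing_vertex_ids(sources, destinations):
    """Hypothetical helper: IDs in 0..max_id that never appear in an edge.

    With renumber=False, these are the vertices that pagerank would treat
    as isolated and still return a score for.
    """
    present = set(sources) | set(destinations)
    if not present:
        return []
    return [v for v in range(max(present) + 1) if v not in present]
```

An empty result means the IDs already form the contiguous, zero-based range that renumber=False assumes.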