cugraph.dask.link_analysis.hits.hits
- cugraph.dask.link_analysis.hits.hits(input_graph, tol=1e-05, max_iter=100, nstart=None, normalized=True)
Compute HITS hubs and authorities values for each vertex.
The HITS algorithm computes two scores for each vertex. The authorities score estimates the value of a vertex based on its incoming links. The hubs score estimates the value of a vertex based on its outgoing links.
Both the cuGraph and NetworkX implementations use a 1-norm.
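To make the hubs/authorities definitions and the 1-norm concrete, here is a minimal single-process sketch of the textbook HITS power iteration in plain Python. It is not cuGraph's implementation: the function name `hits_sketch`, the convergence test (1-norm of the change in hub scores compared against `tol`), and the toy edge list are assumptions made for illustration only.

```python
def hits_sketch(edges, tol=1e-5, max_iter=100):
    """Textbook HITS power iteration on a list of (src, dst) edges.

    Hypothetical helper for illustration; not part of cuGraph.
    """
    vertices = sorted({v for edge in edges for v in edge})
    # Start from a uniform guess that already sums to 1.
    hubs = {v: 1.0 / len(vertices) for v in vertices}
    auths = dict(hubs)

    for _ in range(max_iter):
        # Authority of v: sum of the hub scores of vertices linking to v.
        new_auths = {v: 0.0 for v in vertices}
        for src, dst in edges:
            new_auths[dst] += hubs[src]

        # Hub of v: sum of the authority scores of vertices v links to.
        new_hubs = {v: 0.0 for v in vertices}
        for src, dst in edges:
            new_hubs[src] += new_auths[dst]

        # 1-norm normalization: each score vector sums to 1.
        a_sum = sum(new_auths.values()) or 1.0
        h_sum = sum(new_hubs.values()) or 1.0
        new_auths = {v: s / a_sum for v, s in new_auths.items()}
        new_hubs = {v: s / h_sum for v, s in new_hubs.items()}

        # Assumed stopping rule: total change in hub scores below tol.
        delta = sum(abs(new_hubs[v] - hubs[v]) for v in vertices)
        hubs, auths = new_hubs, new_auths
        if delta < tol:
            break

    return hubs, auths


# Toy directed graph: vertices 0, 1 and 3 point at 2, and 2 points at 4.
edges = [(0, 2), (1, 2), (3, 2), (2, 4)]
hubs, auths = hits_sketch(edges)
```

In this toy graph, vertex 2 receives the most incoming links and ends up with the largest authorities score, while vertex 4 has no outgoing links and therefore a hubs score of zero.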
- Parameters:
- input_graph : cugraph.Graph
cuGraph graph descriptor. It should contain the connectivity information as an edge list (edge weights are not used by this algorithm). The adjacency list will be computed if not already present.
- tol : float, optional (default=1.0e-5)
Tolerance of the approximation. This should be a small value.
- max_iter : int, optional (default=100)
The maximum number of iterations before an answer is returned.
- nstart : cudf.DataFrame, optional (default=None)
The initial hubs guess: the vertices along with their initial hubs values.
- nstart['vertex'] : cudf.Series
Initial hubs guess vertices
- nstart['values'] : cudf.Series
Initial hubs guess values
- normalized : bool, optional (default=True)
A flag to normalize the results.
- Returns:
- HubsAndAuthorities : dask_cudf.DataFrame
GPU-distributed data frame containing three dask_cudf.Series of size V: the vertex identifiers and the corresponding hubs and authorities values.
- df['vertex'] : dask_cudf.Series
Contains the vertex identifiers
- df['hubs'] : dask_cudf.Series
Contains the hubs score
- df['authorities'] : dask_cudf.Series
Contains the authorities score
Examples
>>> import cugraph
>>> import cugraph.dask as dcg
>>> import dask_cudf
>>> # ... Init a DASK Cluster
>>> #    see https://docs.rapids.ai/api/cugraph/stable/dask-cugraph.html
>>> # Download dataset from https://github.com/rapidsai/cugraph/datasets/..
>>> # datasets_path is assumed to be a pathlib.Path to the downloaded datasets
>>> chunksize = dcg.get_chunksize(datasets_path / "karate.csv")
>>> ddf = dask_cudf.read_csv(datasets_path / "karate.csv",
...                          blocksize=chunksize, delimiter=" ",
...                          names=["src", "dst", "value"],
...                          dtype=["int32", "int32", "float32"])
>>> dg = cugraph.Graph(directed=True)
>>> dg.from_dask_cudf_edgelist(ddf, source='src', destination='dst',
...                            edge_attr='value')
>>> hits = dcg.hits(dg, max_iter=50)