cugraph.overlap

cugraph.overlap(input_graph: Graph, vertex_pair: Optional[DataFrame] = None, do_expensive_check: bool = False, use_weight: bool = False)

Compute the Overlap Coefficient between pairs of vertices that are two hops apart (the default) or between arbitrary pairs of vertices specified by the user. The Overlap Coefficient of two sets is the size of their intersection divided by the size of the smaller set. In the context of graphs, the neighborhood of a vertex is treated as a set, so for vertices u and v the coefficient is |N(u) ∩ N(v)| / min(|N(u)|, |N(v)|). The coefficient of each pair represents the strength of the connection between the two vertices based on the similarity of their neighborhoods. If vertex_pair is provided, it must contain both vertex columns (the first and the second vertex of each pair); supplying only one raises an exception.
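The definition above can be illustrated with a minimal, CPU-only sketch. It assumes the graph is held as a plain Python adjacency dict of neighbor sets; this only demonstrates the formula and is not how cuGraph computes it on the GPU. The helper name `overlap_coefficient` is hypothetical.

```python
# Minimal sketch of the Overlap Coefficient on an unweighted, undirected graph.
# Assumption: the graph is a plain adjacency dict {vertex: set_of_neighbors};
# this illustrates the definition only, not cuGraph's implementation.


def overlap_coefficient(adj: dict[int, set[int]], u: int, v: int) -> float:
    """Return |N(u) & N(v)| / min(|N(u)|, |N(v)|)."""
    nu, nv = adj[u], adj[v]
    smaller = min(len(nu), len(nv))
    if smaller == 0:
        return 0.0  # assumption: a pair involving an isolated vertex scores 0
    return len(nu & nv) / smaller


# Small example graph with edges 0-1, 0-2, 1-2, 2-3
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adj: dict[int, set[int]] = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

print(overlap_coefficient(adj, 0, 1))  # N(0)={1,2}, N(1)={0,2}: 1/2
print(overlap_coefficient(adj, 0, 3))  # N(0)={1,2}, N(3)={2}:   1/1
```

Dividing by the smaller neighborhood means a low-degree vertex whose neighbors are all shared with another vertex scores 1.0, even if the other vertex has many more neighbors.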

When no vertex pair list is specified, cugraph.overlap computes the two-hop neighbors of the entire graph, uses them as the vertex pair list, and returns the overlap coefficient for those pairs. This is not advisable for large graphs: the number of two-hop vertex pairs grows with the sum of the squared vertex degrees, which can be far larger than the number of edges.
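To see why the default pair list can blow up, here is a small CPU-only sketch that enumerates ordered pairs of distinct vertices sharing a common neighbor. It assumes a plain adjacency dict and only mirrors the idea of a two-hop pair list; cuGraph's actual output (ordering, duplicate handling) may differ. The helpers `two_hop_pairs` and `star` are hypothetical. Each middle vertex of degree d contributes at most d(d-1) ordered pairs, so a star graph with n edges already yields n(n-1) pairs.

```python
# Sketch: count two-hop vertex pairs versus edges on star graphs.
# Assumption: "two-hop pair" = ordered pair of distinct vertices that share at
# least one common neighbor; this is an illustration, not cuGraph's code.
from itertools import permutations


def two_hop_pairs(adj: dict[int, set[int]]) -> set[tuple[int, int]]:
    pairs: set[tuple[int, int]] = set()
    for middle, nbrs in adj.items():
        # every ordered pair of distinct neighbors of `middle` is two hops apart
        for u, w in permutations(nbrs, 2):
            pairs.add((u, w))
    return pairs


def star(n_leaves: int) -> dict[int, set[int]]:
    """Star graph: hub 0 connected to leaves 1..n_leaves (n_leaves edges)."""
    adj = {0: set(range(1, n_leaves + 1))}
    for leaf in range(1, n_leaves + 1):
        adj[leaf] = {0}
    return adj


for n in (10, 100, 1000):
    g = star(n)
    num_edges = sum(len(s) for s in g.values()) // 2
    # edges grow linearly in n, two-hop pairs grow as n * (n - 1)
    print(n, num_edges, len(two_hop_pairs(g)))
```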

Parameters:
input_graph : cugraph.Graph

cuGraph Graph instance, which should contain the connectivity information as an edge list. The adjacency list will be computed if not already present.

This implementation only supports undirected graphs without multi-edges.

vertex_pair : cudf.DataFrame, optional (default=None)

A GPU DataFrame with two columns representing pairs of vertices. If provided, the overlap coefficient is computed for the given vertex pairs; otherwise, it is computed for all two-hop vertex pairs of the graph.

do_expensive_check : bool, optional (default=False)

Deprecated. This option added a check to ensure that integer vertex IDs are sequential values from 0 to V-1. That check is now redundant because cugraph unconditionally renumbers and un-renumbers integer vertex IDs for optimal performance, so this option is deprecated and will be removed in a future version.
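Renumbering is what makes the old check unnecessary: arbitrary vertex IDs are mapped to dense IDs 0..V-1 before the computation and mapped back afterwards. The sketch below shows that idea in plain Python; `renumber` and `unrenumber` are hypothetical helpers, and cuGraph performs its own renumbering internally on the GPU.

```python
# Sketch of renumbering: map arbitrary (possibly sparse) integer vertex IDs to
# dense IDs 0..V-1, then map results back. Hypothetical helpers for
# illustration only; not cuGraph's internal renumbering code.


def renumber(edges: list[tuple[int, int]]):
    ids = sorted({v for edge in edges for v in edge})
    to_dense = {vid: i for i, vid in enumerate(ids)}
    dense_edges = [(to_dense[a], to_dense[b]) for a, b in edges]
    return dense_edges, ids  # ids[i] is the original ID of dense vertex i


def unrenumber(pairs: list[tuple[int, int]], ids: list[int]):
    return [(ids[a], ids[b]) for a, b in pairs]


edges = [(10, 2000), (2000, 7), (7, 10)]
dense_edges, ids = renumber(edges)
print(dense_edges)                   # IDs are now 0, 1, 2
print(unrenumber(dense_edges, ids))  # original IDs restored
```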

use_weight : bool, optional (default=False)

Flag indicating whether to compute the weighted overlap (use_weight=True) or the unweighted overlap (use_weight=False). input_graph must be weighted if use_weight=True.

Returns:
df : cudf.DataFrame

GPU DataFrame containing the Overlap coefficients, with one row per two-hop vertex pair (the default) or one row per pair in vertex_pair. When vertex_pair is given, the rows follow the order of the specified vertex pairs.

df['first'] : cudf.Series

The first vertex ID of each pair (identical to the first column of vertex_pair, if specified).

df['second'] : cudf.Series

The second vertex ID of each pair (identical to the second column of vertex_pair, if specified).

df['overlap_coeff'] : cudf.Series

The computed overlap coefficient between the first and the second vertex ID.
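The returned columns can be pictured with the CPU-only sketch below, which builds the same three columns for a user-supplied pair list, keeping the given order. Plain Python lists in a dict stand in for the cudf.Series of the real result, and `overlap_table` is a hypothetical helper, not part of cuGraph.

```python
# Sketch of the documented output layout: columns first, second and
# overlap_coeff, one row per requested pair, in the order given.
# Assumption: Python lists stand in for cudf.Series; not cuGraph's code.


def overlap_table(adj: dict[int, set[int]], vertex_pair: list[tuple[int, int]]):
    first, second, coeff = [], [], []
    for u, v in vertex_pair:
        nu, nv = adj.get(u, set()), adj.get(v, set())
        smaller = min(len(nu), len(nv))
        first.append(u)
        second.append(v)
        # |N(u) & N(v)| / min(|N(u)|, |N(v)|); 0.0 if a vertex has no neighbors
        coeff.append(len(nu & nv) / smaller if smaller else 0.0)
    return {"first": first, "second": second, "overlap_coeff": coeff}


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
df = overlap_table(adj, [(0, 3), (0, 1)])
print(df)
```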

Examples

>>> from cugraph.datasets import karate
>>> from cugraph import overlap
>>> input_graph = karate.get_graph(download=True, ignore_weights=True)
>>> df = overlap(input_graph)