cugraph-docs 25.12.02 documentation

cugraph.dask.link_prediction.jaccard.jaccard

cugraph.dask.link_prediction.jaccard.jaccard(input_graph, vertex_pair=None, use_weight=False)

Compute the Jaccard similarity between each pair of vertices connected by an edge, or between arbitrary pairs of vertices specified by the user. Jaccard similarity is defined between two sets as the ratio of the volume of their intersection over the volume of their union. In the context of graphs, the neighborhood of a vertex is seen as a set. The Jaccard similarity weight of each edge represents the strength of connection between vertices based on the relative similarity of their neighbors.
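
The definition above can be sketched in plain Python, without a GPU or cuGraph. This is an illustration of the set-based formula only, not the library's implementation; the helper names `neighbor_sets` and `jaccard` and the choice to return 0.0 when both neighborhoods are empty are assumptions made for this example.

```python
from collections import defaultdict


def neighbor_sets(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)} from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def jaccard(adj, u, v):
    """Return |N(u) & N(v)| / |N(u) | N(v)| for the neighborhoods of u and v."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    if not union:
        # Assumed convention for this sketch: no neighbors at all means similarity 0.
        return 0.0
    return len(nu & nv) / len(union)


# Small undirected graph: a triangle 0-1-2 plus a pendant vertex 3 attached to 1.
edges = [(0, 1), (0, 2), (1, 2), (1, 3)]
adj = neighbor_sets(edges)

# N(0) = {1, 2} and N(3) = {1}: one shared neighbor out of two distinct neighbors.
print(jaccard(adj, 0, 3))
# N(0) = {1, 2} and N(1) = {0, 2, 3}: one shared neighbor out of four.
print(jaccard(adj, 0, 1))
```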

If no vertex pair list is specified, cugraph.dask.jaccard computes the two_hop_neighbors of the entire graph to construct one and returns the Jaccard coefficient for those vertex pairs. This is not advisable on large graphs, because the number of vertex pairs can grow very quickly with the size of the dataset (up to quadratically in the number of vertices).
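
To see how quickly a two-hop pair list can grow, the following pure-Python sketch counts the unordered vertex pairs that share at least one neighbor in a star graph. The helpers `two_hop_pairs` and `star` are hypothetical names for this example, and the exact output of cuGraph's two_hop_neighbors (for instance, whether pairs are ordered) may differ.

```python
from itertools import combinations


def two_hop_pairs(edges):
    """Return the set of unordered vertex pairs that share at least one neighbor."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    pairs = set()
    for nbrs in adj.values():
        # Any two neighbors of the same vertex can reach each other in two hops.
        pairs.update(combinations(sorted(nbrs), 2))
    return pairs


def star(n_leaves):
    """Star graph: hub vertex 0 connected to leaves 1..n_leaves."""
    return [(0, leaf) for leaf in range(1, n_leaves + 1)]


for n in (10, 100, 1000):
    print(n, "edges ->", len(two_hop_pairs(star(n))), "two-hop pairs")
```

In this worst-case shape, every pair of leaves is two hops apart through the hub, so n edges yield n * (n - 1) / 2 pairs. Passing an explicit vertex_pair keeps the work proportional to the pairs of interest.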

Parameters:
input_graph : cugraph.Graph

cuGraph Graph instance. It should contain the connectivity information as an edge list (edge weights are not supported yet for this algorithm). The graph should be undirected, where each undirected edge is represented by a directed edge in both directions. The adjacency list will be computed if not already present.

This implementation only supports undirected, non-multi Graphs.

vertex_pair : cudf.DataFrame, optional (default=None)

A GPU dataframe consisting of two columns representing pairs of vertices. If provided, the Jaccard coefficient is computed for the given vertex pairs. If vertex_pair is not provided, the current implementation computes the Jaccard coefficient for all pairs of vertices that are two hops apart in the graph.

use_weight : bool, optional (default=False)

Flag to indicate whether to compute weighted Jaccard (use_weight=True) or unweighted Jaccard (use_weight=False). input_graph must be weighted if use_weight=True.
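
The page does not state which formula is used when use_weight=True. As an illustration only, the sketch below uses one common definition of weighted Jaccard: the sum of the minimum edge weights over the sum of the maximum edge weights, with a missing neighbor treated as weight 0. That definition, the non-negative weights, and the helper name `weighted_jaccard` are all assumptions for this example and are not confirmed to match cuGraph.

```python
def weighted_jaccard(wu, wv):
    """Weighted Jaccard for two vertices.

    wu and wv map each neighbor to the (non-negative) weight of the connecting edge.
    Assumed definition: sum of per-neighbor minimum weights divided by the sum of
    per-neighbor maximum weights, with absent neighbors counted as weight 0.
    """
    neighbors = set(wu) | set(wv)
    numerator = sum(min(wu.get(k, 0.0), wv.get(k, 0.0)) for k in neighbors)
    denominator = sum(max(wu.get(k, 0.0), wv.get(k, 0.0)) for k in neighbors)
    return numerator / denominator if denominator else 0.0


# Vertex a has neighbors 1 (weight 2.0) and 2 (weight 1.0);
# vertex b has neighbors 1 (weight 1.0) and 4 (weight 1.0).
wa = {1: 2.0, 2: 1.0}
wb = {1: 1.0, 4: 1.0}
print(weighted_jaccard(wa, wb))

# With every weight equal to 1.0 the result matches the unweighted set ratio.
print(weighted_jaccard({1: 1.0, 2: 1.0}, {1: 1.0}))
```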

Returns:
result : dask_cudf.DataFrame

GPU distributed data frame containing 3 dask_cudf.Series

ddf['first'] : dask_cudf.Series

The first vertex ID of each pair (identical to the first column of vertex_pair, if specified).

ddf['second'] : dask_cudf.Series

The second vertex ID of each pair (identical to the second column of vertex_pair, if specified).

ddf['jaccard_coeff'] : dask_cudf.Series

The computed Jaccard coefficient between the first and the second vertex ID.

© Copyright 2024-2025, NVIDIA Corporation.
