NetworkX by calling cuGraph Algorithms#
Note: this is a work in progress and will be updatred and changed as we better flesh out compatibility issues
Latest Update#
Last Update: March 7th, 2024 Release: 24.04
**CuGraph is now a registered backend for networkX. This is described in the following blog: Accelerating NetworkX on NVIDIA GPUs for High Performance Graph Analytics
Easy Path – Use NetworkX Graph Objects, Accelerated Algorithms#
Rather than updating all of your existing code, simply update the calls to graph algorithms by replacing the module name. This allows all the complicated ETL code to be unchanged while still seeing significate performance improvements. Again this will be deprecated since networkX dispatching to nx_cugraph has many advantages.
It is that easy. All algorithms in cuGraph support a NetworkX graph object as input and match the NetworkX API list of arguments.
Currently, cuGraph accepts both NetworkX Graph and DiGraph objects. We will be adding support for Bipartite graph and Multigraph over the next few releases.
Differences in Algorithms#
Since cuGraph currently does not support attribute rich graphs, those algorithms that return simple scores (centrality, clustering, etc.) best match the NetworkX process. Algorithms that return a subgraph will do so without any additional attributes on the nodes or edges.
Algorithms that exactly match#
Algorithm |
Differences |
---|---|
Core Number |
None |
HITS |
None |
PageRank |
None |
Personal PageRank |
None |
Strongly Connected Components |
None |
Weakly Connected Components |
None |
Algorithms that do not copy over additional attributes#
Algorithm |
Differences |
---|---|
K-Truss |
Does not copy over attributes |
K-Core |
Does not copy over attributes |
Subgraph Extraction |
Does not copy over attributes |
Algorithms not in NetworkX#
Algorithm |
Differences |
---|---|
Ensemble Clustering for Graphs (ECG) |
Currently not in NetworkX |
Force Atlas 2 |
Currently not in NetworkX |
Leiden |
Currently not in NetworkX |
Louvain |
Currently not in NetworkX |
Overlap coefficient |
Currently not in NetworkX |
Spectral Clustering |
Currently not in NetworkX |
Algorithm where not all arguments are supported#
Algorithm |
Differences |
---|---|
Betweenness Centrality |
weight is currently not supported – ignored endpoints is currently not supported – ignored |
Edge Betweenness Centrality |
weight is currently not supported – ignored |
Katz Centrality |
beta is currently not supported – ignored max_iter defaults to 100 versus 1000 |
Algorithms where the results are different#
For example, the NetworkX traversal algorithms typically return a generator rather than a dictionary.
Algorithm |
Differences |
---|---|
Triangle Counting |
this algorithm simply returns the total number of triangle and not the number per vertex (on roadmap to update) |
Jaccard coefficient |
Currently we only do a 1-hop computation rather than an all-pairs. Fix is on roadmap |
Breadth First Search (BFS) |
Returns a Pandas DataFrame with: [vertex][distance][predecessor] |
Single Source Shortest Path (SSSP) |
Returns a Pandas DataFrame with: [vertex][distance][predecessor] |
Graph Building#
The biggest difference between NetworkX and cuGraph is with how Graph objects are built. NetworkX, for the most part, stores graph data in a dictionary. That structure allows easy insertion of new records. Consider the following code for building a NetworkX Graph:
# Read the node data
df = pd.read_csv( data_file)
# Construct graph from edge list.
G = nx.DiGraph()
for row in df.iterrows():
G.add_edge(
row[1]["1"], row[1]["2"], count=row[1]["3"]
)
The code block is perfectly fine for NetworkX. However, the process of iterating over the dataframe and adding one node at a time is problematic for GPUs and something that we try and avoid. cuGraph stores data in columns (i.e. arrays). Resizing an array requires allocating a new array one element larger, copying the data, and adding the new value. That is not very efficient.
If your code follows the above model of inserting one element at a time, the we suggest either rewriting that code or using it as is within NetworkX and just accelerating the algorithms with cuGraph.
Now, if your code bulk loads the data from Pandas, then RAPIDS can accelerate that process by orders of magnitude.
The above cuGraph code will create cuGraph.Graph object and not a NetworkX.Graph object.