cuGraph API Reference¶
Structure¶
Graph¶

class
cugraph.structure.graph.
Graph
(m_graph=None, edge_attr=None, symmetrized=False, bipartite=False, multi=False, dynamic=False)¶ Methods
add_internal_vertex_id
(df, …[, drop, …])Given a DataFrame containing external vertex ids in the identified columns, return a DataFrame containing the internal vertex ids as the specified column name.
add_nodes_from
(nodes[, bipartite, multipartite])Add nodes information to the Graph.
clear
()Empty this graph.
compute_local_data
(by[, load_balance])Compute the local edges, vertices and offsets for a distributed graph stored as a daskcudf dataframe and initialize the communicator.
degree
([vertex_subset])Compute vertex degree.
degrees
([vertex_subset])Compute vertex indegree and outdegree.
Delete the adjacency list.
Delete the edge list.
edges
()Returns all the edges in the graph as a cudf.DataFrame containing sources and destinations.
from_cudf_adjlist
(offset_col, index_col[, …])Initialize a graph from the adjacency list.
from_cudf_edgelist
(input_df[, source, …])Initialize a graph from the edge list.
from_dask_cudf_edgelist
(input_ddf[, source, …])Initializes the distributed graph from the dask_cudf.DataFrame edgelist.
Compute vertex pairs that are two hops apart.
has_edge
(u, v)Returns True if the graph contains the edge (u,v).
has_node
(n)Returns True if the graph contains the node n.
in_degree
([vertex_subset])Compute vertex indegree.
Checks if Graph is bipartite.
Checks if Graph is multipartite.
lookup_internal_vertex_id
(df[, column_name])Given a DataFrame containing external vertex ids in the identified columns, or a Series containing external vertex ids, return a Series with the internal vertex ids.
nodes
()Returns all the nodes in the graph as a cudf.Series
number_of_edges
([directed_edges])Get the number of edges in the graph.
An alias of number_of_vertices().
Get the number of nodes in the graph.
out_degree
([vertex_subset])Compute vertex outdegree.
sets
()Returns the bipartite set of nodes.
Return a directed representation of the graph.
Return an undirected copy of the graph.
unrenumber
(df, column_name[, preserve_order])Given a DataFrame containing internal vertex ids in the identified column, replace this with external vertex ids.
Display the adjacency list.
Display the edge list.
Display the transposed adjacency list.
AdjList
EdgeList
add_adj_list
add_edge_list
enable_batch
is_directed
neighbors
transposedAdjList

class
AdjList
(offsets, indices, value=None)¶

class
EdgeList
(*args)¶

add_adj_list
(offset_col, index_col, value_col=None)¶

add_edge_list
(source, destination, value=None)¶

add_internal_vertex_id
(df, external_column_name, internal_column_name, drop=True, preserve_order=False)¶ Given a DataFrame containing external vertex ids in the identified columns, return a DataFrame containing the internal vertex ids as the specified column name. Optionally drop the external vertex id columns. Optionally preserve the order of the original DataFrame.
 Parameters
 df: cudf.DataFrame or dask_cudf.DataFrame
A DataFrame containing external vertex identifiers that will be converted into internal vertex identifiers.
 external_column_name: string or list of strings
Name of the column(s) containing the external vertex ids
 internal_column_name: string
Name of column to contain the internal vertex id
 drop: (optional) bool, defaults to True
Drop the external columns from the returned DataFrame
 preserve_order: (optional) bool, defaults to False
Preserve the order of the data frame (requires an extra sort)
 Returns
 dfcudf.DataFrame or dask_cudf.DataFrame
Original DataFrame with new column containing internal vertex id

add_nodes_from
(nodes, bipartite=None, multipartite=None)¶ Add nodes information to the Graph.
 Parameters
 nodeslist or cudf.Series
The nodes of the graph to be stored. If bipartite and multipartite arguments are not passed, the nodes are considered to be a list of all the nodes present in the Graph.
 bipartitestr
Sets the Graph as bipartite. The nodes are stored as a set of nodes of the partition named as bipartite argument.
 multipartitestr
Sets the Graph as multipartite. The nodes are stored as a set of nodes of the partition named as multipartite argument.

clear
()¶ Empty this graph. This function is added for NetworkX compatibility.

compute_local_data
(by, load_balance=True)¶ Compute the local edges, vertices and offsets for a distributed graph stored as a daskcudf dataframe and initialize the communicator. Performs global sorting and load_balancing.
 Parameters
 bystr
by argument is the column by which we want to sort and partition. It should be the source column name for generating CSR format and destination column name for generating CSC format.
 load_balancebool
Set as True to perform load_balancing after global sorting of daskcudf DataFrame. This ensures that the data is uniformly distributed among multiple GPUs to avoid overloading.

degree
(vertex_subset=None)¶ Compute vertex degree. By default, this method computes vertex degrees for the entire set of vertices. If vertex_subset is provided, this method optionally filters out all but those listed in vertex_subset.
 Parameters
 vertex_subsetcudf.Series or iterable container, optional
A container of vertices for displaying corresponding degree. If not set, degrees are computed for the entire set of vertices.
 Returns
 dfcudf.DataFrame
GPU DataFrame of size N (the default) or the size of the given vertices (vertex_subset) containing the degree. The ordering is relative to the adjacency list, or that given by the specified vertex_subset. df[‘vertex’] : cudf.Series
The vertex IDs (will be identical to vertex_subset if specified).
 df[‘degree’]cudf.Series
The computed degree of the corresponding vertex.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> sources = cudf.Series(M['0']) >>> destinations = cudf.Series(M['1']) >>> G = cugraph.Graph() >>> G.add_edge_list(sources, destinations, None) >>> df = G.degree([0,9,12])

degrees
(vertex_subset=None)¶ Compute vertex indegree and outdegree. By default, this method computes vertex degrees for the entire set of vertices. If vertex_subset is provided, this method optionally filters out all but those listed in vertex_subset.
 Parameters
 vertex_subsetcudf.Series or iterable container, optional
A container of vertices for displaying corresponding degree. If not set, degrees are computed for the entire set of vertices.
 Returns
 dfcudf.DataFrame
 df[‘vertex’]cudf.Series
The vertex IDs (will be identical to vertex_subset if specified).
 df[‘in_degree’]cudf.Series
The indegree of the vertex.
 df[‘out_degree’]cudf.Series
The outdegree of the vertex.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> sources = cudf.Series(M['0']) >>> destinations = cudf.Series(M['1']) >>> G = cugraph.Graph() >>> G.add_edge_list(sources, destinations, None) >>> df = G.degrees([0,9,12])

delete_adj_list
()¶ Delete the adjacency list.

delete_edge_list
()¶ Delete the edge list.

edges
()¶ Returns all the edges in the graph as a cudf.DataFrame containing sources and destinations. It does not return the edge weights. For viewing edges with weights use view_edge_list()

enable_batch
()¶

from_cudf_adjlist
(offset_col, index_col, value_col=None)¶ Initialize a graph from the adjacency list. It is an error to call this method on an initialized Graph object. The passed offset_col and index_col arguments wrap gdf_column objects that represent a graph using the adjacency list format. If value_col is None, an unweighted graph is created. If value_col is not None, a weighted graph is created. If copy is False, this function stores references to the passed objects pointed by offset_col and index_col. If copy is True, this funcion stores references to the deepcopies of the passed objects pointed by offset_col and index_col. Undirected edges must be stored as directed edges in both directions.
 Parameters
 offset_colcudf.Series
This cudf.Series wraps a gdf_column of size V + 1 (V: number of vertices). The gdf column contains the offsets for the vertices in this graph. Offsets must be in the range [0, E] (E: number of edges).
 index_colcudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices must be in the range [0, V) (V: number of vertices).
 value_colcudf.Series, optional
This pointer can be
None
. If not, this cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the weight value for each edge. The expected type of the gdf_column element is floating point number.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> M = M.to_pandas() >>> M = scipy.sparse.coo_matrix((M['2'],(M['0'],M['1']))) >>> M = M.tocsr() >>> offsets = cudf.Series(M.indptr) >>> indices = cudf.Series(M.indices) >>> G = cugraph.Graph() >>> G.from_cudf_adjlist(offsets, indices, None)

from_cudf_edgelist
(input_df, source='source', destination='destination', edge_attr=None, renumber=True)¶ Initialize a graph from the edge list. It is an error to call this method on an initialized Graph object. The passed input_df argument wraps gdf_column objects that represent a graph using the edge list format. source argument is source column name and destination argument is destination column name.
By default, renumbering is enabled to map the source and destination vertices into an index in the range [0, V) where V is the number of vertices. If the input vertices are a single column of integers in the range [0, V), renumbering can be disabled and the original external vertex ids will be used.
If weights are present, edge_attr argument is the weights column name.
 Parameters
 input_dfcudf.DataFrame or dask_cudf.DataFrame
This cudf.DataFrame wraps source, destination and weight gdf_column of size E (E: number of edges) The ‘src’ column contains the source index for each edge. Source indices are in the range [0, V) (V: number of vertices). The ‘dst’ column contains the destination index for each edge. Destination indices are in the range [0, V) (V: number of vertices). If renumbering needs to be done, renumber argument should be passed as True. For weighted graphs, dataframe contains ‘weight’ column containing the weight value for each edge. If a dask_cudf.DataFrame is passed it will be reinterpreted as a cudf.DataFrame. For the distributed path please use from_dask_cudf_edgelist.
 sourcestr
source argument is source column name
 destinationstr
destination argument is destination column name.
 edge_attrstr
edge_attr argument is the weights column name.
 renumberbool
If source and destination indices are not in range 0 to V where V is number of vertices, renumber argument should be True.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr='2', renumber=False)

from_dask_cudf_edgelist
(input_ddf, source='source', destination='destination', edge_attr=None, renumber=True)¶ Initializes the distributed graph from the dask_cudf.DataFrame edgelist. Undirected Graphs are not currently supported.
By default, renumbering is enabled to map the source and destination vertices into an index in the range [0, V) where V is the number of vertices. If the input vertices are a single column of integers in the range [0, V), renumbering can be disabled and the original external vertex ids will be used.
 Parameters
 input_ddfdask_cudf.DataFrame
The edgelist as a dask_cudf.DataFrame
 sourcestr
source argument is source column name
 destinationstr
destination argument is destination column name.
 edge_attrstr
edge_attr argument is the weights column name.
 renumberbool
If source and destination indices are not in range 0 to V where V is number of vertices, renumber argument should be True.

get_two_hop_neighbors
()¶ Compute vertex pairs that are two hops apart. The resulting pairs are sorted before returning.
 Returns
 dfcudf.DataFrame
 df[‘first’]cudf.Series
the first vertex id of a pair, if an external vertex id is defined by only one column
 df[‘second’]cudf.Series
the second vertex id of a pair, if an external vertex id is defined by only one column
 df[‘*_first’]cudf.Series
the first vertex id of a pair, column 0 of the external vertex id will be represented as ‘0_first’, column 1 as ‘1_first’, etc.
 df[‘*_second’]cudf.Series
the second vertex id of a pair, column 0 of the external vertex id will be represented as ‘0_first’, column 1 as ‘1_first’, etc.

has_edge
(u, v)¶ Returns True if the graph contains the edge (u,v).

has_node
(n)¶ Returns True if the graph contains the node n.

in_degree
(vertex_subset=None)¶ Compute vertex indegree. Vertex indegree is the number of edges pointing into the vertex. By default, this method computes vertex degrees for the entire set of vertices. If vertex_subset is provided, this method optionally filters out all but those listed in vertex_subset.
 Parameters
 vertex_subsetcudf.Series or iterable container, optional
A container of vertices for displaying corresponding indegree. If not set, degrees are computed for the entire set of vertices.
 Returns
 dfcudf.DataFrame
GPU DataFrame of size N (the default) or the size of the given vertices (vertex_subset) containing the in_degree. The ordering is relative to the adjacency list, or that given by the specified vertex_subset. df[‘vertex’] : cudf.Series
The vertex IDs (will be identical to vertex_subset if specified).
 df[‘degree’]cudf.Series
The computed indegree of the corresponding vertex.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> sources = cudf.Series(M['0']) >>> destinations = cudf.Series(M['1']) >>> G = cugraph.Graph() >>> G.add_edge_list(sources, destinations, None) >>> df = G.in_degree([0,9,12])

is_bipartite
()¶ Checks if Graph is bipartite. This solely relies on the user call of add_nodes_from with the bipartite parameter. This does not parse the graph to check if it is bipartite.

is_directed
()¶

is_multipartite
()¶ Checks if Graph is multipartite. This solely relies on the user call of add_nodes_from with the partition parameter. This does not parse the graph to check if it is multipartite.

lookup_internal_vertex_id
(df, column_name=None)¶ Given a DataFrame containing external vertex ids in the identified columns, or a Series containing external vertex ids, return a Series with the internal vertex ids.
Note that this function does not guarantee order in single GPU mode, and does not guarantee order or partitioning in multiGPU mode.
 Parameters
 df: cudf.DataFrame, cudf.Series, dask_cudf.DataFrame, dask_cudf.Series
A DataFrame containing external vertex identifiers that will be converted into internal vertex identifiers.
 column_name: (optional) string
Name of the column containing the external vertex ids
 Returns
 seriescudf.Series or dask_cudf.Series
The internal vertex identifiers

neighbors
(n)¶

nodes
()¶ Returns all the nodes in the graph as a cudf.Series

number_of_edges
(directed_edges=False)¶ Get the number of edges in the graph.

number_of_nodes
()¶ An alias of number_of_vertices(). This function is added for NetworkX compatibility.

number_of_vertices
()¶ Get the number of nodes in the graph.

out_degree
(vertex_subset=None)¶ Compute vertex outdegree. Vertex outdegree is the number of edges pointing out from the vertex. By default, this method computes vertex degrees for the entire set of vertices. If vertex_subset is provided, this method optionally filters out all but those listed in vertex_subset.
 Parameters
 vertex_subsetcudf.Series or iterable container, optional
A container of vertices for displaying corresponding outdegree. If not set, degrees are computed for the entire set of vertices.
 Returns
 dfcudf.DataFrame
GPU DataFrame of size N (the default) or the size of the given vertices (vertex_subset) containing the out_degree. The ordering is relative to the adjacency list, or that given by the specified vertex_subset. df[‘vertex’] : cudf.Series
The vertex IDs (will be identical to vertex_subset if specified).
 df[‘degree’]cudf.Series
The computed outdegree of the corresponding vertex.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> sources = cudf.Series(M['0']) >>> destinations = cudf.Series(M['1']) >>> G = cugraph.Graph() >>> G.add_edge_list(sources, destinations, None) >>> df = G.out_degree([0,9,12])

sets
()¶ Returns the bipartite set of nodes. This solely relies on the user’s call of add_nodes_from with the bipartite parameter. This does not parse the graph to compute bipartite sets. If bipartite argument was not provided during add_nodes_from(), it raise an exception that the graph is not bipartite.

to_directed
()¶ Return a directed representation of the graph. This function sets the type of graph as DiGraph() and returns the directed view.
 Returns
 GDiGraph
A directed graph with the same nodes, and each edge (u,v,weights) replaced by two directed edges (u,v,weights) and (v,u,weights).
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, '0', '1') >>> DiG = G.to_directed()

to_undirected
()¶ Return an undirected copy of the graph.
 Returns
 GGraph
A undirected graph with the same nodes, and each directed edge (u,v,weights) replaced by an undirected edge (u,v,weights).
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> DiG = cugraph.DiGraph() >>> DiG.from_cudf_edgelist(M, '0', '1') >>> G = DiG.to_undirected()

class
transposedAdjList
(offsets, indices, value=None)¶

unrenumber
(df, column_name, preserve_order=False)¶ Given a DataFrame containing internal vertex ids in the identified column, replace this with external vertex ids. If the renumbering is from a single column, the output dataframe will use the same name for the external vertex identifiers. If the renumbering is from a multicolumn input, the output columns will be labeled 0 through n1 with a suffix of _column_name.
Note that this function does not guarantee order in single GPU mode, and does not guarantee order or partitioning in multiGPU mode. If you wish to preserve ordering, add an index column to df and sort the return by that index column.
 Parameters
 df: cudf.DataFrame or dask_cudf.DataFrame
A DataFrame containing internal vertex identifiers that will be converted into external vertex identifiers.
 column_name: string
Name of the column containing the internal vertex id.
 preserve_order: (optional) bool
If True, preserve the order of the rows in the output DataFrame to match the input DataFrame
 Returns
 dfcudf.DataFrame or dask_cudf.DataFrame
The original DataFrame columns exist unmodified. The external vertex identifiers are added to the DataFrame, the internal vertex identifier column is removed from the dataframe.

view_adj_list
()¶ Display the adjacency list. Compute it if needed.
 Returns
 offset_colcudf.Series
This cudf.Series wraps a gdf_column of size V + 1 (V: number of vertices). The gdf column contains the offsets for the vertices in this graph. Offsets are in the range [0, E] (E: number of edges).
 index_colcudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices are in the range [0, V) (V: number of vertices).
 value_colcudf.Series or
None
This pointer is
None
for unweighted graphs. For weighted graphs, this cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the weight value for each edge. The expected type of the gdf_column element is floating point number.

view_edge_list
()¶ Display the edge list. Compute it if needed.
NOTE: If the graph is of type Graph() then the displayed undirected edges are the same as displayed by networkx Graph(), but the direction could be different i.e. an edge displayed by cugraph as (src, dst) could be displayed as (dst, src) by networkx.
cugraph.Graph stores symmetrized edgelist internally. For displaying undirected edgelist for a Graph the upper trianglar matrix of the symmetrized edgelist is returned.
networkx.Graph renumbers the input and stores the upper triangle of this renumbered input. Since the internal renumbering of networx and cugraph is different, the upper triangular matrix of networkx renumbered input may not be the same as cugraph’s upper trianglar matrix of the symmetrized edgelist. Hence the displayed source and destination pairs in both will represent the same edge but node values could be swapped.
 Returns
 edgelist_dfcudf.DataFrame
This cudf.DataFrame wraps source, destination and weight gdf_column of size E (E: number of edges) The ‘src’ column contains the source index for each edge. Source indices are in the range [0, V) (V: number of vertices). The ‘dst’ column contains the destination index for each edge. Destination indices are in the range [0, V) (V: number of vertices). For weighted graphs, dataframe contains ‘weight’ column containing the weight value for each edge.

view_transposed_adj_list
()¶ Display the transposed adjacency list. Compute it if needed.
 Returns
 offset_colcudf.Series
This cudf.Series wraps a gdf_column of size V + 1 (V: number of vertices). The gdf column contains the offsets for the vertices in this graph. Offsets are in the range [0, E] (E: number of edges).
 index_colcudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices are in the range [0, V) (V: number of vertices).
 value_colcudf.Series or
None
This pointer is
None
for unweighted graphs. For weighted graphs, this cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the weight value for each edge. The expected type of the gdf_column element is floating point number.

class
Renumbering¶
Symmetrize¶

cugraph.structure.symmetrize.
symmetrize
(source_col, dest_col, value_col=None)¶ Take a COO set of source destination pairs along with associated values and create a new COO set of source destination pairs along with values where all edges exist in both directions.
Return from this call will be a COO stored as two cudf Series  the symmetrized source column and the symmetrized dest column, along with an optional cudf Series containing the associated values (only if the values are passed in).
 Parameters
 source_colcudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the source index for each edge. Source indices must be an integer type.
 dest_colcudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices must be an integer type.
 value_colcudf.Series (optional)
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains values associated with this edge. For this function the values can be any type, they are not examined, just copied.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> sources = cudf.Series(M['0']) >>> destinations = cudf.Series(M['1']) >>> values = cudf.Series(M['2']) >>> src, dst, val = cugraph.symmetrize(sources, destinations, values) >>> G = cugraph.Graph() >>> G.add_edge_list(src, dst, val)

cugraph.structure.symmetrize.
symmetrize_df
(df, src_name, dst_name)¶ Take a COO stored in a DataFrame, along with the column names of the source and destination columns and create a new data frame using the same column names that symmetrize the graph so that all edges appear in both directions.
Note that if other columns exist in the data frame (e.g. edge weights) the other columns will also be replicated. That is, if (u,v,data) represents the source value (u), destination value (v) and some set of other columns (data) in the input data, then the output data will contain both (u,v,data) and (v,u,data) with matching data.
If (u,v,data1) and (v,u,data2) exist in the input data where data1 != data2 then this code will arbitrarily pick the smaller data element to keep, if this is not desired then the caller should should correct the data prior to calling symmetrize.
 Parameters
 dfcudf.DataFrame
Input data frame containing COO. Columns should contain source ids, destination ids and any properties associated with the edges.
 src_namestring
Name of the column in the data frame containing the source ids
 dst_namestring
Name of the column in the data frame containing the destination ids
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> sym_df = cugraph.symmetrize(M, '0', '1') >>> G = cugraph.Graph() >>> G.add_edge_list(sym_df['0]', sym_df['1'], sym_df['2'])
Conversion from Other Formats¶

cugraph.structure.convert_matrix.
from_cudf_edgelist
(df, source='source', destination='destination', edge_attr=None, create_using=<class 'cugraph.structure.graph.Graph'>, renumber=True)¶ Return a new graph created from the edge list representaion. This function is added for NetworkX compatibility (this function is a RAPIDS version of NetworkX’s from_pandas_edge_list()).
 Parameters
 dfcudf.DataFrame
This cudf.DataFrame contains columns storing edge source vertices, destination (or target following NetworkX’s terminology) vertices, and (optional) weights.
 sourcestring or integer
This is used to index the source column.
 targetstring or integer
This is used to index the destination (or target following NetworkX’s terminology) column.
 weightstring or integer, optional
This pointer can be
None
. If not, this is used to index the weight column.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G = cugraph.from_cudf_edgelist(M, source='0', target='1', weight='2')
Centrality¶
Betweenness Centrality¶

cugraph.centrality.betweenness_centrality.
betweenness_centrality
(G, k=None, normalized=True, weight=None, endpoints=False, seed=None, result_dtype=<class 'numpy.float64'>)¶ Compute the betweenness centrality for all nodes of the graph G from a sample of ‘k’ sources. CuGraph does not currently support the ‘endpoints’ and ‘weight’ parameters as seen in the corresponding networkX call.
 Parameters
 GcuGraph.Graph
cuGraph graph descriptor with connectivity information. The graph can be either directed (DiGraph) or undirected (Graph). Weights in the graph are ignored, the current implementation uses BFS traversals. Use weight parameter if weights need to be considered (currently not supported)
 kint or list or None, optional, default=None
If k is not None, use k node samples to estimate betweenness. Higher values give better approximation If k is a list, use the content of the list for estimation: the list should contain vertices identifiers. If k is None (the default), all the vertices are used to estimate betweenness. Vertices obtained through sampling or defined as a list will be used as sources for traversals inside the algorithm.
 normalizedbool, optional
Default is True. If true, the betweenness values are normalized by 2 / ((n  1) * (n  2)) for Graphs (undirected), and 1 / ((n  1) * (n  2)) for DiGraphs (directed graphs) where n is the number of nodes in G. Normalization will ensure that values are in [0, 1], this normalization scales for the highest possible value where one node is crossed by every single shortest path.
 weightcudf.DataFrame, optional, default=None
Specifies the weights to be used for each edge. Should contain a mapping between edges and weights. (Not Supported)
 endpointsbool, optional, default=False
If true, include the endpoints in the shortest path counts. (Not Supported)
 seedoptional
if k is specified and k is an integer, use seed to initialize the random number generator. Using None as seed relies on random.seed() behavior: using current system time If k is either None or list: seed parameter is ignored
 result_dtypenp.float32 or np.float64, optional, default=np.float64
Indicate the data type of the betweenness centrality scores
 Returns
 dfcudf.DataFrame
GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding betweenness centrality values. Please note that the resulting the ‘vertex’ column might not be in ascending order.
 df[‘vertex’]cudf.Series
Contains the vertex identifiers
 df[‘betweenness_centrality’]cudf.Series
Contains the betweenness centrality of vertices
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1') >>> bc = cugraph.betweenness_centrality(G)

cugraph.centrality.betweenness_centrality.
edge_betweenness_centrality
(G, k=None, normalized=True, weight=None, seed=None, result_dtype=<class 'numpy.float64'>)¶ Compute the edge betweenness centrality for all edges of the graph G from a sample of ‘k’ sources. CuGraph does not currently support the ‘weight’ parameter as seen in the corresponding networkX call.
 Parameters
 GcuGraph.Graph
cuGraph graph descriptor with connectivity information. The graph can be either directed (DiGraph) or undirected (Graph). Weights in the graph are ignored, the current implementation uses BFS traversals. Use weight parameter if weights need to be considered (currently not supported)
 kint or list or None, optional, default=None
If k is not None, use k node samples to estimate betweenness. Higher values give better approximation If k is a list, use the content of the list for estimation: the list should contain vertices identifiers. Vertices obtained through sampling or defined as a list will be used as sources for traversals inside the algorithm.
 normalizedbool, optional
Default is True. If true, the betweenness values are normalized by 2 / (n * (n  1)) for Graphs (undirected), and 1 / (n * (n  1)) for DiGraphs (directed graphs) where n is the number of nodes in G. Normalization will ensure that values are in [0, 1], this normalization scales for the highest possible value where one edge is crossed by every single shortest path.
 weightcudf.DataFrame, optional, default=None
Specifies the weights to be used for each edge. Should contain a mapping between edges and weights. (Not Supported)
 seedoptional
if k is specified and k is an integer, use seed to initialize the random number generator. Using None as seed relies on random.seed() behavior: using current system time If k is either None or list: seed parameter is ignored
 result_dtypenp.float32 or np.float64, optional, default=np.float64
Indicate the data type of the betweenness centrality scores Using double automatically switch implementation to “default”
 Returns
 dfcudf.DataFrame
GPU data frame containing three cudf.Series of size E: the vertex identifiers of the sources, the vertex identifies of the destinations and the corresponding betweenness centrality values. Please note that the resulting the ‘src’, ‘dst’ column might not be in ascending order.
 df[‘src’]cudf.Series
Contains the vertex identifiers of the source of each edge
 df[‘dst’]cudf.Series
Contains the vertex identifiers of the destination of each edge
 df[‘edge_betweenness_centrality’]cudf.Series
Contains the betweenness centrality of edges
When using undirected graphs, ‘src’ and ‘dst’ only contains elements such that ‘src’ < ‘dst’, which might differ from networkx and user’s input. Namely edge (1 > 0) is transformed into (0 > 1) but contains the betweenness centrality of edge (1 > 0).
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1') >>> ebc = cugraph.edge_betweenness_centrality(G)
Katz Centrality¶

cugraph.centrality.katz_centrality.
katz_centrality
(G, alpha=None, max_iter=100, tol=1e06, nstart=None, normalized=True)¶ Compute the Katz centrality for the nodes of the graph G. cuGraph does not currently support the ‘beta’ and ‘weight’ parameters as seen in the corresponding networkX call. This implementation is based on a relaxed version of Katz defined by Foster with a reduced computational complexity of O(n+m)
Foster, K.C., Muth, S.Q., Potterat, J.J. et al. Computational & Mathematical Organization Theory (2001) 7: 275. https://doi.org/10.1023/A:1013470632383
 Parameters
 GcuGraph.Graph
cuGraph graph descriptor with connectivity information. The graph can contain either directed (DiGraph) or undirected edges (Graph).
 alphafloat
Attenuation factor defaulted to None. If alpha is not specified then it is internally calculated as 1/(degree_max) where degree_max is the maximum out degree. NOTE : The maximum acceptable value of alpha for convergence alpha_max = 1/(lambda_max) where lambda_max is the largest eigenvalue of the graph. Since lambda_max is always lesser than or equal to degree_max for a graph, alpha_max will always be greater than or equal to (1/degree_max). Therefore, setting alpha to (1/degree_max) will guarantee that it will never exceed alpha_max thus in turn fulfilling the requirement for convergence.
 max_iterint
The maximum number of iterations before an answer is returned. This can be used to limit the execution time and do an early exit before the solver reaches the convergence tolerance. If this value is lower or equal to 0 cuGraph will use the default value, which is 100.
 tolerancefloat
Set the tolerance the approximation, this parameter should be a small magnitude value. The lower the tolerance the better the approximation. If this value is 0.0f, cuGraph will use the default value which is 1.0e6. Setting too small a tolerance can lead to nonconvergence due to numerical roundoff. Usually values between 1e2 and 1e6 are acceptable.
 nstartcudf.Dataframe
GPU Dataframe containing the initial guess for katz centrality.
 nstart[‘vertex’]cudf.Series
Contains the vertex identifiers
 nstart[‘values’]cudf.Series
Contains the katz centrality values of vertices
 normalizedbool
If True normalize the resulting katz centrality values
 Returns
 dfcudf.DataFrame
GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding katz centrality values.
 df[‘vertex’]cudf.Series
Contains the vertex identifiers
 df[‘katz_centrality’]cudf.Series
Contains the katz centrality of vertices
Examples
>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(gdf, source='0', destination='1') >>> kc = cugraph.katz_centrality(G)
Community¶
Leiden¶

cugraph.community.leiden.
leiden
(input_graph, max_iter=100, resolution=1.0)¶ Compute the modularity optimizing partition of the input graph using the Leiden algorithm
It uses the Louvain method described in:
Traag, V. A., Waltman, L., & van Eck, N. J. (2019). From Louvain to Leiden: guaranteeing wellconnected communities. Scientific reports, 9(1), 5233. doi: 10.1038/s4159801941695z
 Parameters
 input_graphcugraph.Graph
cuGraph graph descriptor of type Graph
The adjacency list will be computed if not already present.
 max_iterinteger
This controls the maximum number of levels/iterations of the Leiden algorithm. When specified the algorithm will terminate after no more than the specified number of iterations. No error occurs when the algorithm terminates early in this manner.
 resolution: float/double, optional
Called gamma in the modularity formula, this changes the size of the communities. Higher resolutions lead to more smaller communities, lower resolutions lead to fewer larger communities. Defaults to 1.
 Returns
 partscudf.DataFrame
GPU data frame of size V containing two columns the vertex id and the partition id it is assigned to.
 df[‘vertex’]cudf.Series
Contains the vertex identifiers
 df[‘partition’]cudf.Series
Contains the partition assigned to the vertices
 modularity_scorefloat
a floating point number containing the global modularity score of the partitioning.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter = ' ', dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1') >>> parts, modularity_score = cugraph.leiden(G)
Louvain¶

cugraph.community.louvain.
louvain
(input_graph, max_iter=100, resolution=1.0)¶ Compute the modularity optimizing partition of the input graph using the Louvain method
It uses the Louvain method described in:
VD Blondel, JL Guillaume, R Lambiotte and E Lefebvre: Fast unfolding of community hierarchies in large networks, J Stat Mech P10008 (2008), http://arxiv.org/abs/0803.0476
 Parameters
 input_graphcugraph.Graph
cuGraph graph descriptor of type Graph
The adjacency list will be computed if not already present.
 max_iterinteger
This controls the maximum number of levels/iterations of the Louvain algorithm. When specified the algorithm will terminate after no more than the specified number of iterations. No error occurs when the algorithm terminates early in this manner.
 resolution: float/double, optional
Called gamma in the modularity formula, this changes the size of the communities. Higher resolutions lead to more smaller communities, lower resolutions lead to fewer larger communities. Defaults to 1.
 Returns
 partscudf.DataFrame
GPU data frame of size V containing two columns the vertex id and the partition id it is assigned to.
 df[‘vertex’]cudf.Series
Contains the vertex identifiers
 df[‘partition’]cudf.Series
Contains the partition assigned to the vertices
 modularity_scorefloat
a floating point number containing the global modularity score of the partitioning.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter = ' ', dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1') >>> parts, modularity_score = cugraph.louvain(G)
KTruss¶

cugraph.community.ktruss_subgraph.
ktruss_subgraph
(G, k, use_weights=True)¶ Returns the KTruss subgraph of a graph for a specific k.
The ktruss of a graph is a subgraph where each edge is part of at least (k−2) triangles. Ktrusses are used for finding tighlty knit groups of vertices in a graph. A ktruss is a relaxation of a kclique in the graph and was define in [1]. Finding cliques is computationally demanding and finding the maximal kclique is known to be NPHard.
In contrast, finding a ktruss is computationally tractable as its key building block, namely triangle counting counting, can be executed in polnymomial time.Typically, it takes many iterations of triangle counting to find the ktruss of a graph. Yet these iterations operate on a weakly monotonically shrinking graph. Therefore, finding the ktruss of a graph can be done in a fairly reasonable amount of time. The solution in cuGraph is based on a GPU algorithm first shown in [2] and uses the triangle counting algorithm from [3].
[1] Cohen, J., “Trusses: Cohesive subgraphs for social network analysis” National security agency technical report, 2008
[2] O. Green, J. Fox, E. Kim, F. Busato, et al. “Quickly Finding a Truss in a Haystack” IEEE High Performance Extreme Computing Conference (HPEC), 2017 https://doi.org/10.1109/HPEC.2017.8091038
[3] O. Green, P. Yalamanchili, L.M. Munguia, “Fast Triangle Counting on GPU” Irregular Applications: Architectures and Algorithms (IA3), 2014
 Parameters
 GcuGraph.Graph
cuGraph graph descriptor with connectivity information. kTrusses are defined for only undirected graphs as they are defined for undirected triangle in a graph.
 kint
The desired k to be used for extracting the ktruss subgraph.
 use_weightsBool
whether the output should contain the edge weights if G has them
 Returns
 G_trusscuGraph.Graph
A cugraph graph descriptor with the ktruss subgraph for the given k.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1') >>> k_subgraph = cugraph.ktruss_subgraph(G, 3)
ECG¶

cugraph.community.ecg.
ecg
(input_graph, min_weight=0.05, ensemble_size=16)¶ Compute the Ensemble Clustering for Graphs (ECG) partition of the input graph. ECG runs truncated Louvain on an ensemble of permutations of the input graph, then uses the ensemble partitions to determine weights for the input graph. The final result is found by running full Louvain on the input graph using the determined weights.
See https://arxiv.org/abs/1809.05578 for further information.
 Parameters
 input_graphcugraph.Graph
cuGraph graph descriptor, should contain the connectivity information and weights. The adjacency list will be computed if not already present.
 min_weightfloating point
The minimum value to assign as an edgeweight in the ECG algorithm. It should be a value in the range [0,1] usually left as the default value of .05
 ensemble_sizeinteger
The number of graph permutations to use for the ensemble. The default value is 16, larger values may produce higher quality partitions for some graphs.
 Returns
 partscudf.DataFrame
GPU data frame of size V containing two columns, the vertex id and the partition id it is assigned to.
 df[‘vertex’]cudf.Series
Contains the vertex identifiers
 df[‘partition’]cudf.Series
Contains the partition assigned to the vertices
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter = ' ', dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr='2') >>> parts = cugraph.ecg(G)
Spectral Clustering¶

cugraph.community.spectral_clustering.
analyzeClustering_edge_cut
(G, n_clusters, clustering)¶ Compute the edge cut score for a partitioning/clustering
 Parameters
 Gcugraph.Graph
cuGraph graph descriptor
 n_clustersinteger
Specifies the number of clusters in the given clustering
 clusteringcudf.Series
The cluster assignment to analyze.
 Returns
 scorefloat
The computed edge cut score
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter = ' ', dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr=None) >>> df = cugraph.spectralBalancedCutClustering(G, 5) >>> score = cugraph.analyzeClustering_edge_cut(G, 5, df['cluster'])

cugraph.community.spectral_clustering.
analyzeClustering_modularity
(G, n_clusters, clustering)¶ Compute the modularity score for a partitioning/clustering
 Parameters
 Gcugraph.Graph
cuGraph graph descriptor. This graph should have edge weights.
 n_clustersinteger
Specifies the number of clusters in the given clustering
 clusteringcudf.Series
The cluster assignment to analyze.
 Returns
 scorefloat
The computed modularity score
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter = ' ', dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr='2') >>> df = cugraph.spectralBalancedCutClustering(G, 5) >>> score = cugraph.analyzeClustering_modularity(G, 5, df['cluster'])

cugraph.community.spectral_clustering.
analyzeClustering_ratio_cut
(G, n_clusters, clustering)¶ Compute the ratio cut score for a partitioning/clustering
 Parameters
 Gcugraph.Graph
cuGraph graph descriptor. This graph should have edge weights.
 n_clustersinteger
Specifies the number of clusters in the given clustering
 clusteringcudf.Series
The cluster assignment to analyze.
 Returns
 scorefloat
The computed ratio cut score
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter = ' ', dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr='2') >>> df = cugraph.spectralBalancedCutClustering(G, 5) >>> score = cugraph.analyzeClustering_ratio_cut(G, 5, df['cluster'])

cugraph.community.spectral_clustering.
spectralBalancedCutClustering
(G, num_clusters, num_eigen_vects=2, evs_tolerance=1e05, evs_max_iter=100, kmean_tolerance=1e05, kmean_max_iter=100)¶ Compute a clustering/partitioning of the given graph using the spectral balanced cut method.
 Parameters
 Gcugraph.Graph
cuGraph graph descriptor
 num_clustersinteger
Specifies the number of clusters to find
 num_eigen_vectsinteger
Specifies the number of eigenvectors to use. Must be lower or equal to num_clusters.
 evs_tolerance: float
Specifies the tolerance to use in the eigensolver Default is 0.00001
 evs_max_iter: integer
Specifies the maximum number of iterations for the eigensolver Default is 100
 kmean_tolerance: float
Specifies the tolerance to use in the kmeans solver Default is 0.00001
 kmean_max_iter: integer
Specifies the maximum number of iterations for the kmeans solver Default is 100
 Returns
 dfcudf.DataFrame
GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding cluster assignments.
 df[‘vertex’]cudf.Series
contains the vertex identifiers
 df[‘cluster’]cudf.Series
contains the cluster assignments
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter = ' ', dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1') >>> df = cugraph.spectralBalancedCutClustering(G, 5)

cugraph.community.spectral_clustering.
spectralModularityMaximizationClustering
(G, num_clusters, num_eigen_vects=2, evs_tolerance=1e05, evs_max_iter=100, kmean_tolerance=1e05, kmean_max_iter=100)¶ Compute a clustering/partitioning of the given graph using the spectral modularity maximization method.
 Parameters
 Gcugraph.Graph
cuGraph graph descriptor. This graph should have edge weights.
 num_clustersinteger
Specifies the number of clusters to find
 num_eigen_vectsinteger
Specifies the number of eigenvectors to use. Must be lower or equal to num_clusters
 evs_tolerance: float
Specifies the tolerance to use in the eigensolver Default is 0.00001
 evs_max_iter: integer
Specifies the maximum number of iterations for the eigensolver Default is 100
 kmean_tolerance: float
Specifies the tolerance to use in the kmeans solver Default is 0.00001
 kmean_max_iter: integer
Specifies the maximum number of iterations for the kmeans solver Default is 100
 Returns
 dfcudf.DataFrame
 df[‘vertex’]cudf.Series
contains the vertex identifiers
 df[‘cluster’]cudf.Series
contains the cluster assignments
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter = ' ', dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr='2') >>> df = cugraph.spectralModularityMaximizationClustering(G, 5)
Subgraph Extraction¶

cugraph.community.subgraph_extraction.
subgraph
(G, vertices)¶ Compute a subgraph of the existing graph including only the specified vertices. This algorithm works for both directed and undirected graphs, it does not actually traverse the edges, simply pulls out any edges that are incident on vertices that are both contained in the vertices list.
 Parameters
 Gcugraph.Graph
cuGraph graph descriptor
 verticescudf.Series
Specifies the vertices of the induced subgraph
 Returns
 Sgcugraph.Graph
A graph object containing the subgraph induced by the given vertex set.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter = ' ', dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1') >>> verts = numpy.zeros(3, dtype=numpy.int32) >>> verts[0] = 0 >>> verts[1] = 1 >>> verts[2] = 2 >>> sverts = cudf.Series(verts) >>> Sg = cugraph.subgraph(G, sverts)
Triangle Counting¶

cugraph.community.triangle_count.
triangles
(G)¶ Compute the triangle (number of cycles of length three) count of the input graph.
 Parameters
 Gcugraph.graph
cuGraph graph descriptor, should contain the connectivity information, (edge weights are not used in this algorithm)
 Returns
 countint64
A 64 bit integer whose value gives the number of triangles in the graph.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter = ' ', dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1') >>> count = cugraph.triangles(G)
Components¶
Connected Components¶

cugraph.components.connectivity.
strongly_connected_components
(G)¶ Generate the Stronlgly Connected Components and attach a component label to each vertex.
 Parameters
 Gcugraph.Graph
cuGraph graph descriptor, should contain the connectivity information as an edge list (edge weights are not used for this algorithm). The graph can be either directed or undirected where an undirected edge is represented by a directed edge in both directions. The adjacency list will be computed if not already present. The number of vertices should fit into a 32b int.
 Returns
 dfcudf.DataFrame
df[‘labels’][i] gives the label id of the i’th vertex df[‘vertices’][i] gives the vertex id of the i’th vertex
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter = ' ', dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr=None) >>> df = cugraph.strongly_connected_components(G)

cugraph.components.connectivity.
weakly_connected_components
(G)¶ Generate the Weakly Connected Components and attach a component label to each vertex.
 Parameters
 Gcugraph.Graph
cuGraph graph descriptor, should contain the connectivity information as an edge list (edge weights are not used for this algorithm). Currently, the graph should be undirected where an undirected edge is represented by a directed edge in both directions. The adjacency list will be computed if not already present. The number of vertices should fit into a 32b int.
 Returns
 dfcudf.DataFrame
df[‘labels’][i] gives the label id of the i’th vertex df[‘vertices’][i] gives the vertex id of the i’th vertex
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter = ' ', dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr=None) >>> df = cugraph.weakly_connected_components(G)
Cores¶
KCore¶

cugraph.cores.k_core.
k_core
(G, k=None, core_number=None)¶ Compute the kcore of the graph G based on the out degree of its nodes. A kcore of a graph is a maximal subgraph that contains nodes of degree k or more. This call does not support a graph with selfloops and parallel edges.
 Parameters
 GcuGraph.Graph
cuGraph graph descriptor with connectivity information. The graph should contain undirected edges where undirected edges are represented as directed edges in both directions. While this graph can contain edge weights, they don’t participate in the calculation of the kcore.
 kint, optional
Order of the core. This value must not be negative. If set to None, the main core is returned.
 core_numbercudf.DataFrame, optional
Precomputed core number of the nodes of the graph G containing two cudf.Series of size V: the vertex identifiers and the corresponding core number values. If set to None, the core numbers of the nodes are calculated internally.
 core_number[‘vertex’]cudf.Series
Contains the vertex identifiers
 core_number[‘values’]cudf.Series
Contains the core number of vertices
 Returns
 KCoreGraphcuGraph.Graph
K Core of the input graph
Examples
>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(gdf, source='0', destination='1') >>> KCoreGraph = cugraph.k_core(G)
Core Number¶

cugraph.cores.core_number.
core_number
(G)¶ Compute the core numbers for the nodes of the graph G. A kcore of a graph is a maximal subgraph that contains nodes of degree k or more. A node has a core number of k if it belongs a kcore but not to k+1core. This call does not support a graph with selfloops and parallel edges.
 Parameters
 graphcuGraph.Graph
cuGraph graph descriptor with connectivity information. The graph should contain undirected edges where undirected edges are represented as directed edges in both directions. While this graph can contain edge weights, they don’t participate in the calculation of the core numbers.
 Returns
 dfcudf.DataFrame
GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding core number values.
 df[‘vertex’]cudf.Series
Contains the vertex identifiers
 df[‘core_number’]cudf.Series
Contains the core number of vertices
Examples
>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(gdf, source='0', destination='1') >>> cn = cugraph.core_number(G)
Layout¶
Force Atlas 2¶

cugraph.layout.force_atlas2.
force_atlas2
(input_graph, max_iter=500, pos_list=None, outbound_attraction_distribution=True, lin_log_mode=False, prevent_overlapping=False, edge_weight_influence=1.0, jitter_tolerance=1.0, barnes_hut_optimize=True, barnes_hut_theta=0.5, scaling_ratio=2.0, strong_gravity_mode=False, gravity=1.0, verbose=False, callback=None)¶ ForceAtlas2 is a continuous graph layout algorithm for handy network visualization.
NOTE: Peak memory allocation occurs at 17*V.
 Parameters
 input_graphcugraph.Graph
cuGraph graph descriptor with connectivity information. Edge weights, if present, should be single or double precision floating point values.
 max_iterinteger
This controls the maximum number of levels/iterations of the Force Atlas algorithm. When specified the algorithm will terminate after no more than the specified number of iterations. No error occurs when the algorithm terminates early in this manner. Good shortterm quality can be achieved with 50100 iterations. Above 1000 iterations is discouraged.
 pos_list: cudf.DataFrame
Data frame with initial vertex positions containing two columns: ‘x’ and ‘y’ positions.
 outbound_attraction_distribution: bool
Distributes attraction along outbound edges. Hubs attract less and thus are pushed to the borders.
 lin_log_mode: bool
Switch Force Atlas model from linlin to linlog. Makes clusters more tight.
 prevent_overlapping: bool
Prevent nodes to overlap.
 edge_weight_influence: float
How much influence you give to the edges weight. 0 is “no influence” and 1 is “normal”.
 jitter_tolerance: float
How much swinging you allow. Above 1 discouraged. Lower gives less speed and more precision.
 barnes_hut_theta: float
Float between 0 and 1. Tradeoff for speed (1) vs accuracy (0) for Barnes Hut only.
 scaling_ratio: float
How much repulsion you want. More makes a more sparse graph. Switching from regular mode to LinLog mode needs a readjustment of the scaling parameter.
 gravityfloat
Attracts nodes to the center. Prevents islands from drifting away.
 verbose: bool
Output convergence info at each interation.
 callback: GraphBasedDimRedCallback
An instance of GraphBasedDimRedCallback class to intercept the internal state of positions while they are being trained. Example of callback usage:
 from cugraph.internals import GraphBasedDimRedCallback
 class CustomCallback(GraphBasedDimRedCallback):
 def on_preprocess_end(self, positions):
print(positions.copy_to_host())
 def on_epoch_end(self, positions):
print(positions.copy_to_host())
 def on_train_end(self, positions):
print(positions.copy_to_host())
 Returns
 poscudf.DataFrame
GPU data frame of size V containing three columns: the vertex identifiers and the x and y positions.
Link Analysis¶
Pagerank¶

cugraph.link_analysis.pagerank.
pagerank
(G, alpha=0.85, personalization=None, max_iter=100, tol=1e05, nstart=None)¶ Find the PageRank score for every vertex in a graph. cuGraph computes an approximation of the Pagerank eigenvector using the power method. The number of iterations depends on the properties of the network itself; it increases when the tolerance descreases and/or alpha increases toward the limiting value of 1. The user is free to use default values or to provide inputs for the initial guess, tolerance and maximum number of iterations.
 Parameters
 graphcugraph.Graph
cuGraph graph descriptor, should contain the connectivity information as an edge list (edge weights are not used for this algorithm). The transposed adjacency list will be computed if not already present.
 alphafloat
The damping factor alpha represents the probability to follow an outgoing edge, standard value is 0.85. Thus, 1.0alpha is the probability to “teleport” to a random vertex. Alpha should be greater than 0.0 and strictly lower than 1.0.
 personalizationcudf.Dataframe
GPU Dataframe containing the personalization information.
 personalization[‘vertex’]cudf.Series
Subset of vertices of graph for personalization
 personalization[‘values’]cudf.Series
Personalization values for vertices
 max_iterint
The maximum number of iterations before an answer is returned. This can be used to limit the execution time and do an early exit before the solver reaches the convergence tolerance. If this value is lower or equal to 0 cuGraph will use the default value, which is 100.
 tolerancefloat
Set the tolerance the approximation, this parameter should be a small magnitude value. The lower the tolerance the better the approximation. If this value is 0.0f, cuGraph will use the default value which is 1.0E5. Setting too small a tolerance can lead to nonconvergence due to numerical roundoff. Usually values between 0.01 and 0.00001 are acceptable.
 nstartcudf.Dataframe
GPU Dataframe containing the initial guess for pagerank.
 nstart[‘vertex’]cudf.Series
Subset of vertices of graph for initial guess for pagerank values
 nstart[‘values’]cudf.Series
Pagerank values for vertices
 Returns
 PageRankcudf.DataFrame
GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding PageRank values.
 df[‘vertex’]cudf.Series
Contains the vertex identifiers
 df[‘pagerank’]cudf.Series
Contains the PageRank score
Examples
>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(gdf, source='0', destination='1') >>> pr = cugraph.pagerank(G, alpha = 0.85, max_iter = 500, tol = 1.0e05)
Link Prediction¶
Jaccard Coefficient¶

cugraph.link_prediction.jaccard.
jaccard
(input_graph, vertex_pair=None)¶ Compute the Jaccard similarity between each pair of vertices connected by an edge, or between arbitrary pairs of vertices specified by the user. Jaccard similarity is defined between two sets as the ratio of the volume of their intersection divided by the volume of their union. In the context of graphs, the neighborhood of a vertex is seen as a set. The Jaccard similarity weight of each edge represents the strength of connection between vertices based on the relative similarity of their neighbors. If first is specified but second is not, or vice versa, an exception will be thrown.
NOTE: If the vertex_pair parameter is not specified then the behavior of cugraph.jaccard is different from the behavior of networkx.jaccard_coefficient.
cugraph.jaccard, in the absence of a specified vertex pair list, will use the edges of the graph to construct a vertex pair list and will return the jaccard coefficient for those vertex pairs.
networkx.jaccard_coefficient, in the absence of a specified vertex pair list, will return an upper triangular dense matrix, excluding the diagonal as well as vertex pairs that are directly connected by an edge in the graph, of jaccard coefficients. Technically, networkx returns a lazy iterator across this upper triangular matrix where the actual jaccard coefficient is computed when the iterator is dereferenced. Computing a dense matrix of results is not feasible if the number of vertices in the graph is large (100,000 vertices would result in 4.9 billion values in that iterator).
If your graph is small enough (or you have enough memory and patience) you can get the interesting (nonzero) values that are part of the networkx solution by doing the following:
>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(gdf, source='0', destination='1') >>> pairs = cugraph.get_two_hop_neighbors(G) >>> df = cugraph.jaccard(G, pairs)
But please remember that cugraph will fill the dataframe with the entire solution you request, so you’ll need enough memory to store the 2hop neighborhood dataframe.
 Parameters
 graphcugraph.Graph
cuGraph graph descriptor, should contain the connectivity information as an edge list (edge weights are not used for this algorithm). The graph should be undirected where an undirected edge is represented by a directed edge in both direction. The adjacency list will be computed if not already present.
 vertex_paircudf.DataFrame
A GPU dataframe consisting of two columns representing pairs of vertices. If provided, the jaccard coefficient is computed for the given vertex pairs. If the vertex_pair is not provided then the current implementation computes the jaccard coefficient for all adjacent vertices in the graph.
 Returns
 dfcudf.DataFrame
GPU data frame of size E (the default) or the size of the given pairs (first, second) containing the Jaccard weights. The ordering is relative to the adjacency list, or that given by the specified vertex pairs.
 df[‘source’]cudf.Series
The source vertex ID (will be identical to first if specified)
 df[‘destination’]cudf.Series
The destination vertex ID (will be identical to second if specified)
 df[‘jaccard_coeff’]cudf.Series
The computed Jaccard coefficient between the source and destination vertices
Examples
>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(gdf, source='0', destination='1') >>> df = cugraph.jaccard(G)

cugraph.link_prediction.wjaccard.
jaccard_w
(input_graph, weights, vertex_pair=None)¶ Compute the weighted Jaccard similarity between each pair of vertices connected by an edge, or between arbitrary pairs of vertices specified by the user. Jaccard similarity is defined between two sets as the ratio of the volume of their intersection divided by the volume of their union. In the context of graphs, the neighborhood of a vertex is seen as a set. The Jaccard similarity weight of each edge represents the strength of connection between vertices based on the relative similarity of their neighbors. If first is specified but second is not, or vice versa, an exception will be thrown.
 Parameters
 graphcugraph.Graph
cuGraph graph descriptor, should contain the connectivity information as an edge list (edge weights are not used for this algorithm). The adjacency list will be computed if not already present.
 weightscudf.Series
Specifies the weights to be used for each vertex.
 vertex_paircudf.DataFrame
A GPU dataframe consisting of two columns representing pairs of vertices. If provided, the jaccard coefficient is computed for the given vertex pairs, else, it is computed for all vertex pairs.
 Returns
 dfcudf.DataFrame
GPU data frame of size E (the default) or the size of the given pairs (first, second) containing the Jaccard weights. The ordering is relative to the adjacency list, or that given by the specified vertex pairs.
 df[‘source’]cudf.Series
The source vertex ID
 df[‘destination’]cudf.Series
The destination vertex ID
 df[‘jaccard_coeff’]cudf.Series
The computed weighted Jaccard coefficient between the source and destination vertices.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1') >>> df = cugraph.jaccard_w(G, M[2])
Overlap Coefficient¶

cugraph.link_prediction.overlap.
overlap
(input_graph, vertex_pair=None)¶ Compute the Overlap Coefficient between each pair of vertices connected by an edge, or between arbitrary pairs of vertices specified by the user. Overlap Coefficient is defined between two sets as the ratio of the volume of their intersection divided by the smaller of their two volumes. In the context of graphs, the neighborhood of a vertex is seen as a set. The Overlap Coefficient weight of each edge represents the strength of connection between vertices based on the relative similarity of their neighbors. If first is specified but second is not, or vice versa, an exception will be thrown.
 Parameters
 graphcugraph.Graph
cuGraph graph descriptor, should contain the connectivity information as an edge list (edge weights are not used for this algorithm). The adjacency list will be computed if not already present.
 vertex_paircudf.DataFrame
A GPU dataframe consisting of two columns representing pairs of vertices. If provided, the overlap coefficient is computed for the given vertex pairs, else, it is computed for all vertex pairs.
 Returns
 dfcudf.DataFrame
GPU data frame of size E (the default) or the size of the given pairs (first, second) containing the Overlap coefficients. The ordering is relative to the adjacency list, or that given by the specified vertex pairs.
 df[‘source’]cudf.Series
The source vertex ID (will be identical to first if specified).
 df[‘destination’]cudf.Series
The destination vertex ID (will be identical to second if specified).
 df[‘overlap_coeff’]cudf.Series
The computed Overlap coefficient between the source and destination vertices.
Examples
>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(gdf, source='0', destination='1') >>> df = cugraph.overlap(G)

cugraph.link_prediction.woverlap.
overlap_w
(input_graph, weights, vertex_pair=None)¶ Compute the weighted Overlap Coefficient between each pair of vertices connected by an edge, or between arbitrary pairs of vertices specified by the user. Overlap Coefficient is defined between two sets as the ratio of the volume of their intersection divided by the smaller of their volumes. In the context of graphs, the neighborhood of a vertex is seen as a set. The Overlap Coefficient weight of each edge represents the strength of connection between vertices based on the relative similarity of their neighbors. If first is specified but second is not, or vice versa, an exception will be thrown.
 Parameters
 input_graphcugraph.Graph
cuGraph graph descriptor, should contain the connectivity information as an edge list (edge weights are not used for this algorithm). The adjacency list will be computed if not already present.
 weightscudf.Series
Specifies the weights to be used for each vertex.
 vertex_paircudf.DataFrame
A GPU dataframe consisting of two columns representing pairs of vertices. If provided, the overlap coefficient is computed for the given vertex pairs, else, it is computed for all vertex pairs.
 Returns
 dfcudf.DataFrame
GPU data frame of size E (the default) or the size of the given pairs (first, second) containing the overlap coefficients. The ordering is relative to the adjacency list, or that given by the specified vertex pairs.
 df[‘source’]cudf.Series
The source vertex ID
 df[‘destination’]cudf.Series
The destination vertex ID
 df[‘overlap_coeff’]cudf.Series
The computed weighted Overlap coefficient between the source and destination vertices.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1') >>> df = cugraph.overlap_w(G, M[2])
Traversal¶
Breadthfirstsearch¶

cugraph.traversal.bfs.
bfs
(G, start, return_sp_counter=False)¶ Find the distances and predecessors for a breadth first traversal of a graph.
 Parameters
 Gcugraph.graph
cuGraph graph descriptor, should contain the connectivity information as an adjacency list.
 startInteger
The index of the graph vertex from which the traversal begins
 return_sp_counterbool, optional, default=False
Indicates if shortest path counters should be returned
 Returns
 dfcudf.DataFrame
df[‘vertex’][i] gives the vertex id of the i’th vertex
df[‘distance’][i] gives the path distance for the i’th vertex from the starting vertex
df[‘predecessor’][i] gives for the i’th vertex the vertex it was reached from in the traversal
df[‘sp_counter’][i] gives for the i’th vertex the number of shortest path leading to it during traversal (Only if retrun_sp_counter is True)
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1') >>> df = cugraph.bfs(G, 0)
Singlesourceshortestpath¶

cugraph.traversal.sssp.
filter_unreachable
(df)¶ Remove unreachable vertices from the result of SSSP or BFS
 Parameters
 dfcudf.DataFrame
cudf.DataFrame that is the output of SSSP or BFS
 Returns
 dffiltered cudf.DataFrame with only reachable vertices
df[‘vertex’][i] gives the vertex id of the i’th vertex. df[‘distance’][i] gives the path distance for the i’th vertex from the starting vertex. df[‘predecessor’][i] gives the vertex that was reached before the i’th vertex in the traversal.

cugraph.traversal.sssp.
sssp
(G, source)¶ Compute the distance and predecessors for shortest paths from the specified source to all the vertices in the graph. The distances column will store the distance from the source to each vertex. The predecessors column will store each vertex’s predecessor in the shortest path. Vertices that are unreachable will have a distance of infinity denoted by the maximum value of the data type and the predecessor set as 1. The source vertex’s predecessor is also set to 1. Graphs with negative weight cycles are not supported.
 Parameters
 graphcuGraph.Graph
cuGraph graph descriptor with connectivity information. Edge weights, if present, should be single or double precision floating point values.
 sourceint
Index of the source vertex.
 Returns
 dfcudf.DataFrame
df[‘vertex’][i] gives the vertex id of the i’th vertex. df[‘distance’][i] gives the path distance for the i’th vertex from the starting vertex. df[‘predecessor’][i] gives the vertex id of the vertex that was reached before the i’th vertex in the traversal.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ', >>> dtype=['int32', 'int32', 'float32'], header=None) >>> G = cugraph.Graph() >>> G.from_cudf_edgelist(M, source='0', destination='1') >>> distances = cugraph.sssp(G, 0)