cuGraph API Reference

Structure

Graph

class cugraph.structure.graph.Graph(m_graph=None, edge_attr=None, symmetrized=False, bipartite=False, multi=False, dynamic=False)

Methods

clear(self)

Empty this graph.

degree(self[, vertex_subset])

Compute vertex degree.

degrees(self[, vertex_subset])

Compute vertex in-degree and out-degree.

delete_adj_list(self)

Delete the adjacency list.

delete_edge_list(self)

Delete the edge list.

edges(self)

Returns all the edges in the graph as a cudf.DataFrame containing sources and destinations.

from_cudf_adjlist(self, offset_col, index_col)

Initialize a graph from the adjacency list.

from_cudf_edgelist(self, input_df[, source, …])

Initialize a graph from the edge list.

get_two_hop_neighbors(self)

Compute vertex pairs that are two hops apart.

has_edge(self, u, v)

Returns True if the graph contains the edge (u,v).

has_node(self, n)

Returns True if the graph contains the node n.

in_degree(self[, vertex_subset])

Compute vertex in-degree.

nodes(self)

Returns all the nodes in the graph as a cudf.Series

number_of_edges(self)

Get the number of edges in the graph.

number_of_nodes(self)

An alias of number_of_vertices().

number_of_vertices(self)

Get the number of nodes in the graph.

out_degree(self[, vertex_subset])

Compute vertex out-degree.

to_directed(self)

Return a directed representation of the graph.

to_undirected(self)

Return an undirected copy of the graph.

view_adj_list(self)

Display the adjacency list.

view_edge_list(self)

Display the edge list.

view_transposed_adj_list(self)

Display the transposed adjacency list.

AdjList

EdgeList

add_adj_list

add_edge_list

is_directed

neighbors

transposedAdjList

class AdjList(offsets, indices, value=None)
class EdgeList(source, destination, edge_attr=None, renumber_map=None)
add_adj_list(self, offset_col, index_col, value_col=None)
add_edge_list(self, source, destination, value=None)
clear(self)

Empty this graph. This function is added for NetworkX compatibility.

degree(self, vertex_subset=None)

Compute vertex degree. By default, this method computes vertex degrees for the entire set of vertices. If vertex_subset is provided, this method optionally filters out all but those listed in vertex_subset.

Parameters
vertex_subsetcudf.Series or iterable container, optional

A container of vertices for displaying corresponding degree. If not set, degrees are computed for the entire set of vertices.

Returns
dfcudf.DataFrame

GPU data frame of size N (the default) or the size of the given vertices (vertex_subset) containing the degree. The ordering is relative to the adjacency list, or that given by the specified vertex_subset. df[‘vertex’] : cudf.Series

The vertex IDs (will be identical to vertex_subset if specified).

df[‘degree’]cudf.Series

The computed degree of the corresponding vertex.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = G.degree([0,9,12])
degrees(self, vertex_subset=None)

Compute vertex in-degree and out-degree. By default, this method computes vertex degrees for the entire set of vertices. If vertex_subset is provided, this method optionally filters out all but those listed in vertex_subset.

Parameters
vertex_subsetcudf.Series or iterable container, optional

A container of vertices for displaying corresponding degree. If not set, degrees are computed for the entire set of vertices.

Returns
dfcudf.DataFrame
df[‘vertex’]cudf.Series

The vertex IDs (will be identical to vertex_subset if specified).

df[‘in_degree’]cudf.Series

The in-degree of the vertex.

df[‘out_degree’]cudf.Series

The out-degree of the vertex.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = G.degrees([0,9,12])
delete_adj_list(self)

Delete the adjacency list.

delete_edge_list(self)

Delete the edge list.

edges(self)

Returns all the edges in the graph as a cudf.DataFrame containing sources and destinations. It does not return the edge weights. For viewing edges with weights use view_edge_list()

from_cudf_adjlist(self, offset_col, index_col, value_col=None)

Initialize a graph from the adjacency list. It is an error to call this method on an initialized Graph object. The passed offset_col and index_col arguments wrap gdf_column objects that represent a graph using the adjacency list format. If value_col is None, an unweighted graph is created. If value_col is not None, a weighted graph is created. If copy is False, this function stores references to the passed objects pointed by offset_col and index_col. If copy is True, this funcion stores references to the deep-copies of the passed objects pointed by offset_col and index_col. Undirected edges must be stored as directed edges in both directions.

Parameters
offset_colcudf.Series

This cudf.Series wraps a gdf_column of size V + 1 (V: number of vertices). The gdf column contains the offsets for the vertices in this graph. Offsets must be in the range [0, E] (E: number of edges).

index_colcudf.Series

This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices must be in the range [0, V) (V: number of vertices).

value_colcudf.Series, optional

This pointer can be None. If not, this cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the weight value for each edge. The expected type of the gdf_column element is floating point number.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> M = M.to_pandas()
>>> M = scipy.sparse.coo_matrix((M['2'],(M['0'],M['1'])))
>>> M = M.tocsr()
>>> offsets = cudf.Series(M.indptr)
>>> indices = cudf.Series(M.indices)
>>> G = cugraph.Graph()
>>> G.from_cudf_adjlist(offsets, indices, None)
from_cudf_edgelist(self, input_df, source='source', destination='destination', edge_attr=None, renumber=True)

Initialize a graph from the edge list. It is an error to call this method on an initialized Graph object. The passed input_df argument wraps gdf_column objects that represent a graph using the edge list format. source argument is source column name and destination argument is destination column name. Source and destination indices must be in the range [0, V) where V is the number of vertices. If renumbering needs to be done, renumber argument should be passed as True. If weights are present, edge_attr argument is the weights column name.

Parameters
input_dfcudf.DataFrame

This cudf.DataFrame wraps source, destination and weight gdf_column of size E (E: number of edges) The ‘src’ column contains the source index for each edge. Source indices are in the range [0, V) (V: number of vertices). The ‘dst’ column contains the destination index for each edge. Destination indices are in the range [0, V) (V: number of vertices). If renumbering needs to be done, renumber argument should be passed as True. For weighted graphs, dataframe contains ‘weight’ column containing the weight value for each edge.

sourcestr

source argument is source column name

destinationstr

destination argument is destination column name.

edge_attrstr

edge_attr argument is the weights column name.

renumberbool

If source and destination indices are not in range 0 to V where V is number of vertices, renumber argument should be True.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr='2',
                         renumber=False)
get_two_hop_neighbors(self)

Compute vertex pairs that are two hops apart. The resulting pairs are sorted before returning.

Returns
dfcudf.DataFrame
df[‘first’]cudf.Series

the first vertex id of a pair.

df[‘second’]cudf.Series

the second vertex id of a pair.

has_edge(self, u, v)

Returns True if the graph contains the edge (u,v).

has_node(self, n)

Returns True if the graph contains the node n.

in_degree(self, vertex_subset=None)

Compute vertex in-degree. Vertex in-degree is the number of edges pointing into the vertex. By default, this method computes vertex degrees for the entire set of vertices. If vertex_subset is provided, this method optionally filters out all but those listed in vertex_subset.

Parameters
vertex_subsetcudf.Series or iterable container, optional

A container of vertices for displaying corresponding in-degree. If not set, degrees are computed for the entire set of vertices.

Returns
dfcudf.DataFrame

GPU data frame of size N (the default) or the size of the given vertices (vertex_subset) containing the in_degree. The ordering is relative to the adjacency list, or that given by the specified vertex_subset. df[‘vertex’] : cudf.Series

The vertex IDs (will be identical to vertex_subset if specified).

df[‘degree’]cudf.Series

The computed in-degree of the corresponding vertex.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = G.in_degree([0,9,12])
is_directed(self)
neighbors(self, n)
nodes(self)

Returns all the nodes in the graph as a cudf.Series

number_of_edges(self)

Get the number of edges in the graph.

number_of_nodes(self)

An alias of number_of_vertices(). This function is added for NetworkX compatibility.

number_of_vertices(self)

Get the number of nodes in the graph.

out_degree(self, vertex_subset=None)

Compute vertex out-degree. Vertex out-degree is the number of edges pointing out from the vertex. By default, this method computes vertex degrees for the entire set of vertices. If vertex_subset is provided, this method optionally filters out all but those listed in vertex_subset.

Parameters
vertex_subsetcudf.Series or iterable container, optional

A container of vertices for displaying corresponding out-degree. If not set, degrees are computed for the entire set of vertices.

Returns
dfcudf.DataFrame

GPU data frame of size N (the default) or the size of the given vertices (vertex_subset) containing the out_degree. The ordering is relative to the adjacency list, or that given by the specified vertex_subset. df[‘vertex’] : cudf.Series

The vertex IDs (will be identical to vertex_subset if specified).

df[‘degree’]cudf.Series

The computed out-degree of the corresponding vertex.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = G.out_degree([0,9,12])
to_directed(self)

Return a directed representation of the graph. This function sets the type of graph as DiGraph() and returns the directed view.

Returns
GDiGraph

A directed graph with the same nodes, and each edge (u,v,weights) replaced by two directed edges (u,v,weights) and (v,u,weights).

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, '0', '1')
>>> DiG = G.to_directed()
to_undirected(self)

Return an undirected copy of the graph.

Returns
GGraph

A undirected graph with the same nodes, and each directed edge (u,v,weights) replaced by an undirected edge (u,v,weights).

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> DiG = cugraph.DiGraph()
>>> DiG.from_cudf_edgelist(M, '0', '1')
>>> G = DiG.to_undirected()
class transposedAdjList(offsets, indices, value=None)
view_adj_list(self)

Display the adjacency list. Compute it if needed.

Returns
offset_colcudf.Series

This cudf.Series wraps a gdf_column of size V + 1 (V: number of vertices). The gdf column contains the offsets for the vertices in this graph. Offsets are in the range [0, E] (E: number of edges).

index_colcudf.Series

This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices are in the range [0, V) (V: number of vertices).

value_colcudf.Series or None

This pointer is None for unweighted graphs. For weighted graphs, this cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the weight value for each edge. The expected type of the gdf_column element is floating point number.

view_edge_list(self)

Display the edge list. Compute it if needed.

NOTE: If the graph is of type Graph() then the displayed undirected edges are the same as displayed by networkx Graph(), but the direction could be different i.e. an edge displayed by cugraph as (src, dst) could be displayed as (dst, src) by networkx.

cugraph.Graph stores symmetrized edgelist internally. For displaying undirected edgelist for a Graph the upper trianglar matrix of the symmetrized edgelist is returned.

networkx.Graph renumbers the input and stores the upper triangle of this renumbered input. Since the internal renumbering of networx and cugraph is different, the upper triangular matrix of networkx renumbered input may not be the same as cugraph’s upper trianglar matrix of the symmetrized edgelist. Hence the displayed source and destination pairs in both will represent the same edge but node values could be swapped.

Returns
edgelist_dfcudf.DataFrame

This cudf.DataFrame wraps source, destination and weight gdf_column of size E (E: number of edges) The ‘src’ column contains the source index for each edge. Source indices are in the range [0, V) (V: number of vertices). The ‘dst’ column contains the destination index for each edge. Destination indices are in the range [0, V) (V: number of vertices). For weighted graphs, dataframe contains ‘weight’ column containing the weight value for each edge.

view_transposed_adj_list(self)

Display the transposed adjacency list. Compute it if needed.

Returns
offset_colcudf.Series

This cudf.Series wraps a gdf_column of size V + 1 (V: number of vertices). The gdf column contains the offsets for the vertices in this graph. Offsets are in the range [0, E] (E: number of edges).

index_colcudf.Series

This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices are in the range [0, V) (V: number of vertices).

value_colcudf.Series or None

This pointer is None for unweighted graphs. For weighted graphs, this cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the weight value for each edge. The expected type of the gdf_column element is floating point number.

Renumbering

cugraph.structure.renumber.renumber(source_col, dest_col)

Take a (potentially sparse) set of source and destination vertex ids and renumber the vertices to create a dense set of vertex ids using all values contiguously from 0 to the number of unique vertices - 1.

Input columns can be either int64 or int32. The output will be mapped to int32, since many of the cugraph functions are limited to int32. If the number of unique values in source_col and dest_col > 2^31-1 then this function will return an error.

Return from this call will be three cudf Series - the renumbered source_col, the renumbered dest_col and a numbering map that maps the new ids to the original ids.

Parameters
source_colcudf.Series

This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the source index for each edge. Source indices must be an integer type.

dest_colcudf.Series

This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices must be an integer type.

numbering_mapcudf.Series

This cudf.Series wraps a gdf column of size V (V: number of vertices). The gdf column contains a numbering map that maps the new ids to the original ids.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> source_col, dest_col, numbering_map = cugraph.renumber(sources,
>>>                                                        destinations)
>>> G = cugraph.Graph()
>>> G.add_edge_list(source_col, dest_col, None)
cugraph.structure.renumber.renumber_from_cudf(_df, source_cols_names, dest_cols_names)

Take a set, collection (lists) of source and destination columns, and renumber the vertices to create a dense set of contiguously vertex ids from 0 to the number of unique vertices - 1.

Input columns can be any data type.

The output will be mapped to int32, since many of the cugraph functions are limited to int32. If the number of unique values is > 2^31-1 then this function will return an error.

Returns
src_idscudf.Series

The new source vertex IDs

dst_idscudf.Series

The new destination vertex IDs

numbering_dfcudf.DataFrame

a dataframe that maps a vertex ID to the unique values

Examples

>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> source_col, dest_col, numbering_map =
>>>    cugraph.renumber_from_cudf(gdf, ["0"], ["1"])
>>>
>>> G = cugraph.Graph()
>>> G.add_edge_list(source_col, dest_col, None)

Symmetrize

cugraph.structure.symmetrize.symmetrize(source_col, dest_col, value_col=None)

Take a COO set of source destination pairs along with associated values and create a new COO set of source destination pairs along with values where all edges exist in both directions.

Return from this call will be a COO stored as two cudf Series - the symmetrized source column and the symmetrized dest column, along with an optional cudf Series containing the associated values (only if the values are passed in).

Parameters
source_colcudf.Series

This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the source index for each edge. Source indices must be an integer type.

dest_colcudf.Series

This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices must be an integer type.

value_colcudf.Series (optional)

This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains values associated with this edge. For this function the values can be any type, they are not examined, just copied.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> values = cudf.Series(M['2'])
>>> src, dst, val = cugraph.symmetrize(sources, destinations, values)
>>> G = cugraph.Graph()
>>> G.add_edge_list(src, dst, val)
cugraph.structure.symmetrize.symmetrize_df(df, src_name, dst_name)

Take a COO stored in a DataFrame, along with the column names of the source and destination columns and create a new data frame using the same column names that symmetrize the graph so that all edges appear in both directions.

Note that if other columns exist in the data frame (e.g. edge weights) the other columns will also be replicated. That is, if (u,v,data) represents the source value (u), destination value (v) and some set of other columns (data) in the input data, then the output data will contain both (u,v,data) and (v,u,data) with matching data.

If (u,v,data1) and (v,u,data2) exist in the input data where data1 != data2 then this code will arbitrarily pick the smaller data element to keep, if this is not desired then the caller should should correct the data prior to calling symmetrize.

Parameters
dfcudf.DataFrame

Input data frame containing COO. Columns should contain source ids, destination ids and any properties associated with the edges.

src_namestring

Name of the column in the data frame containing the source ids

dst_namestring

Name of the column in the data frame containing the destination ids

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sym_df = cugraph.symmetrize(M, '0', '1')
>>> G = cugraph.Graph()
>>> G.add_edge_list(sym_df['0]', sym_df['1'], sym_df['2'])

Conversion from Other Formats

cugraph.structure.convert_matrix.from_cudf_edgelist(df, source='source', destination='destination', edge_attr=None, create_using=<class 'cugraph.structure.graph.Graph'>, renumber=True)

Return a new graph created from the edge list representaion. This function is added for NetworkX compatibility (this function is a RAPIDS version of NetworkX’s from_pandas_edge_list()).

Parameters
dfcudf.DataFrame

This cudf.DataFrame contains columns storing edge source vertices, destination (or target following NetworkX’s terminology) vertices, and (optional) weights.

sourcestring or integer

This is used to index the source column.

targetstring or integer

This is used to index the destination (or target following NetworkX’s terminology) column.

weightstring or integer, optional

This pointer can be None. If not, this is used to index the weight column.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G = cugraph.from_cudf_edgelist(M, source='0', target='1', weight='2')

Centrality

Betweenness Centrality

cugraph.centrality.betweenness_centrality.betweenness_centrality(G, k=None, normalized=True, weight=None, endpoints=False, seed=None, result_dtype=<class 'numpy.float64'>)

Compute the betweenness centrality for all nodes of the graph G from a sample of ‘k’ sources. CuGraph does not currently support the ‘endpoints’ and ‘weight’ parameters as seen in the corresponding networkX call.

Parameters
GcuGraph.Graph

cuGraph graph descriptor with connectivity information. The graph can be either directed (DiGraph) or undirected (Graph). Weights in the graph are ignored, the current implementation uses BFS traversals. Use weight parameter if weights need to be considered (currently not supported)

kint or list or None, optional, default=None

If k is not None, use k node samples to estimate betweenness. Higher values give better approximation If k is a list, use the content of the list for estimation: the list should contain vertices identifiers. Vertices obtained through sampling or defined as a list will be used as sources for traversals inside the algorithm.

normalizedbool, optional

Default is True. If true, the betweenness values are normalized by 2 / ((n - 1) * (n - 2)) for Graphs (undirected), and 1 / ((n - 1) * (n - 2)) for DiGraphs (directed graphs) where n is the number of nodes in G. Normalization will ensure that the values in [0, 1], this normalization scales fo the highest possible value where one node is crossed by every single shortest path.

weightcudf.DataFrame, optional, default=None

Specifies the weights to be used for each edge. Should contain a mapping between edges and weights. (Not Supported)

endpointsbool, optional, default=False

If true, include the endpoints in the shortest path counts. (Not Supported)

seedoptional

if k is specified and k is an integer, use seed to initialize the random number generator. Using None as seed relies on random.seed() behavior: using current system time If k is either None or list: seed parameter is ignored

result_dtypenp.float32 or np.float64, optional, default=np.float64

Indicate the data type of the betweenness centrality scores

Returns
dfcudf.DataFrame

GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding betweenness centrality values. Please note that the resulting the ‘vertex’ column might not be in ascending order.

df[‘vertex’]cudf.Series

Contains the vertex identifiers

df[‘betweenness_centrality’]cudf.Series

Contains the betweenness centrality of vertices

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1')
>>> bc = cugraph.betweenness_centrality(G)

Katz Centrality

cugraph.centrality.katz_centrality.katz_centrality(G, alpha=None, max_iter=100, tol=1e-06, nstart=None, normalized=True)

Compute the Katz centrality for the nodes of the graph G. cuGraph does not currently support the ‘beta’ and ‘weight’ parameters as seen in the corresponding networkX call. This implementation is based on a relaxed version of Katz defined by Foster with a reduced computational complexity of O(n+m)

Foster, K.C., Muth, S.Q., Potterat, J.J. et al. Computational & Mathematical Organization Theory (2001) 7: 275. https://doi.org/10.1023/A:1013470632383

Parameters
GcuGraph.Graph

cuGraph graph descriptor with connectivity information. The graph can contain either directed (DiGraph) or undirected edges (Graph).

alphafloat

Attenuation factor defaulted to None. If alpha is not specified then it is internally calculated as 1/(degree_max) where degree_max is the maximum out degree. NOTE : The maximum acceptable value of alpha for convergence alpha_max = 1/(lambda_max) where lambda_max is the largest eigenvalue of the graph. Since lambda_max is always lesser than or equal to degree_max for a graph, alpha_max will always be greater than or equal to (1/degree_max). Therefore, setting alpha to (1/degree_max) will guarantee that it will never exceed alpha_max thus in turn fulfilling the requirement for convergence.

max_iterint

The maximum number of iterations before an answer is returned. This can be used to limit the execution time and do an early exit before the solver reaches the convergence tolerance. If this value is lower or equal to 0 cuGraph will use the default value, which is 100.

tolerancefloat

Set the tolerance the approximation, this parameter should be a small magnitude value. The lower the tolerance the better the approximation. If this value is 0.0f, cuGraph will use the default value which is 1.0e-6. Setting too small a tolerance can lead to non-convergence due to numerical roundoff. Usually values between 1e-2 and 1e-6 are acceptable.

nstartcudf.Dataframe

GPU Dataframe containing the initial guess for katz centrality.

nstart[‘vertex’]cudf.Series

Contains the vertex identifiers

nstart[‘values’]cudf.Series

Contains the katz centrality values of vertices

normalizedbool

If True normalize the resulting katz centrality values

Returns
dfcudf.DataFrame

GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding katz centrality values.

df[‘vertex’]cudf.Series

Contains the vertex identifiers

df[‘katz_centrality’]cudf.Series

Contains the katz centrality of vertices

Examples

>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(gdf, source='0', destination='1')
>>> kc = cugraph.katz_centrality(G)

Community

Louvain

cugraph.community.louvain.louvain(input_graph, max_iter=100)

Compute the modularity optimizing partition of the input graph using the Louvain heuristic

Parameters
input_graphcugraph.Graph

cuGraph graph descriptor of type Graph

The adjacency list will be computed if not already present. The graph should be undirected where an undirected edge is represented by a directed edge in both direction.

max_iterinteger

This controls the maximum number of levels/iterations of the Louvain algorithm. When specified the algorithm will terminate after no more than the specified number of iterations. No error occurs when the algorithm terminates early in this manner.

Returns
partscudf.DataFrame

GPU data frame of size V containing two columns the vertex id and the partition id it is assigned to.

df[‘vertex’]cudf.Series

Contains the vertex identifiers

df[‘partition’]cudf.Series

Contains the partition assigned to the vertices

modularity_scorefloat

a floating point number containing the global modularity score of the partitioning.

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1')
>>> parts, modularity_score = cugraph.louvain(G)

K-Truss

cugraph.community.ktruss_subgraph.ktruss_subgraph(G, k, use_weights=True)

Returns the K-Truss subgraph of a graph for a specific k.

The k-truss of a graph is a subgraph where each edge is part of at least (k−2) triangles. K-trusses are used for finding tighlty knit groups of vertices in a graph. A k-truss is a relaxation of a k-clique in the graph and was define in [1]. Finding cliques is computationally demanding and finding the maximal k-clique is known to be NP-Hard.

In contrast, finding a k-truss is computationally tractable as its key building block, namely triangle counting counting, can be executed in polnymomial time.Typically, it takes many iterations of triangle counting to find the k-truss of a graph. Yet these iterations operate on a weakly monotonically shrinking graph. Therefore, finding the k-truss of a graph can be done in a fairly reasonable amount of time. The solution in cuGraph is based on a GPU algorithm first shown in [2] and uses the triangle counting algorithm from [3].

[1] Cohen, J., “Trusses: Cohesive subgraphs for social network analysis” National security agency technical report, 2008

[2] O. Green, J. Fox, E. Kim, F. Busato, et al. “Quickly Finding a Truss in a Haystack” IEEE High Performance Extreme Computing Conference (HPEC), 2017 https://doi.org/10.1109/HPEC.2017.8091038

[3] O. Green, P. Yalamanchili, L.M. Munguia, “Fast Triangle Counting on GPU” Irregular Applications: Architectures and Algorithms (IA3), 2014

Parameters
GcuGraph.Graph

cuGraph graph descriptor with connectivity information. k-Trusses are defined for only undirected graphs as they are defined for undirected triangle in a graph.

kint

The desired k to be used for extracting the k-truss subgraph.

use_weightsBool

whether the output should contain the edge weights if G has them

Returns
G_trusscuGraph.Graph

A cugraph graph descriptor with the k-truss subgraph for the given k.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1')
>>> k_subgraph = cugraph.ktruss_subgraph(G, 3)

ECG

cugraph.community.ecg.ecg(input_graph, min_weight=0.05, ensemble_size=16)

Compute the Ensemble Clustering for Graphs (ECG) partition of the input graph. ECG runs truncated Louvain on an ensemble of permutations of the input graph, then uses the ensemble partitions to determine weights for the input graph. The final result is found by running full Louvain on the input graph using the determined weights.

See https://arxiv.org/abs/1809.05578 for further information.

Parameters
input_graphcugraph.Graph

cuGraph graph descriptor, should contain the connectivity information and weights. The adjacency list will be computed if not already present.

min_weightfloating point

The minimum value to assign as an edgeweight in the ECG algorithm. It should be a value in the range [0,1] usually left as the default value of .05

ensemble_sizeinteger

The number of graph permutations to use for the ensemble. The default value is 16, larger values may produce higher quality partitions for some graphs.

Returns
partscudf.DataFrame

GPU data frame of size V containing two columns, the vertex id and the partition id it is assigned to.

df[‘vertex’]cudf.Series

Contains the vertex identifiers

df[‘partition’]cudf.Series

Contains the partition assigned to the vertices

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr='2')
>>> parts = cugraph.ecg(G)

Spectral Clustering

cugraph.community.spectral_clustering.analyzeClustering_edge_cut(G, n_clusters, clustering)

Compute the edge cut score for a partitioning/clustering

Parameters
Gcugraph.Graph

cuGraph graph descriptor

n_clustersinteger

Specifies the number of clusters in the given clustering

clusteringcudf.Series

The cluster assignment to analyze.

Returns
scorefloat

The computed edge cut score

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr=None)
>>> df = cugraph.spectralBalancedCutClustering(G, 5)
>>> score = cugraph.analyzeClustering_edge_cut(G, 5, df['cluster'])
cugraph.community.spectral_clustering.analyzeClustering_modularity(G, n_clusters, clustering)

Compute the modularity score for a partitioning/clustering

Parameters
Gcugraph.Graph

cuGraph graph descriptor. This graph should have edge weights.

n_clustersinteger

Specifies the number of clusters in the given clustering

clusteringcudf.Series

The cluster assignment to analyze.

Returns
scorefloat

The computed modularity score

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr='2')
>>> df = cugraph.spectralBalancedCutClustering(G, 5)
>>> score = cugraph.analyzeClustering_modularity(G, 5, df['cluster'])
cugraph.community.spectral_clustering.analyzeClustering_ratio_cut(G, n_clusters, clustering)

Compute the ratio cut score for a partitioning/clustering

Parameters
Gcugraph.Graph

cuGraph graph descriptor. This graph should have edge weights.

n_clustersinteger

Specifies the number of clusters in the given clustering

clusteringcudf.Series

The cluster assignment to analyze.

Returns
scorefloat

The computed ratio cut score

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr='2')
>>> df = cugraph.spectralBalancedCutClustering(G, 5)
>>> score = cugraph.analyzeClustering_ratio_cut(G, 5, df['cluster'])
cugraph.community.spectral_clustering.spectralBalancedCutClustering(G, num_clusters, num_eigen_vects=2, evs_tolerance=1e-05, evs_max_iter=100, kmean_tolerance=1e-05, kmean_max_iter=100)

Compute a clustering/partitioning of the given graph using the spectral balanced cut method.

Parameters
Gcugraph.Graph

cuGraph graph descriptor

num_clustersinteger

Specifies the number of clusters to find

num_eigen_vectsinteger

Specifies the number of eigenvectors to use. Must be lower or equal to num_clusters.

evs_tolerance: float

Specifies the tolerance to use in the eigensolver Default is 0.00001

evs_max_iter: integer

Specifies the maximum number of iterations for the eigensolver Default is 100

kmean_tolerance: float

Specifies the tolerance to use in the k-means solver Default is 0.00001

kmean_max_iter: integer

Specifies the maximum number of iterations for the k-means solver Default is 100

Returns
dfcudf.DataFrame

GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding cluster assignments.

df[‘vertex’]cudf.Series

contains the vertex identifiers

df[‘cluster’]cudf.Series

contains the cluster assignments

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1')
>>> df = cugraph.spectralBalancedCutClustering(G, 5)
cugraph.community.spectral_clustering.spectralModularityMaximizationClustering(G, num_clusters, num_eigen_vects=2, evs_tolerance=1e-05, evs_max_iter=100, kmean_tolerance=1e-05, kmean_max_iter=100)

Compute a clustering/partitioning of the given graph using the spectral modularity maximization method.

Parameters
Gcugraph.Graph

cuGraph graph descriptor. This graph should have edge weights.

num_clustersinteger

Specifies the number of clusters to find

num_eigen_vectsinteger

Specifies the number of eigenvectors to use. Must be lower or equal to num_clusters

evs_tolerance: float

Specifies the tolerance to use in the eigensolver Default is 0.00001

evs_max_iter: integer

Specifies the maximum number of iterations for the eigensolver Default is 100

kmean_tolerance: float

Specifies the tolerance to use in the k-means solver Default is 0.00001

kmean_max_iter: integer

Specifies the maximum number of iterations for the k-means solver Default is 100

Returns
dfcudf.DataFrame
df[‘vertex’]cudf.Series

contains the vertex identifiers

df[‘cluster’]cudf.Series

contains the cluster assignments

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr='2')
>>> df = cugraph.spectralModularityMaximizationClustering(G, 5)

Subgraph Extraction

cugraph.community.subgraph_extraction.subgraph(G, vertices)

Compute a subgraph of the existing graph including only the specified vertices. This algorithm works for both directed and undirected graphs, it does not actually traverse the edges, simply pulls out any edges that are incident on vertices that are both contained in the vertices list.

Parameters
Gcugraph.Graph

cuGraph graph descriptor

verticescudf.Series

Specifies the vertices of the induced subgraph

Returns
Sgcugraph.Graph

A graph object containing the subgraph induced by the given vertex set.

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1')
>>> verts = numpy.zeros(3, dtype=numpy.int32)
>>> verts[0] = 0
>>> verts[1] = 1
>>> verts[2] = 2
>>> sverts = cudf.Series(verts)
>>> Sg = cugraph.subgraph(G, sverts)

Triangle Counting

cugraph.community.triangle_count.triangles(G)

Compute the triangle (number of cycles of length three) count of the input graph.

Parameters
Gcugraph.graph

cuGraph graph descriptor, should contain the connectivity information, (edge weights are not used in this algorithm)

Returns
countint64

A 64 bit integer whose value gives the number of triangles in the graph.

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1')
>>> count = cugraph.triangles(G)

Components

Connected Components

cugraph.components.connectivity.strongly_connected_components(G)

Generate the Stronlgly Connected Components and attach a component label to each vertex.

Parameters
Gcugraph.Graph

cuGraph graph descriptor, should contain the connectivity information as an edge list (edge weights are not used for this algorithm). The graph can be either directed or undirected where an undirected edge is represented by a directed edge in both directions. The adjacency list will be computed if not already present. The number of vertices should fit into a 32b int.

Returns
dfcudf.DataFrame

df[‘labels’][i] gives the label id of the i’th vertex df[‘vertices’][i] gives the vertex id of the i’th vertex

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr=None)
>>> df = cugraph.strongly_connected_components(G)
cugraph.components.connectivity.weakly_connected_components(G)

Generate the Weakly Connected Components and attach a component label to each vertex.

Parameters
Gcugraph.Graph

cuGraph graph descriptor, should contain the connectivity information as an edge list (edge weights are not used for this algorithm). Currently, the graph should be undirected where an undirected edge is represented by a directed edge in both directions. The adjacency list will be computed if not already present. The number of vertices should fit into a 32b int.

Returns
dfcudf.DataFrame

df[‘labels’][i] gives the label id of the i’th vertex df[‘vertices’][i] gives the vertex id of the i’th vertex

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr=None)
>>> df = cugraph.weakly_connected_components(G)

Cores

K-Core

cugraph.cores.k_core.k_core(G, k=None, core_number=None)

Compute the k-core of the graph G based on the out degree of its nodes. A k-core of a graph is a maximal subgraph that contains nodes of degree k or more. This call does not support a graph with self-loops and parallel edges.

Parameters
GcuGraph.Graph

cuGraph graph descriptor with connectivity information. The graph should contain undirected edges where undirected edges are represented as directed edges in both directions. While this graph can contain edge weights, they don’t participate in the calculation of the k-core.

kint, optional

Order of the core. This value must not be negative. If set to None, the main core is returned.

core_numbercudf.DataFrame, optional

Precomputed core number of the nodes of the graph G containing two cudf.Series of size V: the vertex identifiers and the corresponding core number values. If set to None, the core numbers of the nodes are calculated internally.

core_number[‘vertex’]cudf.Series

Contains the vertex identifiers

core_number[‘values’]cudf.Series

Contains the core number of vertices

Returns
KCoreGraphcuGraph.Graph

K Core of the input graph

Examples

>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(gdf, source='0', destination='1')
>>> KCoreGraph = cugraph.k_core(G)

Core Number

cugraph.cores.core_number.core_number(G)

Compute the core numbers for the nodes of the graph G. A k-core of a graph is a maximal subgraph that contains nodes of degree k or more. A node has a core number of k if it belongs a k-core but not to k+1-core. This call does not support a graph with self-loops and parallel edges.

Parameters
graphcuGraph.Graph

cuGraph graph descriptor with connectivity information. The graph should contain undirected edges where undirected edges are represented as directed edges in both directions. While this graph can contain edge weights, they don’t participate in the calculation of the core numbers.

Returns
dfcudf.DataFrame

GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding core number values.

df[‘vertex’]cudf.Series

Contains the vertex identifiers

df[‘core_number’]cudf.Series

Contains the core number of vertices

Examples

>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(gdf, source='0', destination='1')
>>> cn = cugraph.core_number(G)

Layout

Force Atlas 2

cugraph.layout.force_atlas2.force_atlas2(input_graph, max_iter=500, pos_list=None, outbound_attraction_distribution=True, lin_log_mode=False, prevent_overlapping=False, edge_weight_influence=1.0, jitter_tolerance=1.0, barnes_hut_optimize=True, barnes_hut_theta=0.5, scaling_ratio=2.0, strong_gravity_mode=False, gravity=1.0, verbose=False, callback=None)

ForceAtlas2 is a continuous graph layout algorithm for handy network visualization.

NOTE: Peak memory allocation occurs at 17*V.

Parameters
input_graphcugraph.Graph

cuGraph graph descriptor with connectivity information. Edge weights, if present, should be single or double precision floating point values.

max_iterinteger

This controls the maximum number of levels/iterations of the Force Atlas algorithm. When specified the algorithm will terminate after no more than the specified number of iterations. No error occurs when the algorithm terminates early in this manner. Good short-term quality can be achieved with 50-100 iterations. Above 1000 iterations is discouraged.

pos_list: cudf.DataFrame

Data frame with initial vertex positions containing two columns: ‘x’ and ‘y’ positions.

outbound_attraction_distribution: bool

Distributes attraction along outbound edges. Hubs attract less and thus are pushed to the borders.

lin_log_mode: bool

Switch Force Atlas model from lin-lin to lin-log. Makes clusters more tight.

prevent_overlapping: bool

Prevent nodes to overlap.

edge_weight_influence: float

How much influence you give to the edges weight. 0 is “no influence” and 1 is “normal”.

jitter_tolerance: float

How much swinging you allow. Above 1 discouraged. Lower gives less speed and more precision.

barnes_hut_theta: float

Float between 0 and 1. Tradeoff for speed (1) vs accuracy (0) for Barnes Hut only.

scaling_ratio: float

How much repulsion you want. More makes a more sparse graph. Switching from regular mode to LinLog mode needs a readjustment of the scaling parameter.

gravityfloat

Attracts nodes to the center. Prevents islands from drifting away.

verbose: bool

Output convergence info at each interation.

callback: GraphBasedDimRedCallback

An instance of GraphBasedDimRedCallback class to intercept the internal state of positions while they are being trained. Example of callback usage:

from cugraph.layout import GraphBasedDimRedCallback
class CustomCallback(GraphBasedDimRedCallback):
def on_preprocess_end(self, positions):

print(positions.copy_to_host())

def on_train_end(self, positions):

print(positions.copy_to_host())

def on_train_end(self, positions):

print(positions.copy_to_host())

Returns
poscudf.DataFrame

GPU data frame of size V containing three columns: the vertex identifiers and the x and y positions.

Traversal

Breadth-first-search

cugraph.traversal.bfs.bfs(G, start, return_sp_counter=False)

Find the distances and predecessors for a breadth first traversal of a graph.

Parameters
Gcugraph.graph

cuGraph graph descriptor, should contain the connectivity information as an adjacency list.

startInteger

The index of the graph vertex from which the traversal begins

return_sp_counterbool, optional, default=False

Indicates if shortest path counters should be returned

Returns
dfcudf.DataFrame

df[‘vertex’][i] gives the vertex id of the i’th vertex

df[‘distance’][i] gives the path distance for the i’th vertex from the starting vertex

df[‘predecessor’][i] gives for the i’th vertex the vertex it was reached from in the traversal

df[‘sp_counter’][i] gives for the i’th vertex the number of shortest path leading to it during traversal (Only if retrun_sp_counter is True)

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1')
>>> df = cugraph.bfs(G, 0)

Single-source-shortest-path

cugraph.traversal.sssp.filter_unreachable(df)

Remove unreachable vertices from the result of SSSP or BFS

Parameters
dfcudf.DataFrame

cudf.DataFrame that is the output of SSSP or BFS

Returns
dffiltered cudf.DataFrame with only reachable vertices

df[‘vertex’][i] gives the vertex id of the i’th vertex. df[‘distance’][i] gives the path distance for the i’th vertex from the starting vertex. df[‘predecessor’][i] gives the vertex that was reached before the i’th vertex in the traversal.

cugraph.traversal.sssp.sssp(G, source)

Compute the distance and predecessors for shortest paths from the specified source to all the vertices in the graph. The distances column will store the distance from the source to each vertex. The predecessors column will store each vertex’s predecessor in the shortest path. Vertices that are unreachable will have a distance of infinity denoted by the maximum value of the data type and the predecessor set as -1. The source vertex’s predecessor is also set to -1. Graphs with negative weight cycles are not supported.

Parameters
graphcuGraph.Graph

cuGraph graph descriptor with connectivity information. Edge weights, if present, should be single or double precision floating point values.

sourceint

Index of the source vertex.

Returns
dfcudf.DataFrame

df[‘vertex’][i] gives the vertex id of the i’th vertex. df[‘distance’][i] gives the path distance for the i’th vertex from the starting vertex. df[‘predecessor’][i] gives the vertex id of the vertex that was reached before the i’th vertex in the traversal.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1')
>>> distances = cugraph.sssp(G, 0)

Utilities

R-mat Graph Generation