cuGraph API Reference

Structure

Graph

class cugraph.structure.graph_classes.Graph(m_graph=None, directed=False)[source]

Methods

add_internal_vertex_id(df, …[, drop, …])

Given a DataFrame containing external vertex ids in the identified columns, return a DataFrame containing the internal vertex ids as the specified column name.

add_nodes_from(nodes)

Add nodes information to the Graph.

clear()

Empty the graph.

from_cudf_adjlist(offset_col, index_col[, …])

Initialize a graph from the adjacency list.

from_cudf_edgelist(input_df[, source, …])

Initialize a graph from the edge list.

from_dask_cudf_edgelist(input_ddf[, source, …])

Initializes the distributed graph from the dask_cudf.DataFrame edgelist.

from_numpy_array(np_array[, nodes])

Initializes the graph from numpy array containing adjacency matrix.

from_numpy_matrix(np_matrix)

Initializes the graph from numpy matrix containing adjacency matrix.

from_pandas_adjacency(pdf)

Initializes the graph from pandas adjacency matrix

from_pandas_edgelist(pdf[, source, …])

Initialize a graph from the edge list.

has_isolated_vertices()

Returns True if the graph has isolated vertices.

is_bipartite()

Checks if Graph is bipartite.

is_directed()

Returns True if the graph is a directed graph.

is_multigraph()

Returns True if the graph is a multigraph.

is_multipartite()

Checks if Graph is multipartite.

is_renumbered()

Returns True if the graph is renumbered.

is_weighted()

Returns True if the graph has edge weights.

lookup_internal_vertex_id(df[, column_name])

Given a DataFrame containing external vertex ids in the identified columns, or a Series containing external vertex ids, return a Series with the internal vertex ids.

to_directed()

Return a directed representation of the graph.

to_undirected()

Return an undirected copy of the graph.

unrenumber(df, column_name[, …])

Given a DataFrame containing internal vertex ids in the identified column, replace this with external vertex ids.

Properties

class Properties(directed)[source]
add_internal_vertex_id(df, internal_column_name, external_column_name, drop=True, preserve_order=False)[source]

Given a DataFrame containing external vertex ids in the identified columns, return a DataFrame containing the internal vertex ids as the specified column name. Optionally drop the external vertex id columns. Optionally preserve the order of the original DataFrame. Parameters ———- df: cudf.DataFrame or dask_cudf.DataFrame A DataFrame containing external vertex identifiers that will be converted into internal vertex identifiers.

internal_column_name: string Name of column to contain the internal vertex id

external_column_name: string or list of strings Name of the column(s) containing the external vertex ids

drop: (optional) bool, defaults to True Drop the external columns from the returned DataFrame

preserve_order: (optional) bool, defaults to False Preserve the order of the data frame (requires an extra sort)

Returns
dfcudf.DataFrame or dask_cudf.DataFrame

Original DataFrame with new column containing internal vertex id

add_nodes_from(nodes)[source]

Add nodes information to the Graph. Parameters ———- nodes : list or cudf.Series The nodes of the graph to be stored.

clear()[source]

Empty the graph.

from_cudf_adjlist(offset_col, index_col, value_col=None)[source]

Initialize a graph from the adjacency list. It is an error to call this method on an initialized Graph object. The passed offset_col and index_col arguments wrap gdf_column objects that represent a graph using the adjacency list format. If value_col is None, an unweighted graph is created. If value_col is not None, a weighted graph is created. Undirected edges must be stored as directed edges in both directions. Parameters ———- offset_col : cudf.Series This cudf.Series wraps a gdf_column of size V + 1 (V: number of vertices). The gdf column contains the offsets for the vertices in this graph. Offsets must be in the range [0, E] (E: number of edges).

index_col : cudf.Series This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices must be in the range [0, V) (V: number of vertices).

value_col : cudf.Series, optional This pointer can be None. If not, this cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the weight value for each edge. The expected type of the gdf_column element is floating point number.

Examples

>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> M = gdf.to_pandas()
>>> M = scipy.sparse.coo_matrix((M['2'],(M['0'],M['1'])))
>>> M = M.tocsr()
>>> offsets = cudf.Series(M.indptr)
>>> indices = cudf.Series(M.indices)
>>> G = cugraph.Graph()
>>> G.from_cudf_adjlist(offsets, indices, None)
from_cudf_edgelist(input_df, source='source', destination='destination', edge_attr=None, renumber=True)[source]

Initialize a graph from the edge list. It is an error to call this method on an initialized Graph object. The passed input_df argument wraps gdf_column objects that represent a graph using the edge list format. source argument is source column name and destination argument is destination column name. By default, renumbering is enabled to map the source and destination vertices into an index in the range [0, V) where V is the number of vertices. If the input vertices are a single column of integers in the range [0, V), renumbering can be disabled and the original external vertex ids will be used. If weights are present, edge_attr argument is the weights column name.

Parameters
input_dfcudf.DataFrame or dask_cudf.DataFrame
A DataFrame that contains edge information If a dask_cudf.DataFrame is
passed it will be reinterpreted as a cudf.DataFrame. For the
distributed path please use from_dask_cudf_edgelist.
sourcestr or array-like
source column name or array of column names
destinationstr or array-like
destination column name or array of column names
edge_attrstr or None
the weights column name. Default is None
renumberbool
Indicate whether or not to renumber the source and destination vertex
IDs. Default is True.

Examples

>>> df = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(df, source='0', destination='1',
                         edge_attr='2', renumber=False)
from_dask_cudf_edgelist(input_ddf, source='source', destination='destination', edge_attr=None, renumber=True)[source]

Initializes the distributed graph from the dask_cudf.DataFrame edgelist. Undirected Graphs are not currently supported. By default, renumbering is enabled to map the source and destination vertices into an index in the range [0, V) where V is the number of vertices. If the input vertices are a single column of integers in the range [0, V), renumbering can be disabled and the original external vertex ids will be used. Note that the graph object will store a reference to the dask_cudf.DataFrame provided.

Parameters
input_ddfdask_cudf.DataFrame
The edgelist as a dask_cudf.DataFrame
sourcestr or array-like
source column name or array of column names
destinationstr
destination column name or array of column names
edge_attrstr
weights column name.
renumberbool
If source and destination indices are not in range 0 to V where V is
number of vertices, renumber argument should be True.
from_numpy_array(np_array, nodes=None)[source]

Initializes the graph from numpy array containing adjacency matrix.

from_numpy_matrix(np_matrix)[source]

Initializes the graph from numpy matrix containing adjacency matrix.

from_pandas_adjacency(pdf)[source]

Initializes the graph from pandas adjacency matrix

from_pandas_edgelist(pdf, source='source', destination='destination', edge_attr=None, renumber=True)[source]

Initialize a graph from the edge list. It is an error to call this method on an initialized Graph object. Source argument is source column name and destination argument is destination column name. By default, renumbering is enabled to map the source and destination vertices into an index in the range [0, V) where V is the number of vertices. If the input vertices are a single column of integers in the range [0, V), renumbering can be disabled and the original external vertex ids will be used. If weights are present, edge_attr argument is the weights column name.

Parameters
input_dfpandas.DataFrame
A DataFrame that contains edge information
sourcestr or array-like
source column name or array of column names
destinationstr or array-like
destination column name or array of column names
edge_attrstr or None
the weights column name. Default is None
renumberbool
Indicate whether or not to renumber the source and destination vertex
IDs. Default is True.

Examples

>>> df = pandas.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_pandas_edgelist(df, source='0', destination='1',
                         edge_attr='2', renumber=False)
has_isolated_vertices()[source]

Returns True if the graph has isolated vertices.

is_bipartite()[source]

Checks if Graph is bipartite. This solely relies on the user call of add_nodes_from with the bipartite parameter. This does not parse the graph to check if it is bipartite.

is_directed()[source]

Returns True if the graph is a directed graph. Returns False if the graph is an undirected graph.

is_multigraph()[source]

Returns True if the graph is a multigraph. Else returns False.

is_multipartite()[source]

Checks if Graph is multipartite. This solely relies on the user call of add_nodes_from with the partition parameter. This does not parse the graph to check if it is multipartite.

is_renumbered()[source]

Returns True if the graph is renumbered.

is_weighted()[source]

Returns True if the graph has edge weights.

lookup_internal_vertex_id(df, column_name=None)[source]

Given a DataFrame containing external vertex ids in the identified columns, or a Series containing external vertex ids, return a Series with the internal vertex ids. Note that this function does not guarantee order in single GPU mode, and does not guarantee order or partitioning in multi-GPU mode.

Parameters
df: cudf.DataFrame, cudf.Series, dask_cudf.DataFrame, dask_cudf.Series
A DataFrame containing external vertex identifiers that will be
converted into internal vertex identifiers.
column_name: (optional) string
Name of the column containing the external vertex ids
Returns
seriescudf.Series or dask_cudf.Series

The internal vertex identifiers

to_directed()[source]

Return a directed representation of the graph. This function sets the type of graph as DiGraph() and returns the directed view.

Returns
GDiGraph
A directed graph with the same nodes, and each edge (u,v,weights)
replaced by two directed edges (u,v,weights) and (v,u,weights).

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, '0', '1')
>>> DiG = G.to_directed()
to_undirected()[source]

Return an undirected copy of the graph.

Returns
GGraph
A undirected graph with the same nodes, and each directed edge
(u,v,weights) replaced by an undirected edge (u,v,weights).

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> DiG = cugraph.DiGraph()
>>> DiG.from_cudf_edgelist(M, '0', '1')
>>> G = DiG.to_undirected()
unrenumber(df, column_name, preserve_order=False, get_column_names=False)[source]

Given a DataFrame containing internal vertex ids in the identified column, replace this with external vertex ids. If the renumbering is from a single column, the output dataframe will use the same name for the external vertex identifiers. If the renumbering is from a multi-column input, the output columns will be labeled 0 through n-1 with a suffix of _column_name. Note that this function does not guarantee order in single GPU mode, and does not guarantee order or partitioning in multi-GPU mode. If you wish to preserve ordering, add an index column to df and sort the return by that index column.

Parameters
df: cudf.DataFrame or dask_cudf.DataFrame
A DataFrame containing internal vertex identifiers that will be
converted into external vertex identifiers.
column_name: string
Name of the column containing the internal vertex id.
preserve_order: (optional) bool
If True, preserve the order of the rows in the output DataFrame to
match the input DataFrame
Returns
dfcudf.DataFrame or dask_cudf.DataFrame
The original DataFrame columns exist unmodified. The external vertex
identifiers are added to the DataFrame, the internal vertex identifier
column is removed from the dataframe.

Symmetrize

cugraph.structure.symmetrize.symmetrize(source_col, dest_col, value_col=None, multi=False, symmetrize=True)[source]

Take a COO set of source destination pairs along with associated values stored in a single GPU or distributed create a new COO set of source destination pairs along with values where all edges exist in both directions.

Return from this call will be a COO stored as two cudf Series or dask_cudf.Series -the symmetrized source column and the symmetrized dest column, along with an optional cudf Series containing the associated values (only if the values are passed in).

Parameters
source_colcudf.Series or dask_cudf.Series

This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the source index for each edge. Source indices must be an integer type.

dest_colcudf.Series or dask_cudf.Series

This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices must be an integer type.

value_colcudf.Series or dask_cudf.Series (optional)

This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains values associated with this edge. For this function the values can be any type, they are not examined, just copied.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> values = cudf.Series(M['2'])
>>> src, dst, val = cugraph.symmetrize(sources, destinations, values)
cugraph.structure.symmetrize.symmetrize_ddf(df, src_name, dst_name, weight_name=None)[source]

Take a COO stored in a distributed DataFrame, and the column names of the source and destination columns and create a new data frame using the same column names that symmetrize the graph so that all edges appear in both directions.

Note that if other columns exist in the data frame (e.g. edge weights) the other columns will also be replicated. That is, if (u,v,data) represents the source value (u), destination value (v) and some set of other columns (data) in the input data, then the output data will contain both (u,v,data) and (v,u,data) with matching data.

If (u,v,data1) and (v,u,data2) exist in the input data where data1 != data2 then this code will arbitrarily pick the smaller data element to keep, if this is not desired then the caller should should correct the data prior to calling symmetrize.

Parameters
dfdask_cudf.DataFrame

Input data frame containing COO. Columns should contain source ids, destination ids and any properties associated with the edges.

src_namestring

Name of the column in the data frame containing the source ids

dst_namestring

Name of the column in the data frame containing the destination ids

multibool

Set to True if graph is a Multi(Di)Graph. This allows multiple edges instead of dropping them.

symmetrizebool

Default is True to perform symmetrization. If False only duplicate edges are dropped.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sym_df = cugraph.symmetrize(M, '0', '1')
cugraph.structure.symmetrize.symmetrize_df(df, src_name, dst_name, multi=False, symmetrize=True)[source]

Take a COO stored in a DataFrame, along with the column names of the source and destination columns and create a new data frame using the same column names that symmetrize the graph so that all edges appear in both directions. Note that if other columns exist in the data frame (e.g. edge weights) the other columns will also be replicated. That is, if (u,v,data) represents the source value (u), destination value (v) and some set of other columns (data) in the input data, then the output data will contain both (u,v,data) and (v,u,data) with matching data. If (u,v,data1) and (v,u,data2) exist in the input data where data1 != data2 then this code will arbitrarily pick the smaller data element to keep, if this is not desired then the caller should should correct the data prior to calling symmetrize.

Parameters
dfcudf.DataFrame

Input data frame containing COO. Columns should contain source ids, destination ids and any properties associated with the edges.

src_namestring

Name of the column in the data frame containing the source ids

dst_namestring

Name of the column in the data frame containing the destination ids

multibool

Set to True if graph is a Multi(Di)Graph. This allows multiple edges instead of dropping them.

symmetrizebool

Default is True to perform symmetrization. If False only duplicate edges are dropped.

Examples

>>> import cugraph.dask as dcg
>>> Comms.initialize()
>>> chunksize = dcg.get_chunksize(input_data_path)
>>> ddf = dask_cudf.read_csv(input_data_path, chunksize=chunksize,
                             delimiter=' ',
                             names=['src', 'dst', 'weight'],
                             dtype=['int32', 'int32', 'float32'])
>>> sym_ddf = cugraph.symmetrize_ddf(ddf, "src", "dst", "weight")
>>> Comms.destroy()

Conversion from Other Formats

cugraph.structure.convert_matrix.from_adjlist(offsets, indices, values=None, create_using=<class 'cugraph.structure.graph_classes.Graph'>)[source]

Initializes the graph from cuDF or Pandas Series representing adjacency matrix CSR data and returns a new cugraph.Graph object if ‘create_using’ is set to cugraph.Graph (the default), or cugraph.DiGraph if ‘create_using’ is set to cugraph.DiGraph.

Parameters
offsetscudf.Series, pandas.Series

The offsets of a CSR adjacency matrix.

indicescudf.Series, pandas.Series

The indices of a CSR adjacency matrix.

valuescudf.Series, pandas.Series, or None (default), optional

The values in a CSR adjacency matrix, which represent edge weights in a graph. If not provided, the resulting graph is considered unweighted.

create_usingcuGraph.Graph

Specify the type of Graph to create. Default is cugraph.Graph

Examples

>>> pdf = pd.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype={0:'int32', 1:'int32', 2:'float32'},
...                   header=None)
>>> M = scipy.sparse.coo_matrix((pdf[2],(pdf[0],pdf[1])))
>>> M = M.tocsr()
>>> offsets = pd.Series(M.indptr)
>>> indices = pd.Series(M.indices)
>>> G = cugraph.from_adjlist(offsets, indices, None)
cugraph.structure.convert_matrix.from_cudf_edgelist(df, source='source', destination='destination', edge_attr=None, create_using=<class 'cugraph.structure.graph_classes.Graph'>, renumber=True)[source]

Return a new graph created from the edge list representaion. This function is added for NetworkX compatibility (this function is a RAPIDS version of NetworkX’s from_pandas_edge_list()). This function does not support multiple source or destination columns. But does support renumbering

Parameters
dfcudf.DataFrame

This cudf.DataFrame contains columns storing edge source vertices, destination (or target following NetworkX’s terminology) vertices, and (optional) weights.

sourcestring or integer

This is used to index the source column.

destinationstring or integer

This is used to index the destination (or target following NetworkX’s terminology) column.

edge_attrstring or integer, optional

This pointer can be None. If not, this is used to index the weight column.

create_usingcuGraph.Graph

Specify the type of Graph to create. Default is cugraph.Graph

renumberbool

If source and destination indices are not in range 0 to V where V is number of vertices, renumber argument should be True.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G = cugraph.from_cudf_edgelist(M, source='0', target='1', weight='2')
cugraph.structure.convert_matrix.from_edgelist(df, source='source', destination='destination', edge_attr=None, create_using=<class 'cugraph.structure.graph_classes.Graph'>, renumber=True)[source]

Return a new graph created from the edge list representaion.

Parameters
dfcudf.DataFrame, pandas.DataFrame, dask_cudf.core.DataFrame

This DataFrame contains columns storing edge source vertices, destination (or target following NetworkX’s terminology) vertices, and (optional) weights.

sourcestring or integer

This is used to index the source column.

destinationstring or integer

This is used to index the destination (or target following NetworkX’s terminology) column.

edge_attrstring or integer, optional

This pointer can be None. If not, this is used to index the weight column.

create_usingcuGraph.Graph

Specify the type of Graph to create. Default is cugraph.Graph

renumberbool

If source and destination indices are not in range 0 to V where V is number of vertices, renumber argument should be True.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G = cugraph.from_edgelist(M, source='0', destination='1',
                              edge_attr='2')
cugraph.structure.convert_matrix.from_numpy_array(A, create_using=<class 'cugraph.structure.graph_classes.Graph'>)[source]

Initializes the graph from numpy array containing adjacency matrix. Set create_using to cugraph.DiGraph for directed graph and cugraph.Graph for undirected Graph.

cugraph.structure.convert_matrix.from_numpy_matrix(A, create_using=<class 'cugraph.structure.graph_classes.Graph'>)[source]

Initializes the graph from numpy matrix containing adjacency matrix. Set create_using to cugraph.DiGraph for directed graph and cugraph.Graph for undirected Graph.

cugraph.structure.convert_matrix.from_pandas_adjacency(df, create_using=<class 'cugraph.structure.graph_classes.Graph'>)[source]

Initializes the graph from pandas adjacency matrix. Set create_using to cugraph.DiGraph for directed graph and cugraph.Graph for undirected Graph.

cugraph.structure.convert_matrix.from_pandas_edgelist(df, source='source', destination='destination', edge_attr=None, create_using=<class 'cugraph.structure.graph_classes.Graph'>, renumber=True)[source]

Initialize a graph from the edge list. It is an error to call this method on an initialized Graph object. Source argument is source column name and destination argument is destination column name.

By default, renumbering is enabled to map the source and destination vertices into an index in the range [0, V) where V is the number of vertices. If the input vertices are a single column of integers in the range [0, V), renumbering can be disabled and the original external vertex ids will be used.

If weights are present, edge_attr argument is the weights column name.

Parameters
input_dfpandas.DataFrame

A DataFrame that contains edge information

sourcestr or array-like

source column name or array of column names

destinationstr or array-like

destination column name or array of column names

edge_attrstr or None

the weights column name. Default is None

renumberbool

Indicate whether or not to renumber the source and destination vertex IDs. Default is True.

create_using: cugraph.DiGraph or cugraph.Graph

Indicate whether to create a directed or undirected graph

Returns
Gcugraph.DiGraph or cugraph.Graph

graph containing edges from the pandas edgelist

Examples

>>> df = pandas.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_pandas_edgelist(df, source='0', destination='1',
                           edge_attr='2', renumber=False)
cugraph.structure.convert_matrix.to_numpy_array(G)[source]

Returns the graph adjacency matrix as a NumPy array.

cugraph.structure.convert_matrix.to_numpy_matrix(G)[source]

Returns the graph adjacency matrix as a NumPy matrix.

cugraph.structure.convert_matrix.to_pandas_adjacency(G)[source]

Returns the graph adjacency matrix as a Pandas DataFrame. The row indices denote source and column names denote destination.

cugraph.structure.convert_matrix.to_pandas_edgelist(G, source='source', destination='destination')[source]

Returns the graph edge list as a Pandas DataFrame.

Parameters
Gcugraph.Graph or cugraph.DiGraph

Graph containg the edgelist.

sourcestr or array-like

source column name or array of column names

destinationstr or array-like

destination column name or array of column names

Returns
——
dfpandas.DataFrame

pandas dataframe containing the edgelist as source and destination columns.

Centrality

Betweenness Centrality

cugraph.centrality.betweenness_centrality.betweenness_centrality(G, k=None, normalized=True, weight=None, endpoints=False, seed=None, result_dtype=<class 'numpy.float64'>)[source]

Compute the betweenness centrality for all vertices of the graph G. Betweenness centrality is a measure of the number of shortest paths that pass through a vertex. A vertex with a high betweenness centrality score has more paths passing through it and is therefore believed to be more important.

To improve performance. rather than doing an all-pair shortest path, a sample of k starting vertices can be used.

CuGraph does not currently support the ‘endpoints’ and ‘weight’ parameters as seen in the corresponding networkX call.

Parameters
GcuGraph.Graph or networkx.Graph

The graph can be either directed (DiGraph) or undirected (Graph). Weights in the graph are ignored, the current implementation uses BFS traversals. Use weight parameter if weights need to be considered (currently not supported)

kint or list or None, optional, default=None

If k is not None, use k node samples to estimate betweenness. Higher values give better approximation. If k is a list, use the content of the list for estimation: the list should contain vertex identifiers. If k is None (the default), all the vertices are used to estimate betweenness. Vertices obtained through sampling or defined as a list will be used assources for traversals inside the algorithm.

normalizedbool, optional

Default is True. If true, the betweenness values are normalized by __2 / ((n - 1) * (n - 2))__ for Graphs (undirected), and __1 / ((n - 1) * (n - 2))__ for DiGraphs (directed graphs) where n is the number of nodes in G. Normalization will ensure that values are in [0, 1], this normalization scales for the highest possible value where one node is crossed by every single shortest path.

weightcudf.DataFrame, optional, default=None

Specifies the weights to be used for each edge. Should contain a mapping between edges and weights. (Not Supported)

endpointsbool, optional, default=False

If true, include the endpoints in the shortest path counts. (Not Supported)

seedoptional

if k is specified and k is an integer, use seed to initialize the random number generator. Using None as seed relies on random.seed() behavior: using current system time If k is either None or list: seed parameter is ignored

result_dtypenp.float32 or np.float64, optional, default=np.float64

Indicate the data type of the betweenness centrality scores

Returns
dfcudf.DataFrame or Dictionary if using NetworkX

GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding betweenness centrality values. Please note that the resulting the ‘vertex’ column might not be in ascending order. The Dictionary conatains the same two columns

df[‘vertex’]cudf.Series

Contains the vertex identifiers

df[‘betweenness_centrality’]cudf.Series

Contains the betweenness centrality of vertices

Examples

>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(gdf, source='0', destination='1')
>>> bc = cugraph.betweenness_centrality(G)
cugraph.centrality.betweenness_centrality.edge_betweenness_centrality(G, k=None, normalized=True, weight=None, seed=None, result_dtype=<class 'numpy.float64'>)[source]

Compute the edge betweenness centrality for all edges of the graph G. Betweenness centrality is a measure of the number of shortest paths that pass over an edge. An edge with a high betweenness centrality score has more paths passing over it and is therefore believed to be more important.

To improve performance, rather than doing an all-pair shortest path, a sample of k starting vertices can be used.

CuGraph does not currently support the ‘weight’ parameter as seen in the corresponding networkX call.

Parameters
GcuGraph.Graph or networkx.Graph

The graph can be either directed (DiGraph) or undirected (Graph). Weights in the graph are ignored, the current implementation uses BFS traversals. Use weight parameter if weights need to be considered (currently not supported)

kint or list or None, optional, default=None

If k is not None, use k node samples to estimate betweenness. Higher values give better approximation. If k is a list, use the content of the list for estimation: the list should contain vertices identifiers. Vertices obtained through sampling or defined as a list will be used as sources for traversals inside the algorithm.

normalizedbool, optional

Default is True. If true, the betweenness values are normalized by 2 / (n * (n - 1)) for Graphs (undirected), and 1 / (n * (n - 1)) for DiGraphs (directed graphs) where n is the number of nodes in G. Normalization will ensure that values are in [0, 1], this normalization scales for the highest possible value where one edge is crossed by every single shortest path.

weightcudf.DataFrame, optional, default=None

Specifies the weights to be used for each edge. Should contain a mapping between edges and weights. (Not Supported)

seedoptional

if k is specified and k is an integer, use seed to initialize the random number generator. Using None as seed relies on random.seed() behavior: using current system time If k is either None or list: seed parameter is ignored

result_dtypenp.float32 or np.float64, optional, default=np.float64

Indicate the data type of the betweenness centrality scores Using double automatically switch implementation to “default”

Returns
dfcudf.DataFrame or Dictionary if using NetworkX

GPU data frame containing three cudf.Series of size E: the vertex identifiers of the sources, the vertex identifies of the destinations and the corresponding betweenness centrality values. Please note that the resulting the ‘src’, ‘dst’ column might not be in ascending order.

df[‘src’]cudf.Series

Contains the vertex identifiers of the source of each edge

df[‘dst’]cudf.Series

Contains the vertex identifiers of the destination of each edge

df[‘edge_betweenness_centrality’]cudf.Series

Contains the betweenness centrality of edges

When using undirected graphs, ‘src’ and ‘dst’ only contains elements such that ‘src’ < ‘dst’, which might differ from networkx and user’s input. Namely edge (1 -> 0) is transformed into (0 -> 1) but contains the betweenness centrality of edge (1 -> 0).

Examples

>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(gdf, source='0', destination='1')
>>> ebc = cugraph.edge_betweenness_centrality(G)

Katz Centrality

cugraph.centrality.katz_centrality.katz_centrality(G, alpha=None, beta=None, max_iter=100, tol=1e-06, nstart=None, normalized=True)[source]

Compute the Katz centrality for the nodes of the graph G. cuGraph does not currently support the ‘beta’ and ‘weight’ parameters as seen in the corresponding networkX call. This implementation is based on a relaxed version of Katz defined by Foster with a reduced computational complexity of O(n+m)

Foster, K.C., Muth, S.Q., Potterat, J.J. et al. Computational & Mathematical Organization Theory (2001) 7: 275. https://doi.org/10.1023/A:1013470632383

Parameters
GcuGraph.Graph or networkx.Graph

cuGraph graph descriptor with connectivity information. The graph can contain either directed (DiGraph) or undirected edges (Graph).

alphafloat

Attenuation factor defaulted to None. If alpha is not specified then it is internally calculated as 1/(degree_max) where degree_max is the maximum out degree.

NOTE

The maximum acceptable value of alpha for convergence alpha_max = 1/(lambda_max) where lambda_max is the largest eigenvalue of the graph. Since lambda_max is always lesser than or equal to degree_max for a graph, alpha_max will always be greater than or equal to (1/degree_max). Therefore, setting alpha to (1/degree_max) will guarantee that it will never exceed alpha_max thus in turn fulfilling the requirement for convergence.

betaNone

A weight scalar - currently Not Supported

max_iterint

The maximum number of iterations before an answer is returned. This can be used to limit the execution time and do an early exit before the solver reaches the convergence tolerance. If this value is lower or equal to 0 cuGraph will use the default value, which is 100.

tolerancefloat

Set the tolerance the approximation, this parameter should be a small magnitude value. The lower the tolerance the better the approximation. If this value is 0.0f, cuGraph will use the default value which is 1.0e-6. Setting too small a tolerance can lead to non-convergence due to numerical roundoff. Usually values between 1e-2 and 1e-6 are acceptable.

nstartcudf.Dataframe

GPU Dataframe containing the initial guess for katz centrality.

nstart[‘vertex’]cudf.Series

Contains the vertex identifiers

nstart[‘values’]cudf.Series

Contains the katz centrality values of vertices

normalizedbool

If True normalize the resulting katz centrality values

Returns
dfcudf.DataFrame or Dictionary if using NetworkX

GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding katz centrality values.

df[‘vertex’]cudf.Series

Contains the vertex identifiers

df[‘katz_centrality’]cudf.Series

Contains the katz centrality of vertices

Examples

>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(gdf, source='0', destination='1')
>>> kc = cugraph.katz_centrality(G)

Katz Centrality (MG)

cugraph.dask.centrality.katz_centrality.call_katz_centrality(sID, data, num_verts, num_edges, vertex_partition_offsets, aggregate_segment_offsets, alpha, beta, max_iter, tol, nstart, normalized)[source]
cugraph.dask.centrality.katz_centrality.katz_centrality(input_graph, alpha=None, beta=None, max_iter=100, tol=1e-05, nstart=None, normalized=True)[source]

Compute the Katz centrality for the nodes of the graph G.

Parameters
input_graphcuGraph.Graph

cuGraph graph descriptor with connectivity information. The graph can contain either directed (DiGraph) or undirected edges (Graph).

alphafloat

Attenuation factor defaulted to None. If alpha is not specified then it is internally calculated as 1/(degree_max) where degree_max is the maximum out degree.

NOTE

The maximum acceptable value of alpha for convergence alpha_max = 1/(lambda_max) where lambda_max is the largest eigenvalue of the graph. Since lambda_max is always lesser than or equal to degree_max for a graph, alpha_max will always be greater than or equal to (1/degree_max). Therefore, setting alpha to (1/degree_max) will guarantee that it will never exceed alpha_max thus in turn fulfilling the requirement for convergence.

betaNone

A weight scalar - currently Not Supported

max_iterint

The maximum number of iterations before an answer is returned. This can be used to limit the execution time and do an early exit before the solver reaches the convergence tolerance. If this value is lower or equal to 0 cuGraph will use the default value, which is 100.

tolerancefloat

Set the tolerance the approximation, this parameter should be a small magnitude value. The lower the tolerance the better the approximation. If this value is 0.0f, cuGraph will use the default value which is 1.0e-6. Setting too small a tolerance can lead to non-convergence due to numerical roundoff. Usually values between 1e-2 and 1e-6 are acceptable.

nstartdask_cudf.Dataframe

GPU Dataframe containing the initial guess for katz centrality

nstart[‘vertex’]dask_cudf.Series

Contains the vertex identifiers

nstart[‘values’]dask_cudf.Series

Contains the katz centrality values of vertices

normalizedbool

If True normalize the resulting katz centrality values

Returns
katz_centralitydask_cudf.DataFrame

GPU data frame containing two dask_cudf.Series of size V: the vertex identifiers and the corresponding katz centrality values.

ddf[‘vertex’]dask_cudf.Series

Contains the vertex identifiers

ddf[‘katz_centrality’]dask_cudf.Series

Contains the katz centrality of vertices

Examples

>>> import cugraph.dask as dcg
>>> ... Init a DASK Cluster
>>    see https://docs.rapids.ai/api/cugraph/stable/dask-cugraph.html
>>> chunksize = dcg.get_chunksize(input_data_path)
>>> ddf = dask_cudf.read_csv(input_data_path, chunksize=chunksize,
                             delimiter=' ',
                             names=['src', 'dst', 'value'],
                             dtype=['int32', 'int32', 'float32'])
>>> dg = cugraph.DiGraph()
>>> dg.from_dask_cudf_edgelist(ddf, source='src', destination='dst',
                               edge_attr='value')
>>> pr = dcg.katz_centrality(dg)

Community

EgoNet

cugraph.community.egonet.batched_ego_graphs(G, seeds, radius=1, center=True, undirected=False, distance=None)[source]

Compute the induced subgraph of neighbors for each node in seeds within a given radius.

Parameters
Gcugraph.Graph, networkx.Graph, CuPy or SciPy sparse matrix

Graph or matrix object, which should contain the connectivity information. Edge weights, if present, should be single or double precision floating point values.

seedscudf.Series or list or cudf.DataFrame

Specifies the seeds of the induced egonet subgraphs.

radius: integer, optional

Include all neighbors of distance<=radius from n.

center: bool, optional

Defaults to True. False is not supported

undirected: bool, optional

Defaults to False. True is not supported

distance: key, optional

Distances are counted in hops from n. Other cases are not supported.

Returns
ego_edge_listscudf.DataFrame or pandas.DataFrame

GPU data frame containing all induced sources identifiers, destination identifiers, edge weights

seeds_offsets: cudf.Series

Series containing the starting offset in the returned edge list for each seed.

cugraph.community.egonet.ego_graph(G, n, radius=1, center=True, undirected=False, distance=None)[source]

Compute the induced subgraph of neighbors centered at node n, within a given radius.

Parameters
Gcugraph.Graph, networkx.Graph, CuPy or SciPy sparse matrix

Graph or matrix object, which should contain the connectivity information. Edge weights, if present, should be single or double precision floating point values.

ninteger or cudf.DataFrame

A single node as integer or a cudf.DataFrame if nodes are represented with multiple columns. If a cudf.DataFrame is provided, only the first row is taken as the node input.

radius: integer, optional

Include all neighbors of distance<=radius from n.

center: bool, optional

Defaults to True. False is not supported

undirected: bool, optional

Defaults to False. True is not supported

distance: key, optional

Distances are counted in hops from n. Other cases are not supported.

Returns
G_egocuGraph.Graph or networkx.Graph

A graph descriptor with a minimum spanning tree or forest. The networkx graph will not have all attributes copied over

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1')
>>> ego_graph = cugraph.ego_graph(G, seed, radius=2)

Ensemble clustering for graphs (ECG)

cugraph.community.ecg.ecg(input_graph, min_weight=0.05, ensemble_size=16, weight=None)[source]

Compute the Ensemble Clustering for Graphs (ECG) partition of the input graph. ECG runs truncated Louvain on an ensemble of permutations of the input graph, then uses the ensemble partitions to determine weights for the input graph. The final result is found by running full Louvain on the input graph using the determined weights.

See https://arxiv.org/abs/1809.05578 for further information.

Parameters
input_graphcugraph.Graph or NetworkX Graph

The graph descriptor should contain the connectivity information and weights. The adjacency list will be computed if not already present.

min_weightfloating point

The minimum value to assign as an edgeweight in the ECG algorithm. It should be a value in the range [0,1] usually left as the default value of .05

ensemble_sizeinteger

The number of graph permutations to use for the ensemble. The default value is 16, larger values may produce higher quality partitions for some graphs.

weightstr

This parameter is here for NetworkX compatibility and represents which NetworkX data column represents Edge weights. Default is None

Returns
partscudf.DataFrame or python dictionary

GPU data frame of size V containing two columns, the vertex id and the partition id it is assigned to.

df[vertex]cudf.Series

Contains the vertex identifiers

df[partition]cudf.Series

Contains the partition assigned to the vertices

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr='2')
>>> parts = cugraph.ecg(G)

K-Truss

cugraph.community.ktruss_subgraph.k_truss(G, k)[source]

Returns the K-Truss subgraph of a graph for a specific k.

The k-truss of a graph is a subgraph where each edge is part of at least (k−2) triangles. K-trusses are used for finding tighlty knit groups of vertices in a graph. A k-truss is a relaxation of a k-clique in the graph and was define in [1]. Finding cliques is computationally demanding and finding the maximal k-clique is known to be NP-Hard.

Parameters
GcuGraph.Graph or networkx.Graph

cuGraph graph descriptor with connectivity information. k-Trusses are defined for only undirected graphs as they are defined for undirected triangle in a graph.

kint

The desired k to be used for extracting the k-truss subgraph.

Returns
G_trusscuGraph.Graph or networkx.Graph

A cugraph graph descriptor with the k-truss subgraph for the given k. The networkx graph will NOT have all attributes copied over

cugraph.community.ktruss_subgraph.ktruss_subgraph(G, k, use_weights=True)[source]

Returns the K-Truss subgraph of a graph for a specific k.

The k-truss of a graph is a subgraph where each edge is part of at least (k−2) triangles. K-trusses are used for finding tighlty knit groups of vertices in a graph. A k-truss is a relaxation of a k-clique in the graph and was define in [1]. Finding cliques is computationally demanding and finding the maximal k-clique is known to be NP-Hard.

In contrast, finding a k-truss is computationally tractable as its key building block, namely triangle counting counting, can be executed in polnymomial time.Typically, it takes many iterations of triangle counting to find the k-truss of a graph. Yet these iterations operate on a weakly monotonically shrinking graph. Therefore, finding the k-truss of a graph can be done in a fairly reasonable amount of time. The solution in cuGraph is based on a GPU algorithm first shown in [2] and uses the triangle counting algorithm from [3].

[1] Cohen, J., “Trusses: Cohesive subgraphs for social network analysis” National security agency technical report, 2008

[2] O. Green, J. Fox, E. Kim, F. Busato, et al. “Quickly Finding a Truss in a Haystack” IEEE High Performance Extreme Computing Conference (HPEC), 2017 https://doi.org/10.1109/HPEC.2017.8091038

[3] O. Green, P. Yalamanchili, L.M. Munguia, “Fast Triangle Counting on GPU” Irregular Applications: Architectures and Algorithms (IA3), 2014

Parameters
GcuGraph.Graph

cuGraph graph descriptor with connectivity information. k-Trusses are defined for only undirected graphs as they are defined for undirected triangle in a graph.

kint

The desired k to be used for extracting the k-truss subgraph.

use_weightsBool

whether the output should contain the edge weights if G has them

Returns
G_trusscuGraph.Graph

A cugraph graph descriptor with the k-truss subgraph for the given k.

Examples

>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(gdf, source='0', destination='1')
>>> k_subgraph = cugraph.ktruss_subgraph(G, 3)

Leiden

cugraph.community.leiden.leiden(G, max_iter=100, resolution=1.0)[source]

Compute the modularity optimizing partition of the input graph using the Leiden algorithm

It uses the Louvain method described in:

Traag, V. A., Waltman, L., & van Eck, N. J. (2019). From Louvain to Leiden: guaranteeing well-connected communities. Scientific reports, 9(1), 5233. doi: 10.1038/s41598-019-41695-z

Parameters
Gcugraph.Graph

cuGraph graph descriptor of type Graph

The adjacency list will be computed if not already present.

max_iterinteger

This controls the maximum number of levels/iterations of the Leiden algorithm. When specified the algorithm will terminate after no more than the specified number of iterations. No error occurs when the algorithm terminates early in this manner.

resolution: float/double, optional

Called gamma in the modularity formula, this changes the size of the communities. Higher resolutions lead to more smaller communities, lower resolutions lead to fewer larger communities. Defaults to 1.

Returns
partscudf.DataFrame

GPU data frame of size V containing two columns the vertex id and the partition id it is assigned to.

df[‘vertex’]cudf.Series

Contains the vertex identifiers

df[‘partition’]cudf.Series

Contains the partition assigned to the vertices

modularity_scorefloat

a floating point number containing the global modularity score of the partitioning.

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1')
>>> parts, modularity_score = cugraph.leiden(G)

Louvain

cugraph.community.louvain.louvain(G, max_iter=100, resolution=1.0)[source]

Compute the modularity optimizing partition of the input graph using the Louvain method

It uses the Louvain method described in:

VD Blondel, J-L Guillaume, R Lambiotte and E Lefebvre: Fast unfolding of community hierarchies in large networks, J Stat Mech P10008 (2008), http://arxiv.org/abs/0803.0476

Parameters
Gcugraph.Graph or NetworkX Graph

The graph descriptor should contain the connectivity information and weights. The adjacency list will be computed if not already present.

max_iterinteger

This controls the maximum number of levels/iterations of the Louvain algorithm. When specified the algorithm will terminate after no more than the specified number of iterations. No error occurs when the algorithm terminates early in this manner.

resolution: float/double, optional

Called gamma in the modularity formula, this changes the size of the communities. Higher resolutions lead to more smaller communities, lower resolutions lead to fewer larger communities. Defaults to 1.

Returns
partscudf.DataFrame

GPU data frame of size V containing two columns the vertex id and the partition id it is assigned to.

df[‘vertex’]cudf.Series

Contains the vertex identifiers

df[‘partition’]cudf.Series

Contains the partition assigned to the vertices

modularity_scorefloat

a floating point number containing the global modularity score of the partitioning.

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1')
>>> parts, modularity_score = cugraph.louvain(G)

Louvain (MG)

cugraph.dask.community.louvain.call_louvain(sID, data, num_verts, num_edges, vertex_partition_offsets, aggregate_segment_offsets, max_level, resolution)[source]
cugraph.dask.community.louvain.louvain(input_graph, max_iter=100, resolution=1.0)[source]

Compute the modularity optimizing partition of the input graph using the Louvain method on multiple GPUs

Examples

>>> import cugraph.dask as dcg
>>> ... Init a DASK Cluster
>>    see https://docs.rapids.ai/api/cugraph/stable/dask-cugraph.html
>>> chunksize = dcg.get_chunksize(input_data_path)
>>> ddf = dask_cudf.read_csv('datasets/karate.csv', chunksize=chunksize,
                             delimiter=' ',
                             names=['src', 'dst', 'value'],
                             dtype=['int32', 'int32', 'float32'])
>>> dg = cugraph.Graph()
>>> dg.from_dask_cudf_edgelist(ddf, source='src', destination='dst',
                               edge_attr='value')
>>> parts, modularity_score = dcg.louvain(dg)

Spectral Clustering

cugraph.community.spectral_clustering.analyzeClustering_edge_cut(G, n_clusters, clustering, vertex_col_name='vertex', cluster_col_name='cluster')[source]

Compute the edge cut score for a partitioning/clustering The assumption is that “clustering” is the results from a call from a special clustering algorithm and contains columns named “vertex” and “cluster”.

Parameters
Gcugraph.Graph

cuGraph graph descriptor

n_clustersinteger

Specifies the number of clusters in the given clustering

clusteringcudf.DataFrame

The cluster assignment to analyze.

vertex_col_namestr

The name of the column in the clustering dataframe identifying the external vertex id

cluster_col_namestr

The name of the column in the clustering dataframe identifying the cluster id

Returns
scorefloat

The computed edge cut score

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr=None)
>>> df = cugraph.spectralBalancedCutClustering(G, 5)
>>> score = cugraph.analyzeClustering_edge_cut(G, 5, df)
cugraph.community.spectral_clustering.analyzeClustering_modularity(G, n_clusters, clustering, vertex_col_name='vertex', cluster_col_name='cluster')[source]

Compute the modularity score for a given partitioning/clustering. The assumption is that “clustering” is the results from a call from a special clustering algorithm and contains columns named “vertex” and “cluster”.

Parameters
Gcugraph.Graph or networkx.Graph

graph descriptor. This graph should have edge weights.

n_clustersinteger

Specifies the number of clusters in the given clustering

clusteringcudf.DataFrame

The cluster assignment to analyze.

vertex_col_namestr or list of str

The names of the column in the clustering dataframe identifying the external vertex id

cluster_col_namestr

The name of the column in the clustering dataframe identifying the cluster id

Returns
scorefloat

The computed modularity score

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr='2')
>>> df = cugraph.spectralBalancedCutClustering(G, 5)
>>> score = cugraph.analyzeClustering_modularity(G, 5, df)
cugraph.community.spectral_clustering.analyzeClustering_ratio_cut(G, n_clusters, clustering, vertex_col_name='vertex', cluster_col_name='cluster')[source]

Compute the ratio cut score for a partitioning/clustering

Parameters
Gcugraph.Graph

cuGraph graph descriptor. This graph should have edge weights.

n_clustersinteger

Specifies the number of clusters in the given clustering

clusteringcudf.DataFrame

The cluster assignment to analyze.

vertex_col_namestr

The name of the column in the clustering dataframe identifying the external vertex id

cluster_col_namestr

The name of the column in the clustering dataframe identifying the cluster id

Returns
scorefloat

The computed ratio cut score

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr='2')
>>> df = cugraph.spectralBalancedCutClustering(G, 5)
>>> score = cugraph.analyzeClustering_ratio_cut(G, 5, df,
>>>   'vertex', 'cluster')
cugraph.community.spectral_clustering.spectralBalancedCutClustering(G, num_clusters, num_eigen_vects=2, evs_tolerance=1e-05, evs_max_iter=100, kmean_tolerance=1e-05, kmean_max_iter=100)[source]

Compute a clustering/partitioning of the given graph using the spectral balanced cut method.

Parameters
Gcugraph.Graph or networkx.Graph

graph descriptor

num_clustersinteger

Specifies the number of clusters to find, must be greater than 1

num_eigen_vectsinteger

Specifies the number of eigenvectors to use. Must be lower or equal to num_clusters. Default is 2

evs_tolerance: float

Specifies the tolerance to use in the eigensolver. Default is 0.00001

evs_max_iter: integer

Specifies the maximum number of iterations for the eigensolver. Default is 100

kmean_tolerance: float

Specifies the tolerance to use in the k-means solver. Default is 0.00001

kmean_max_iter: integer

Specifies the maximum number of iterations for the k-means solver. Default is 100

Returns
dfcudf.DataFrame

GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding cluster assignments.

df[‘vertex’]cudf.Series

contains the vertex identifiers

df[‘cluster’]cudf.Series

contains the cluster assignments

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1')
>>> df = cugraph.spectralBalancedCutClustering(G, 5)
cugraph.community.spectral_clustering.spectralModularityMaximizationClustering(G, num_clusters, num_eigen_vects=2, evs_tolerance=1e-05, evs_max_iter=100, kmean_tolerance=1e-05, kmean_max_iter=100)[source]

Compute a clustering/partitioning of the given graph using the spectral modularity maximization method.

Parameters
Gcugraph.Graph or networkx.Graph

cuGraph graph descriptor. This graph should have edge weights.

num_clustersinteger

Specifies the number of clusters to find

num_eigen_vectsinteger

Specifies the number of eigenvectors to use. Must be lower or equal to num_clusters. Default is 2

evs_tolerance: float

Specifies the tolerance to use in the eigensolver. Default is 0.00001

evs_max_iter: integer

Specifies the maximum number of iterations for the eigensolver. Default is 100

kmean_tolerance: float

Specifies the tolerance to use in the k-means solver. Default is 0.00001

kmean_max_iter: integer

Specifies the maximum number of iterations for the k-means solver. Default is 100

Returns
dfcudf.DataFrame
df[‘vertex’]cudf.Series

contains the vertex identifiers

df[‘cluster’]cudf.Series

contains the cluster assignments

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr='2')
>>> df = cugraph.spectralModularityMaximizationClustering(G, 5)

Subgraph Extraction

cugraph.community.subgraph_extraction.subgraph(G, vertices)[source]

Compute a subgraph of the existing graph including only the specified vertices. This algorithm works for both directed and undirected graphs, it does not actually traverse the edges, simply pulls out any edges that are incident on vertices that are both contained in the vertices list.

Parameters
Gcugraph.Graph

cuGraph graph descriptor

verticescudf.Series or cudf.DataFrame

Specifies the vertices of the induced subgraph. For multi-column vertices, vertices should be provided as a cudf.DataFrame

Returns
Sgcugraph.Graph

A graph object containing the subgraph induced by the given vertex set.

Examples

>>> gdf = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(gdf, source='0', destination='1')
>>> verts = numpy.zeros(3, dtype=numpy.int32)
>>> verts[0] = 0
>>> verts[1] = 1
>>> verts[2] = 2
>>> sverts = cudf.Series(verts)
>>> Sg = cugraph.subgraph(G, sverts)

Triangle Counting

cugraph.community.triangle_count.triangles(G)[source]

Compute the number of triangles (cycles of length three) in the input graph.

Unlike NetworkX, this algorithm simply returns the total number of triangle and not the number per vertex.

Parameters
Gcugraph.graph or networkx.Graph

cuGraph graph descriptor, should contain the connectivity information, (edge weights are not used in this algorithm)

Returns
countint64

A 64 bit integer whose value gives the number of triangles in the graph.

Examples

>>> gdf = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(gdf, source='0', destination='1')
>>> count = cugraph.triangles(G)

Components

Connected Components

cugraph.components.connectivity.connected_components(G, directed=None, connection='weak', return_labels=None)[source]

Generate either the stronlgly or weakly connected components and attach a component label to each vertex.

Parameters
Gcugraph.Graph, networkx.Graph, CuPy or SciPy sparse matrix

Graph or matrix object, which should contain the connectivity information (edge weights are not used for this algorithm). If using a graph object, the graph can be either directed or undirected where an undirected edge is represented by a directed edge in both directions. The adjacency list will be computed if not already present. The number of vertices should fit into a 32b int.

directedbool, optional
NOTE

For non-Graph-type (eg. sparse matrix) values of G only. Raises TypeError if used with a Graph object.

If True (default), then convert the input matrix to a cugraph.DiGraph and only move from point i to point j along paths csgraph[i, j]. If False, then find the shortest path on an undirected graph: the algorithm can progress from point i to j along csgraph[i, j] or csgraph[j, i].

connectionstr, optional

[‘weak’|’strong’]. Return either weakly or strongly connected components.

return_labelsbool, optional
NOTE

For non-Graph-type (eg. sparse matrix) values of G only. Raises TypeError if used with a Graph object.

If True (default), then return the labels for each of the connected components.

Returns
Return value type is based on the input type. If G is a cugraph.Graph,
returns:
cudf.DataFrame

GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding component identifier.

df[‘vertex’]

Contains the vertex identifier

df[‘labels’]

The component identifier

If G is a networkx.Graph, returns:

python dictionary, where keys are vertices and values are the component identifiers.

If G is a CuPy or SciPy matrix, returns:

CuPy ndarray (if CuPy matrix input) or Numpy ndarray (if SciPy matrix input) of shape (<num vertices>, 2), where column 0 contains component identifiers and column 1 contains vertices.

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr=None)
>>> df = cugraph.connected_components(G, connection="weak")
cugraph.components.connectivity.strongly_connected_components(G, directed=None, connection=None, return_labels=None)[source]

Generate the Strongly Connected Components and attach a component label to each vertex.

Parameters
Gcugraph.Graph, networkx.Graph, CuPy or SciPy sparse matrix

Graph or matrix object, which should contain the connectivity information (edge weights are not used for this algorithm). If using a graph object, the graph can be either directed or undirected where an undirected edge is represented by a directed edge in both directions. The adjacency list will be computed if not already present. The number of vertices should fit into a 32b int.

directedbool, optional
NOTE

For non-Graph-type (eg. sparse matrix) values of G only. Raises TypeError if used with a Graph object.

If True (default), then convert the input matrix to a cugraph.DiGraph and only move from point i to point j along paths csgraph[i, j]. If False, then find the shortest path on an undirected graph: the algorithm can progress from point i to j along csgraph[i, j] or csgraph[j, i].

connectionstr, optional

Added for SciPy compatibility, can only be specified for non-Graph-type (eg. sparse matrix) values of G only (raises TypeError if used with a Graph object), and can only be set to “strong” for this API.

return_labelsbool, optional
NOTE

For non-Graph-type (eg. sparse matrix) values of G only. Raises TypeError if used with a Graph object.

If True (default), then return the labels for each of the connected components.

Returns
Return value type is based on the input type. If G is a cugraph.Graph,
returns:
cudf.DataFrame

GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding component identifier.

df[‘vertex’]

Contains the vertex identifier

df[‘labels’]

The component identifier

If G is a networkx.Graph, returns:

python dictionary, where keys are vertices and values are the component identifiers.

If G is a CuPy or SciPy matrix, returns:

CuPy ndarray (if CuPy matrix input) or Numpy ndarray (if SciPy matrix input) of shape (<num vertices>, 2), where column 0 contains component identifiers and column 1 contains vertices.

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr=None)
>>> df = cugraph.strongly_connected_components(G)
cugraph.components.connectivity.weakly_connected_components(G, directed=None, connection=None, return_labels=None)[source]

Generate the Weakly Connected Components and attach a component label to each vertex.

Parameters
Gcugraph.Graph, networkx.Graph, CuPy or SciPy sparse matrix

Graph or matrix object, which should contain the connectivity information (edge weights are not used for this algorithm). If using a graph object, the graph can be either directed or undirected where an undirected edge is represented by a directed edge in both directions. The adjacency list will be computed if not already present. The number of vertices should fit into a 32b int.

directedbool, optional
NOTE

For non-Graph-type (eg. sparse matrix) values of G only. Raises TypeError if used with a Graph object.

If True (default), then convert the input matrix to a cugraph.DiGraph and only move from point i to point j along paths csgraph[i, j]. If False, then find the shortest path on an undirected graph: the algorithm can progress from point i to j along csgraph[i, j] or csgraph[j, i].

connectionstr, optional

Added for SciPy compatibility, can only be specified for non-Graph-type (eg. sparse matrix) values of G only (raises TypeError if used with a Graph object), and can only be set to “weak” for this API.

return_labelsbool, optional
NOTE

For non-Graph-type (eg. sparse matrix) values of G only. Raises TypeError if used with a Graph object.

If True (default), then return the labels for each of the connected components.

Returns
Return value type is based on the input type. If G is a cugraph.Graph,
returns:
cudf.DataFrame

GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding component identifier.

df[‘vertex’]

Contains the vertex identifier

df[‘labels’]

The component identifier

If G is a networkx.Graph, returns:

python dictionary, where keys are vertices and values are the component identifiers.

If G is a CuPy or SciPy matrix, returns:

CuPy ndarray (if CuPy matrix input) or Numpy ndarray (if SciPy matrix input) of shape (<num vertices>, 2), where column 0 contains component identifiers and column 1 contains vertices.

Examples

>>> M = cudf.read_csv('datasets/karate.csv',
                      delimiter = ' ',
                      dtype=['int32', 'int32', 'float32'],
                      header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr=None)
>>> df = cugraph.weakly_connected_components(G)

Connected Components (MG)

cugraph.dask.components.connectivity.call_wcc(sID, data, num_verts, num_edges, vertex_partition_offsets, aggregate_segment_offsets)[source]
cugraph.dask.components.connectivity.weakly_connected_components(input_graph)[source]

Cores

Core Number

cugraph.cores.core_number.core_number(G)[source]

Compute the core numbers for the nodes of the graph G. A k-core of a graph is a maximal subgraph that contains nodes of degree k or more. A node has a core number of k if it belongs a k-core but not to k+1-core. This call does not support a graph with self-loops and parallel edges.

Parameters
graphcuGraph.Graph or networkx.Graph

The graph should contain undirected edges where undirected edges are represented as directed edges in both directions. While this graph can contain edge weights, they don’t participate in the calculation of the core numbers.

Returns
dfcudf.DataFrame or python dictionary (in NetworkX input)

GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding core number values.

df[‘vertex’]cudf.Series

Contains the vertex identifiers

df[‘core_number’]cudf.Series

Contains the core number of vertices

Examples

>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(gdf, source='0', destination='1')
>>> cn = cugraph.core_number(G)

K-Core

cugraph.cores.k_core.k_core(G, k=None, core_number=None)[source]

Compute the k-core of the graph G based on the out degree of its nodes. A k-core of a graph is a maximal subgraph that contains nodes of degree k or more. This call does not support a graph with self-loops and parallel edges.

Parameters
GcuGraph.Graph or networkx.Graph

cuGraph graph descriptor with connectivity information. The graph should contain undirected edges where undirected edges are represented as directed edges in both directions. While this graph can contain edge weights, they don’t participate in the calculation of the k-core.

kint, optional

Order of the core. This value must not be negative. If set to None, the main core is returned.

core_numbercudf.DataFrame, optional

Precomputed core number of the nodes of the graph G containing two cudf.Series of size V: the vertex identifiers and the corresponding core number values. If set to None, the core numbers of the nodes are calculated internally.

core_number[‘vertex’]cudf.Series

Contains the vertex identifiers

core_number[‘values’]cudf.Series

Contains the core number of vertices

Returns
KCoreGraphcuGraph.Graph

K Core of the input graph

Examples

>>> gdf = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(gdf, source='0', destination='1')
>>> KCoreGraph = cugraph.k_core(G)

Layout

Force Atlas 2

cugraph.layout.force_atlas2.force_atlas2(input_graph, max_iter=500, pos_list=None, outbound_attraction_distribution=True, lin_log_mode=False, prevent_overlapping=False, edge_weight_influence=1.0, jitter_tolerance=1.0, barnes_hut_optimize=True, barnes_hut_theta=0.5, scaling_ratio=2.0, strong_gravity_mode=False, gravity=1.0, verbose=False, callback=None)[source]

ForceAtlas2 is a continuous graph layout algorithm for handy network visualization.

NOTE: Peak memory allocation occurs at 30*V.

Parameters
input_graphcugraph.Graph

cuGraph graph descriptor with connectivity information. Edge weights, if present, should be single or double precision floating point values.

max_iterinteger

This controls the maximum number of levels/iterations of the Force Atlas algorithm. When specified the algorithm will terminate after no more than the specified number of iterations. No error occurs when the algorithm terminates in this manner. Good short-term quality can be achieved with 50-100 iterations. Above 1000 iterations is discouraged.

pos_list: cudf.DataFrame

Data frame with initial vertex positions containing two columns: ‘x’ and ‘y’ positions.

outbound_attraction_distribution: bool

Distributes attraction along outbound edges. Hubs attract less and thus are pushed to the borders.

lin_log_mode: bool

Switch Force Atlas model from lin-lin to lin-log. Makes clusters more tight.

prevent_overlapping: bool

Prevent nodes to overlap.

edge_weight_influence: float

How much influence you give to the edges weight. 0 is “no influence” and 1 is “normal”.

jitter_tolerance: float

How much swinging you allow. Above 1 discouraged. Lower gives less speed and more precision.

barnes_hut_optimize: bool

Whether to use the Barnes Hut approximation or the slower exact version.

barnes_hut_theta: float

Float between 0 and 1. Tradeoff for speed (1) vs accuracy (0) for Barnes Hut only.

scaling_ratio: float

How much repulsion you want. More makes a more sparse graph. Switching from regular mode to LinLog mode needs a readjustment of the scaling parameter.

strong_gravity_mode: bool

Sets a force that attracts the nodes that are distant from the center more. It is so strong that it can sometimes dominate other forces.

gravityfloat

Attracts nodes to the center. Prevents islands from drifting away.

verbose: bool

Output convergence info at each interation.

callback: GraphBasedDimRedCallback

An instance of GraphBasedDimRedCallback class to intercept the internal state of positions while they are being trained.

Example of callback usage:
from cugraph.internals import GraphBasedDimRedCallback
class CustomCallback(GraphBasedDimRedCallback):
def on_preprocess_end(self, positions):

print(positions.copy_to_host())

def on_epoch_end(self, positions):

print(positions.copy_to_host())

def on_train_end(self, positions):

print(positions.copy_to_host())

Returns
poscudf.DataFrame

GPU data frame of size V containing three columns: the vertex identifiers and the x and y positions.

Linear Assignment

Hungarian

Execute the Hungarian algorithm against a symmetric, weighted, bipartite graph.

As a bipartite graph, the vertex set of the graph can be partitioned into two disjoint sets such that all edges connect a vertex from one set to a vertex of the other set. The workers variable identifies one of the sets of vertices, the other set is all vertices not in the workers set (V - workers).

The edge weights reflect the cost of assigning a particular job to a worker.

The Hungarian algorithm identifies the lowest cost matching of vertices such that all workers that can be assigned work are assigned exactly on job.

Parameters

Gcugraph.Graph

cuGraph graph descriptor, should contain the connectivity information as an an edge list. Edge weights are required. If an edge list is not provided then it will be computed.

workerscudf.Series or cudf.DataFrame

A series or column that identifies the vertex ids of the vertices in the workers set. In case of multi-column vertices, it should be a cudf.DataFrame. All vertices in G that are not in the workers set are implicitly assigned to the jobs set.

epsilonfloat or double (matching weight type in graph)

Used for determining when value is close enough to zero to consider 0. Defaults (if not specified) to 1e-6 in the C++ code. Unused for integer weight types.

Returns

costmatches costs.dtype

The cost of the overall assignment

dfcudf.DataFrame
df[‘vertex’][i] gives the vertex id of the i’th vertex. Only vertices

in the workers list are defined in this column.

df[‘assignment’][i] gives the vertex id of the “job” assigned to the

corresponding vertex.

FIXME: Update this with a real example…

Examples

>>> M = cudf.read_csv('datasets/bipartite.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1', edge_attr='2')
>>> cost, df = cugraph.hungarian(G, workers)

Sampling

Random Walks

cugraph.sampling.random_walks.random_walks(G, start_vertices, max_depth=None, use_padding=False)[source]

compute random walks for each nodes in ‘start_vertices’

Parameters
GcuGraph.Graph or networkx.Graph

The graph can be either directed (DiGraph) or undirected (Graph). Weights in the graph are ignored. Use weight parameter if weights need to be considered (currently not supported)

start_verticesint or list or cudf.Series or cudf.DataFrame

A single node or a list or a cudf.Series of nodes from which to run the random walks. In case of multi-column vertices it should be a cudf.DataFrame

max_depthint

The maximum depth of the random walks

use_paddingbool

If True, padded paths are returned else coalesced paths are returned.

Returns
vertex_pathscudf.Series or cudf.DataFrame

Series containing the vertices of edges/paths in the random walk.

edge_weight_paths: cudf.Series

Series containing the edge weights of edges represented by the returned vertex_paths

sizes: int

The path size in case of coalesced paths.

cugraph.sampling.random_walks.rw_path(num_paths, sizes)[source]

Retrieve more information on the obtained paths in case use_padding is False.

Parameters
num_paths: int

Number of paths in the random walk output.

sizes: int

Path size returned in random walk output.

Returns
path_datacudf.DataFrame

Dataframe containing vetex path offsets, edge weight offsets and edge weight sizes for each path.

Traversal

Breadth-first-search

cugraph.traversal.bfs.bfs(G, start=None, depth_limit=None, i_start=None, directed=None, return_predecessors=None)[source]

Find the distances and predecessors for a breadth first traversal of a graph.

Parameters
Gcugraph.Graph, networkx.Graph, CuPy or SciPy sparse matrix

Graph or matrix object, which should contain the connectivity information. Edge weights, if present, should be single or double precision floating point values.

startInteger

The index of the graph vertex from which the traversal begins

i_startInteger, optional

Identical to start, added for API compatibility. Only start or i_start can be set, not both.

depth_limitInteger or None

Limit the depth of the search

directedbool, optional
NOTE

For non-Graph-type (eg. sparse matrix) values of G only. Raises TypeError if used with a Graph object.

If True (default), then convert the input matrix to a cugraph.DiGraph, otherwise a cugraph.Graph object will be used.

Returns
Return value type is based on the input type. If G is a cugraph.Graph,
returns:
cudf.DataFrame

df[‘vertex’] vertex IDs

df[‘distance’] path distance for each vertex from the starting vertex

df[‘predecessor’] for each i’th position in the column, the vertex ID immediately preceding the vertex at position i in the ‘vertex’ column

If G is a networkx.Graph, returns:

pandas.DataFrame with contents equivalent to the cudf.DataFrame described above.

If G is a CuPy or SciPy matrix, returns:

a 2-tuple of CuPy ndarrays (if CuPy matrix input) or Numpy ndarrays (if SciPy matrix input) representing:

distance: cupy or numpy ndarray

ndarray of shortest distances between source and vertex.

predecessor: cupy or numpy ndarray

ndarray of predecessors of a vertex on the path from source, which can be used to reconstruct the shortest paths.

…or if return_sp_counter is True, returns a 3-tuple with the above two arrays plus:

sp_counter: cupy or numpy ndarray

ndarray of number of shortest paths leading to each vertex.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1')
>>> df = cugraph.bfs(G, 0)
cugraph.traversal.bfs.bfs_edges(G, source, reverse=False, depth_limit=None, sort_neighbors=None)[source]

Find the distances and predecessors for a breadth first traversal of a graph.

Parameters
Gcugraph.Graph, networkx.Graph, CuPy or SciPy sparse matrix

Graph or matrix object, which should contain the connectivity information. Edge weights, if present, should be single or double precision floating point values.

sourceInteger

The starting vertex index

reverseboolean

If a directed graph, then process edges in a reverse direction Currently not implemented

depth_limitInt or None

Limit the depth of the search

sort_neighborsNone or Function

Currently not implemented

Returns
Return value type is based on the input type. If G is a cugraph.Graph,
returns:
cudf.DataFrame

df[‘vertex’] vertex IDs

df[‘distance’] path distance for each vertex from the starting vertex

df[‘predecessor’] for each i’th position in the column, the vertex ID immediately preceding the vertex at position i in the ‘vertex’ column

If G is a networkx.Graph, returns:

pandas.DataFrame with contents equivalent to the cudf.DataFrame described above.

If G is a CuPy or SciPy matrix, returns:

a 2-tuple of CuPy ndarrays (if CuPy matrix input) or Numpy ndarrays (if SciPy matrix input) representing:

distance: cupy or numpy ndarray

ndarray of shortest distances between source and vertex.

predecessor: cupy or numpy ndarray

ndarray of predecessors of a vertex on the path from source, which can be used to reconstruct the shortest paths.

…or if return_sp_counter is True, returns a 3-tuple with the above two arrays plus:

sp_counter: cupy or numpy ndarray

ndarray of number of shortest paths leading to each vertex.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1')
>>> df = cugraph.bfs_edges(G, 0)

Breadth-first-search (MG)

cugraph.dask.traversal.bfs.bfs(graph, start, depth_limit=None, return_distances=True)[source]

Find the distances and predecessors for a breadth first traversal of a graph. The input graph must contain edge list as dask-cudf dataframe with one partition per GPU.

Parameters
graphcugraph.DiGraph

cuGraph graph descriptor, should contain the connectivity information as dask cudf edge list dataframe(edge weights are not used for this algorithm). Undirected Graph not currently supported.

startInteger

Specify starting vertex for breadth-first search; this function iterates over edges in the component reachable from this node.

depth_limitInteger or None

Limit the depth of the search

return_distancesbool, optional, default=True

Indicates if distances should be returned

Returns
dfdask_cudf.DataFrame

df[‘vertex’] gives the vertex id

df[‘distance’] gives the path distance from the starting vertex (Only if return_distances is True)

df[‘predecessor’] gives the vertex it was reached from in the traversal

Examples

>>> import cugraph.dask as dcg
>>> ... Init a DASK Cluster
>>    see https://docs.rapids.ai/api/cugraph/stable/dask-cugraph.html
>>> chunksize = dcg.get_chunksize(input_data_path)
>>> ddf = dask_cudf.read_csv(input_data_path, chunksize=chunksize,
                             delimiter=' ',
                             names=['src', 'dst', 'value'],
                             dtype=['int32', 'int32', 'float32'])
>>> dg = cugraph.DiGraph()
>>> dg.from_dask_cudf_edgelist(ddf, 'src', 'dst')
>>> df = dcg.bfs(dg, 0)
cugraph.dask.traversal.bfs.call_bfs(sID, data, num_verts, num_edges, vertex_partition_offsets, aggregate_segment_offsets, start, depth_limit, return_distances)[source]

Single-source-shortest-path

cugraph.traversal.sssp.filter_unreachable(df)[source]

Remove unreachable vertices from the result of SSSP or BFS

Parameters
dfcudf.DataFrame

cudf.DataFrame that is the output of SSSP or BFS

Returns
dffiltered cudf.DataFrame with only reachable vertices

df[‘vertex’][i] gives the vertex id of the i’th vertex. df[‘distance’][i] gives the path distance for the i’th vertex from the starting vertex. df[‘predecessor’][i] gives the vertex that was reached before the i’th vertex in the traversal.

cugraph.traversal.sssp.shortest_path(G, source=None, method=None, directed=None, return_predecessors=None, unweighted=None, overwrite=None, indices=None)[source]

Alias for sssp(), provided for API compatibility with NetworkX. See sssp() for details.

cugraph.traversal.sssp.shortest_path_length(G, source, target=None)[source]

Compute the distance from a source vertex to one or all vertexes in graph. Uses Single Source Shortest Path (SSSP).

Parameters
graphcuGraph.Graph, NetworkX.Graph, or CuPy sparse COO matrix

cuGraph graph descriptor with connectivity information. Edge weights, if present, should be single or double precision floating point values.

sourceDependant on graph type. Index of the source vertex.
If graph is an instance of cuGraph.Graph or CuPy sparse COO matrix:

int

If graph is an instance of a NetworkX.Graph:

str

target: Dependant on graph type. Vertex to find distance to.
If graph is an instance of cuGraph.Graph or CuPy sparse COO matrix:

int

If graph is an instance of a NetworkX.Graph:

str

Returns
Return value type is based on the input type.
If target is None, returns:
cudf.DataFrame
df[‘vertex’]

vertex id

df[‘distance’]

gives the path distance from the starting vertex

If target is not None, returns:

Distance from source to target vertex.

cugraph.traversal.sssp.sssp(G, source=None, method=None, directed=None, return_predecessors=None, unweighted=None, overwrite=None, indices=None)[source]

Compute the distance and predecessors for shortest paths from the specified source to all the vertices in the graph. The distances column will store the distance from the source to each vertex. The predecessors column will store each vertex’s predecessor in the shortest path. Vertices that are unreachable will have a distance of infinity denoted by the maximum value of the data type and the predecessor set as -1. The source vertex’s predecessor is also set to -1. Graphs with negative weight cycles are not supported.

Parameters
graphcugraph.Graph, networkx.Graph, CuPy or SciPy sparse matrix Graph or

matrix object, which should contain the connectivity information. Edge weights, if present, should be single or double precision floating point values.

sourceint

Index of the source vertex.

Returns
Return value type is based on the input type. If G is a cugraph.Graph,
returns:
cudf.DataFrame
df[‘vertex’]

vertex id

df[‘distance’]

gives the path distance from the starting vertex

df[‘predecessor’]

the vertex it was reached from

If G is a networkx.Graph, returns:

pandas.DataFrame with contents equivalent to the cudf.DataFrame described above.

If G is a CuPy or SciPy matrix, returns:

a 2-tuple of CuPy ndarrays (if CuPy matrix input) or Numpy ndarrays (if SciPy matrix input) representing:

distance: cupy or numpy ndarray

ndarray of shortest distances between source and vertex.

predecessor: cupy or numpy ndarray

ndarray of predecessors of a vertex on the path from source, which can be used to reconstruct the shortest paths.

Examples

>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
>>>                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(M, source='0', destination='1')
>>> distances = cugraph.sssp(G, 0)

Single-source-shortest-path (MG)

cugraph.dask.traversal.sssp.call_sssp(sID, data, num_verts, num_edges, vertex_partition_offsets, aggregate_segment_offsets, start)[source]
cugraph.dask.traversal.sssp.sssp(graph, source)[source]

Compute the distance and predecessors for shortest paths from the specified source to all the vertices in the graph. The distances column will store the distance from the source to each vertex. The predecessors column will store each vertex’s predecessor in the shortest path. Vertices that are unreachable will have a distance of infinity denoted by the maximum value of the data type and the predecessor set as -1. The source vertex’s predecessor is also set to -1. The input graph must contain edge list as dask-cudf dataframe with one partition per GPU.

Parameters
graphcugraph.DiGraph

cuGraph graph descriptor, should contain the connectivity information as dask cudf edge list dataframe. Undirected Graph not currently supported.

sourceInteger

Specify source vertex

Returns
dfdask_cudf.DataFrame

df[‘vertex’] gives the vertex id

df[‘distance’] gives the path distance from the starting vertex

df[‘predecessor’] gives the vertex id it was reached from in the traversal

Examples

>>> import cugraph.dask as dcg
>>> ... Init a DASK Cluster
>>    see https://docs.rapids.ai/api/cugraph/stable/dask-cugraph.html
>>> chunksize = dcg.get_chunksize(input_data_path)
>>> ddf = dask_cudf.read_csv(input_data_path, chunksize=chunksize,
                             delimiter=' ',
                             names=['src', 'dst', 'value'],
                             dtype=['int32', 'int32', 'float32'])
>>> dg = cugraph.DiGraph()
>>> dg.from_dask_cudf_edgelist(ddf, 'src', 'dst')
>>> df = dcg.sssp(dg, 0)

Traveling-salesperson-problem

cugraph.traversal.traveling_salesperson.traveling_salesperson(pos_list, restarts=100000, beam_search=True, k=4, nstart=None, verbose=False)[source]

Finds an approximate solution to the traveling salesperson problem (TSP). cuGraph computes an approximation of the TSP problem using hill climbing optimization.

The current implementation does not support a weighted graph.

Parameters
pos_list: cudf.DataFrame

Data frame with initial vertex positions containing three columns: ‘vertex’ ids and ‘x’, ‘y’ positions.

restarts: int

Number of starts to try. The more restarts, the better the solution will be approximated. The number of restarts depends on the problem size and should be kept low for instances above 2k cities.

beam_search: bool

Specify if the initial solution should use KNN for an approximation solution.

k: int

Beam width to use in the search.

nstart: int

Vertex id to use as starting position.

verbose: bool

Logs configuration and iterative improvement.

Returns
routecudf.Series

cudf.Series of size V containing the ordered list of vertices than needs to be visited.

Tree

Minimum Spanning Tree

cugraph.tree.minimum_spanning_tree.minimum_spanning_tree(G, weight=None, algorithm='boruvka', ignore_nan=False)[source]

Returns a minimum spanning tree (MST) or forest (MSF) on an undirected graph

Parameters
GcuGraph.Graph or networkx.Graph

cuGraph graph descriptor with connectivity information.

weightstring

default to the weights in the graph, if the graph edges do not have a weight attribute a default weight of 1 will be used.

algorithmstring

Default to ‘boruvka’. The parallel algorithm to use when finding a minimum spanning tree.

ignore_nanbool

Default to False

Returns
G_mstcuGraph.Graph or networkx.Graph

A graph descriptor with a minimum spanning tree or forest. The networkx graph will not have all attributes copied over

Maximum Spanning Tree

cugraph.tree.minimum_spanning_tree.maximum_spanning_tree(G, weight=None, algorithm='boruvka', ignore_nan=False)[source]

Returns a maximum spanning tree (MST) or forest (MSF) on an undirected graph

Parameters
GcuGraph.Graph or networkx.Graph

cuGraph graph descriptor with connectivity information.

weightstring

default to the weights in the graph, if the graph edges do not have a weight attribute a default weight of 1 will be used.

algorithmstring

Default to ‘boruvka’. The parallel algorithm to use when finding a maximum spanning tree.

ignore_nanbool

Default to False

Returns
G_mstcuGraph.Graph or networkx.Graph

A graph descriptor with a maximum spanning tree or forest. The networkx graph will not have all attributes copied over

Generator

RMAT

cugraph.generators.rmat(scale, num_edges, a, b, c, seed, clip_and_flip, scramble_vertex_ids, create_using=<class 'cugraph.structure.graph_classes.DiGraph'>, mg=False)[source]

Generate a Graph object using a Recursive MATrix (R-MAT) graph generation algorithm.

Parameters
scaleint

Scale factor to set the number of vertices in the graph Vertex IDs have values in [0, V), where V = 1 << ‘scale’

num_edgesint

Number of edges to generate

afloat

Probability of the first partition

bfloat

Probability of the second partition

cfloat

Probability of the thrid partition

seedint

Seed value for the random number generator

clip_and_flipbool

Flag controlling whether to generate edges only in the lower triangular part (including the diagonal) of the graph adjacency matrix (if set to ‘true’) or not (if set to ‘false).

scramble_vertex_idsbool

Flag controlling whether to scramble vertex ID bits (if set to true) or not (if set to false); scrambling vertex ID bits breaks correlation between vertex ID values and vertex degrees.

create_usingcugraph Graph type or None The graph type to construct

containing the generated edges and vertices. If None is specified, the edgelist cuDF DataFrame (or dask_cudf DataFrame for MG) is returned as-is. This is useful for benchmarking Graph construction steps that require raw data that includes potential self-loops, isolated vertices, and duplicated edges. Default is cugraph.DiGraph. NOTE: only the cugraph.DiGraph type is supported for multi-GPU

mgbool

If True, R-MAT generation occurs across multiple GPUs. If False, only a single GPU is used. Default is False (single-GPU)

Returns
instance of cugraph.Graph

Examples

import cugraph from cugraph.generators import rmat

df = rmat(

scale, (2**scale)*edgefactor, 0.1, 0.2, 0.3, seed or 42, clip_and_flip=False, scramble_vertex_ids=True, create_using=None, # return edgelist instead of Graph instance mg=False

)

DASK MG Helper functions

cugraph.comms.comms.initialize(comms=None, p2p=False, prows=None, pcols=None, partition_type=1)[source]

Initialize a communicator for multi-node/multi-gpu communications. It is expected to be called right after client initialization for running multi-GPU algorithms (this wraps raft comms that manages underlying NCCL and UCX comms handles across the workers of a Dask cluster).

It is recommended to also call destroy() when the comms are no longer needed so the underlying resources can be cleaned up.

Parameters
commsraft Comms

A pre-initialized raft communicator. If provided, this is used for mnmg communications. If not provided, default comms are initialized as per client information.

p2pbool

Initialize UCX endpoints if True. Default is False.

prowsint

Specifies the number of rows when performing a 2D partitioning of the input graph. If specified, this must be a factor of the total number of parallel processes. When specified with pcols, prows*pcols should be equal to the total number of parallel processes.

pcolsint

Specifies the number of columns when performing a 2D partitioning of the input graph. If specified, this must be a factor of the total number of parallel processes. When specified with prows, prows*pcols should be equal to the total number of parallel processes.

partition_typeint

Valid values are currently 1 or any int other than 1. A value of 1 (the default) represents a partitioning resulting in prows*pcols partitions. A non-1 value currently results in a partitioning of p*pcols partitions, where p is the number of GPUs.

cugraph.comms.comms.destroy()[source]

Shuts down initialized comms and cleans up resources.

cugraph.dask.common.read_utils.get_chunksize(input_path)[source]

Calculate the appropriate chunksize for dask_cudf.read_csv to get a number of partitions equal to the number of GPUs.

Examples

>>> import dask_cugraph.pagerank as dcg
>>> chunksize = dcg.get_chunksize(edge_list.csv)