cugraph.symmetrize_ddf

cugraph.symmetrize_ddf(ddf, src_name, dst_name, weight_name=None, multi=False, symmetrize=True)

Take a COO (coordinate-format edge list) stored in a distributed DataFrame, together with the names of its source and destination columns, and create a new DataFrame with the same column names that symmetrizes the graph so that every edge appears in both directions.

Note that if other columns exist in the data frame (e.g. edge weights) the other columns will also be replicated. That is, if (u,v,data) represents the source value (u), destination value (v) and some set of other columns (data) in the input data, then the output data will contain both (u,v,data) and (v,u,data) with matching data.
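The replication rule above can be sketched in plain Python. This is only an illustration of the documented semantics on a small in-memory edge list, not the cuGraph implementation; the helper name `symmetrize_edges` is hypothetical, and the sketch assumes the input has no conflicting data for a pair of vertices.

```python
def symmetrize_edges(edges):
    """Return a sorted edge list in which every edge appears in both directions.

    `edges` is an iterable of (src, dst, data) tuples. Hypothetical helper
    that mirrors the documented behaviour; it is not part of cuGraph.
    """
    seen = set()
    for u, v, data in edges:
        seen.add((u, v, data))  # keep the original direction
        seen.add((v, u, data))  # add the reverse direction with matching data
    # The set drops exact duplicates; sorting makes the result deterministic.
    return sorted(seen)


edges = [(0, 1, 1.0), (1, 2, 0.5)]
sym = symmetrize_edges(edges)
```

Here `sym` holds four rows: each input edge plus its reverse, carrying the same data value.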

If (u,v,data1) and (v,u,data2) exist in the input data where data1 != data2, then this code will arbitrarily pick the smaller data element to keep. If this is not desired, the caller should correct the data prior to calling symmetrize.
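One way a caller might correct conflicting data beforehand is to collapse each unordered pair of vertices to a single value under an explicit rule. The sketch below is plain Python on an in-memory edge list; the helper `resolve_conflicts` and the choice of `max` as the default rule are assumptions made for illustration, not part of cuGraph.

```python
def resolve_conflicts(edges, combine=max):
    """Collapse edges that join the same pair of vertices into one edge.

    `edges` is an iterable of (src, dst, weight) tuples; `combine` decides
    which weight survives when several edges join the same pair.
    Hypothetical helper for illustration; not part of cuGraph.
    """
    best = {}
    for u, v, w in edges:
        key = (min(u, v), max(u, v))  # direction-independent key
        best[key] = w if key not in best else combine(best[key], w)
    # Each surviving edge is emitted once, oriented from the smaller id.
    return [(u, v, w) for (u, v), w in sorted(best.items())]


edges = [(0, 1, 1.0), (1, 0, 3.0), (1, 2, 0.5)]
cleaned = resolve_conflicts(edges)
```

After this step every pair of vertices carries exactly one value, so the result of symmetrization no longer depends on which conflicting element would otherwise be kept. Passing `combine=min` instead keeps the smaller value.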

Parameters:
ddf : dask_cudf.DataFrame

Input data frame containing COO. Columns should contain source ids, destination ids and any properties associated with the edges.

src_name : str or list

Name(s) of the column(s) in the data frame containing the source ids

dst_name : str or list

Name(s) of the column(s) in the data frame containing the destination ids

weight_name : string, optional (default=None)

Name of the column in the data frame containing the edge weights

multi : bool, optional (default=False)

[Deprecated: multi will be removed in a future version, and removing multi-edges will no longer be supported from ‘symmetrize’. Multi-edges will instead be removed when the graph instance is created, depending on whether the graph is a cugraph.MultiGraph or a cugraph.Graph.]

Set to True if graph is a Multi(Di)Graph. This allows multiple edges instead of dropping them.

symmetrize : bool, optional (default=True)

Default is True, which performs symmetrization. If False, only duplicate edges are dropped.

Examples

>>> # import dask_cudf
>>> # import cugraph.dask as dcg
>>> # from cugraph.structure.symmetrize import symmetrize_ddf
>>> # Init a DASK Cluster
>>> # Download dataset from https://github.com/rapidsai/cugraph/datasets/..
>>> # chunksize = dcg.get_chunksize(datasets / 'karate.csv')
>>> # ddf = dask_cudf.read_csv(datasets/'karate.csv', blocksize=chunksize,
>>> #                          delimiter=' ',
>>> #                          names=['src', 'dst', 'weight'],
>>> #                          dtype=['int32', 'int32', 'float32'])
>>> # sym_ddf = symmetrize_ddf(ddf, "src", "dst", "weight")