cugraph.symmetrize

cugraph.symmetrize(input_df, source_col_name, dest_col_name, value_col_name=None, multi=False, symmetrize=True, do_expensive_check=False)

Take a dataframe of source-destination pairs, along with any associated values, stored on a single GPU or distributed across GPUs, and create a COO set of source-destination pairs (with values) in which every edge exists in both directions.

The call returns a COO stored as two or three cudf/dask_cudf Series or DataFrames: the symmetrized source column, the symmetrized destination column, and, only if values were passed in, a cudf/dask_cudf Series or DataFrame containing the associated values.
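As a rough illustration of what "all edges exist in both directions" means for a COO edge list, the following plain-Python sketch works on ordinary lists rather than cudf objects. The helper name symmetrize_coo is hypothetical, and the rule for conflicting values (keep the first one seen) is an assumption made for the sketch; it is not taken from cuGraph's implementation.

```python
def symmetrize_coo(src, dst, val=None):
    """Return a COO edge list in which every edge appears in both directions.

    Assumption: when the same (source, destination) pair is produced more
    than once, the first value seen is kept. cuGraph's own rule may differ.
    """
    seen = {}
    values = val if val is not None else [None] * len(src)
    for s, d, v in zip(src, dst, values):
        # Record the edge as given, then its reverse, without overwriting.
        for edge in ((s, d), (d, s)):
            seen.setdefault(edge, v)
    edges = sorted(seen)
    out_src = [s for s, _ in edges]
    out_dst = [d for _, d in edges]
    if val is None:
        # Two columns when no values were passed in.
        return out_src, out_dst
    # Three columns when values were passed in.
    return out_src, out_dst, [seen[e] for e in edges]


src, dst, val = symmetrize_coo([0, 0, 1], [1, 2, 0], [1.0, 2.0, 5.0])
print(src, dst, val)
```

The sketch returns two lists when no values are given and three when they are, mirroring the two/three-column return described above.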

Parameters:

input_df : cudf.DataFrame or dask_cudf.DataFrame

The edge list, as a cudf.DataFrame or dask_cudf.DataFrame.

source_col_name : str or list

Source column name.

dest_col_name : str or list

Destination column name.

value_col_name : str or None

Weights column name.

multi : bool, optional (default=False)

[Deprecated: multi will be removed in a future version, after which symmetrize will no longer support removing multi-edges. Multi-edges will instead be removed when the graph instance is created, depending on whether the graph is a cugraph.MultiGraph or a cugraph.Graph.]

Set to True if the graph is a Multi(Di)Graph. This keeps multiple edges instead of dropping them.

symmetrize : bool, optional

Default is True, which performs symmetrization. If False, only duplicate edges are dropped.
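The interaction of the multi and symmetrize flags, as described in the parameter list, can be sketched on plain (source, destination) tuples. The helper clean_edges below is hypothetical and only mirrors the documented semantics; the order of operations and the keep-first de-duplication are assumptions of the sketch, not cuGraph's actual code.

```python
def clean_edges(edges, multi=False, symmetrize=True):
    """Hypothetical helper mirroring the documented flag semantics."""
    base = list(edges)
    result = list(base)
    if symmetrize:
        # Append the reverse of every input edge.
        result += [(d, s) for s, d in base]
    if not multi:
        # Drop repeated pairs, keeping the first occurrence of each.
        result = list(dict.fromkeys(result))
    return result


edges = [(0, 1), (0, 1), (1, 2)]
print(clean_edges(edges))                    # reverse edges added, duplicates dropped
print(clean_edges(edges, symmetrize=False))  # duplicates dropped only
print(clean_edges(edges, multi=True))        # reverse edges added, duplicates kept
```

With symmetrize=False the sketch only removes the repeated (0, 1) edge; with multi=True every edge, including the repeat, gains a reverse copy.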

Examples

>>> import cudf
>>> from cugraph.structure.symmetrize import symmetrize
>>> # Download dataset from https://github.com/rapidsai/cugraph/datasets/..
>>> # datasets_path is assumed to be a pathlib.Path pointing at that download
>>> M = cudf.read_csv(datasets_path / 'karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> df = cudf.DataFrame()
>>> df['sources'] = cudf.Series(M['0'])
>>> df['destinations'] = cudf.Series(M['1'])
>>> df['values'] = cudf.Series(M['2'])
>>> src, dst, val = symmetrize(df, 'sources', 'destinations', 'values', multi=True)