cugraph.hypergraph

cugraph.hypergraph(values, columns=None, dropna=True, direct=False, graph_class=<class 'cugraph.structure.graph_classes.Graph'>, categories={}, drop_edge_attrs=False, categorical_metadata=True, SKIP=None, EDGES=None, DELIM='::', SOURCE='src', TARGET='dst', WEIGHTS=None, NODEID='node_id', EVENTID='event_id', ATTRIBID='attrib_id', CATEGORY='category', NODETYPE='node_type', EDGETYPE='edge_type')

Creates a hypergraph out of the given dataframe, returning the graph components as dataframes. The transform reveals relationships between the rows and the unique values they contain. It is useful for lists of events, samples, relationships, and other structured high-dimensional data.

The transform creates a node for every row and turns the row's column entries into node attributes. If direct=False (the default), every unique value within a column is also turned into a node. Edges connect each row's node to the nodes for its column values or, if direct=True, connect the row's value nodes directly to one another. Each node carries the attribute named by NODETYPE, which holds the originating column name for a value node and EVENTID for a row node.

Consider a list of events. Each row represents a distinct event, and each column holds some metadata about that event. If multiple events share a metadata value, they are transitively connected through the node for that value. Conversely, if an event has unique metadata, those values become nodes connected only to that event's node.

For best results, set EVENTID to the column holding each row's unique ID, set SKIP to all non-categorical columns (or set columns to all categorical columns), and use categories to group columns that hold the same kinds of values.
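The description above can be sketched in plain Python without a GPU. This is only an illustration of the idea, not cuGraph's implementation: the helper `hypergraph_sketch`, the list-of-dicts input, and the `column::value` node-ID layout are assumptions made for the example.

```python
from itertools import combinations


def hypergraph_sketch(rows, direct=False, delim="::", event_id="event_id"):
    """Toy stand-in for the transform; rows is a list of dicts, one per event."""
    nodes, edges = set(), []
    for i, row in enumerate(rows):
        # One entity node per unique (column, value) pair; None stands in for null.
        entity_ids = [f"{col}{delim}{val}" for col, val in row.items() if val is not None]
        nodes.update(entity_ids)
        if direct:
            # No hypernode: connect the row's entity nodes to one another.
            edges.extend(combinations(entity_ids, 2))
        else:
            # One hypernode per row, linked to each of the row's entity nodes.
            event = f"{event_id}{delim}{i}"
            nodes.add(event)
            edges.extend((event, ent) for ent in entity_ids)
    return nodes, edges


rows = [
    {"user": "alice", "ip": "10.0.0.1"},
    {"user": "bob", "ip": "10.0.0.1"},
]
nodes, edges = hypergraph_sketch(rows)
direct_nodes, direct_edges = hypergraph_sketch(rows, direct=True)
```

With direct=False, both events end up linked to the shared `ip::10.0.0.1` node, which is how common metadata connects rows transitively. With direct=True, the event nodes disappear and each row's values are linked to each other instead.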

Parameters:
values : cudf.DataFrame

The input DataFrame to transform into a hypergraph.

columns : sequence, optional (default=None)

An optional sequence of column names to process.

dropna : bool, optional (default=True)

If True, do not include “null” values in the graph.

direct : bool, optional (default=False)

If True, omit hypernodes and instead strongly connect nodes for each row with each other.

graph_class : cugraph.Graph, optional (default=cugraph.Graph)

Specify the type of Graph to create.

categories : dict, optional (default=dict())

Dictionary mapping column names to distinct categories. If the same value appears in columns mapped to the same category, the transform generates one node for it instead of one node per column.

drop_edge_attrs : bool, optional (default=False)

If True, exclude each row's attributes from its edges.

categorical_metadata : bool, optional (default=True)

Whether to use cudf.CategoricalDtype for the CATEGORY, NODETYPE, and EDGETYPE columns. These columns are typically large string columns with low cardinality, and using categorical dtypes can save a significant amount of memory.

SKIP : sequence, optional

A sequence of column names not to transform into nodes.

EDGES : dict, optional

When direct=True, select column pairs instead of making all edges.

DELIM : str, optional (default="::")

The delimiter to use when joining column names, categories, and ids.

SOURCE : str, optional (default="src")

The name to use as the source column in the graph and edge DF.

TARGET : str, optional (default="dst")

The name to use as the target column in the graph and edge DF.

WEIGHTS : str, optional (default=None)

The column name from the input DF to map as the graph’s edge weights.

NODEID : str, optional (default="node_id")

The name to use as the node id column in the graph and node DFs.

EVENTID : str, optional (default="event_id")

The name to use as the event id column in the graph and node DFs.

ATTRIBID : str, optional (default="attrib_id")

The name to use as the attribute id column in the graph and node DFs.

CATEGORY : str, optional (default="category")

The name to use as the category column in the graph and DFs.

NODETYPE : str, optional (default="node_type")

The name to use as the node type column in the graph and node DFs.

EDGETYPE : str, optional (default="edge_type")

The name to use as the edge type column in the graph and edge DF.
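To make the interaction of categories, SKIP, dropna, and DELIM more concrete, here is a small plain-Python sketch. It is not the library's code: the helper `entity_ids`, the sample columns, and the exact `prefix::value` ID layout are assumptions chosen for illustration.

```python
def entity_ids(rows, categories=None, skip=(), dropna=True, delim="::"):
    """Toy illustration of how categories, SKIP, and dropna shape the entity nodes."""
    categories = categories or {}
    ids = set()
    for row in rows:
        for col, val in row.items():
            if col in skip:
                continue  # SKIP: this column never becomes a node
            if val is None and dropna:
                continue  # dropna=True: nulls are left out of the graph
            # Columns mapped to the same category share a prefix, so equal values merge.
            prefix = categories.get(col, col)
            ids.add(f"{prefix}{delim}{val}")
    return ids


rows = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "bytes": 512},
    {"src_ip": "10.0.0.2", "dst_ip": None, "bytes": 64},
]

# Without categories, 10.0.0.2 yields one node per column it appears in.
per_column = entity_ids(rows, skip=["bytes"])

# Mapping both columns to a shared "ip" category merges those nodes.
merged = entity_ids(rows, categories={"src_ip": "ip", "dst_ip": "ip"}, skip=["bytes"])
```

In this sketch the numeric `bytes` column is skipped because it is not categorical, and the shared category collapses the two nodes for `10.0.0.2` into one, so the two rows become connected through it.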

Returns:
result : dict {"nodes", "edges", "graph", "events", "entities"}
nodes : cudf.DataFrame

A DataFrame of found entity and hyper node attributes.

edges : cudf.DataFrame

A DataFrame of edge attributes.

graph : cugraph.Graph

A Graph of the found entity nodes, hyper nodes, and edges.

events : cudf.DataFrame

If direct=False, a DataFrame of hyper node attributes, else empty (direct=True omits hypernodes).

entities : cudf.DataFrame

A DataFrame of the found entity node attributes.

Examples

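>>> import cudf
>>> import cugraph
>>> # datasets_path is assumed to be a pathlib.Path to the directory holding karate.csv
>>> M = cudf.read_csv(datasets_path / 'karate.csv', delimiter=' ',
...                   names=['src', 'dst', 'weights'],
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> result = cugraph.hypergraph(M)  # a dict; unpacking it directly would yield its keys
>>> nodes, edges, G = result["nodes"], result["edges"], result["graph"]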