cugraph_dgl.cugraph_storage.CuGraphStorage

class cugraph_dgl.cugraph_storage.CuGraphStorage(data_dict: Dict[Tuple[str, str, str], Union[DataFrame, DataFrame]], num_nodes_dict: Dict[str, int], single_gpu: bool = True, device_id: int = 0, idtype=None)

Duck-typed version of the DGLHeteroGraph class made for cuGraph for storing graph structure and node/edge feature data.

This object is a wrapper around cuGraph's multi-GPU MultiGraph and returns samples that conform to DGLHeteroGraph. See: https://docs.rapids.ai/api/cugraph/nightly/api_docs/cugraph_dgl.html

__init__(data_dict: Dict[Tuple[str, str, str], Union[DataFrame, DataFrame]], num_nodes_dict: Dict[str, int], single_gpu: bool = True, device_id: int = 0, idtype=None)

Constructor for creating a CuGraphStorage instance.

See also cugraph_dgl.cugraph_storage_from_heterograph to convert from DGLHeteroGraph to CuGraphStorage.

Parameters:
data_dict:

The dictionary data for constructing a heterogeneous graph. The keys are string triplets of the form (src_type, edge_type, dst_type), specifying the source node, edge, and destination node types. Each value is the graph data for that edge type: a DataFrame with two columns (𝑈, 𝑉), where (𝑈[𝑖], 𝑉[𝑖]) forms the edge with ID 𝑖.

num_nodes_dict: dict[str, int]

The number of nodes for some node types, which is a dictionary mapping a node type T to the number of T-typed nodes.

single_gpu: bool

Whether to create the cuGraph property graph on a single GPU (True) or on multiple GPUs (False).

device_id: int

If specified, must be the integer ID of the GPU device on which the results are created.

idtype: Framework-specific data type object

The data type for storing the structure-related graph information. This can be torch.int32 or torch.int64 for PyTorch. Defaults to torch.int64 if PyTorch is installed.
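To make the relationship between data_dict and num_nodes_dict concrete, the sketch below checks an edge-list dictionary against the per-type node counts. It is an illustration only: plain Python lists stand in for the DataFrames, and check_graph_inputs is a hypothetical helper that is not part of cugraph_dgl. It also assumes that node IDs are local to their node type and start at 0.

```python
from typing import Dict, List, Tuple

EdgeType = Tuple[str, str, str]
EdgeList = Tuple[List[int], List[int]]  # (U, V): edge i is (U[i], V[i])


def check_graph_inputs(data_dict: Dict[EdgeType, EdgeList],
                       num_nodes_dict: Dict[str, int]) -> Dict[EdgeType, int]:
    """Hypothetical helper: validate the inputs and return per-type edge counts."""
    counts = {}
    for (src_type, edge_type, dst_type), (u, v) in data_dict.items():
        if len(u) != len(v):
            raise ValueError(f"{edge_type}: U and V must have equal length")
        # Assumption: node IDs are type-local and lie in [0, num_nodes).
        if any(not 0 <= s < num_nodes_dict[src_type] for s in u):
            raise ValueError(f"{edge_type}: source ID out of range")
        if any(not 0 <= d < num_nodes_dict[dst_type] for d in v):
            raise ValueError(f"{edge_type}: destination ID out of range")
        counts[(src_type, edge_type, dst_type)] = len(u)
    return counts


num_nodes_dict = {"drug": 3, "gene": 2, "disease": 1}
data_dict = {
    ("drug", "interacts", "drug"): ([0, 1], [1, 2]),
    ("drug", "interacts", "gene"): ([0, 1], [0, 1]),
    ("drug", "treats", "disease"): ([1], [0]),
}
edge_counts = check_graph_inputs(data_dict, num_nodes_dict)
```

Running a check of this kind before constructing the storage object can surface mismatched node counts early.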

Examples

The following example uses CuGraphStorage:
>>> from cugraph_dgl.cugraph_storage import CuGraphStorage
>>> import cudf
>>> import torch
>>> num_nodes_dict = {"drug": 3, "gene": 2, "disease": 1}
>>> drug_interacts_drug_df = cudf.DataFrame({"src": [0, 1], "dst": [1, 2]})
>>> drug_interacts_gene = cudf.DataFrame({"src": [0, 1], "dst": [0, 1]})
>>> drug_treats_disease = cudf.DataFrame({"src": [1], "dst": [0]})
>>> data_dict = {("drug", "interacts", "drug"): drug_interacts_drug_df,
...              ("drug", "interacts", "gene"): drug_interacts_gene,
...              ("drug", "treats", "disease"): drug_treats_disease}
>>> gs = CuGraphStorage(data_dict=data_dict, num_nodes_dict=num_nodes_dict)
>>> gs.add_node_data(ntype='drug', feat_name='node_feat',
...                  feat_obj=torch.as_tensor([0.1, 0.2, 0.3]))
>>> gs.add_edge_data(canonical_etype=("drug", "interacts", "drug"),
...                  feat_name='edge_feat',
...                  feat_obj=torch.as_tensor([0.2, 0.4]))
>>> gs.ntypes
['disease', 'drug', 'gene']
>>> gs.etypes
['interacts', 'interacts', 'treats']
>>> gs.canonical_etypes
[('drug', 'interacts', 'drug'),
 ('drug', 'interacts', 'gene'),
 ('drug', 'treats', 'disease')]
>>> gs.sample_neighbors({'disease': [0]}, 1)
Graph(num_nodes={'disease': 1, 'drug': 3, 'gene': 2},
      num_edges={('drug', 'interacts', 'drug'): 0,
                 ('drug', 'interacts', 'gene'): 0,
                 ('drug', 'treats', 'disease'): 1},
      metagraph=[('drug', 'drug', 'interacts'),
                 ('drug', 'gene', 'interacts'),
                 ('drug', 'disease', 'treats')])
>>> gs.get_node_storage(key='node_feat',
...                     ntype='drug').fetch([0, 1, 2])
tensor([0.1000, 0.2000, 0.3000], device='cuda:0',
 dtype=torch.float64)
>>> es = gs.get_edge_storage(key='edge_feat',
...                          etype=('drug', 'interacts', 'drug'))
>>> es.fetch([0, 1])
tensor([0.2000, 0.4000], device='cuda:0', dtype=torch.float64)

Methods

__init__(data_dict, num_nodes_dict[, ...])

Constructor for creating a CuGraphStorage instance.

add_edge_data(feat_obj, canonical_etype, ...)

Add edge features

add_node_data(feat_obj, ntype, feat_name)

Add node features

cugraph_e_id_to_dgl_id(index_t, canonical_etype)

cugraph_n_id_to_dgl_id(index_t, ntype)

dgl_e_id_to_cugraph_id(index_t, canonical_etype)

dgl_n_id_to_cugraph_id(index_t, ntype)

edge_subgraph(edges[, relabel_nodes, ...])

Return a subgraph induced on given edges. This has the same semantics as dgl.edge_subgraph.

Parameters:
edges: edges or dict[(str, str, str), edges]

The edges to form the subgraph. The allowed edges formats are:

* Int Tensor: Each element is an edge ID. The tensor must have the same device type and ID data type as the graph's.
* iterable[int]: Each element is an edge ID.
* Bool Tensor: Each \(i^{th}\) element is a bool flag indicating whether edge \(i\) is in the subgraph.

If the graph is homogeneous, one can directly pass the above formats. Otherwise, the argument must be a dictionary with keys being edge types and values being the edge IDs in the above formats.

relabel_nodes: bool, optional

If True, the extracted subgraph will only have the nodes in the specified node set and it will relabel the nodes in order.

output_device: Framework-specific device context object, optional

The output device. Default is the same as the input graph.

Returns:
DGLGraph

The subgraph.
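The edge-induced subgraph and the effect of relabel_nodes can be pictured with a small sketch on a homogeneous edge list. This is a pure-Python illustration of the semantics described above, not the cugraph_dgl implementation; toy_edge_subgraph is a hypothetical name.

```python
from typing import Dict, List, Tuple


def toy_edge_subgraph(src: List[int], dst: List[int], edge_ids: List[int],
                      relabel_nodes: bool = True
                      ) -> Tuple[List[int], List[int], Dict[int, int]]:
    """Toy edge-induced subgraph; edge i of the input is (src[i], dst[i])."""
    sub_src = [src[e] for e in edge_ids]
    sub_dst = [dst[e] for e in edge_ids]
    if not relabel_nodes:
        # Keep the original node ID space unchanged.
        return sub_src, sub_dst, {}
    # Keep only the incident nodes and renumber them in ascending order.
    kept = sorted(set(sub_src) | set(sub_dst))
    new_id = {old: new for new, old in enumerate(kept)}
    return ([new_id[s] for s in sub_src],
            [new_id[d] for d in sub_dst],
            new_id)


# Edges: 0: 0->1, 1: 1->2, 2: 2->3, 3: 3->0
src, dst = [0, 1, 2, 3], [1, 2, 3, 0]
sub_src, sub_dst, mapping = toy_edge_subgraph(src, dst, edge_ids=[1, 2])
raw_src, raw_dst, _ = toy_edge_subgraph(src, dst, [1, 2], relabel_nodes=False)
```

With relabeling, nodes 1, 2 and 3 become 0, 1 and 2; without it, the selected edges keep their original endpoints.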

find_edges(eid[, etype, output_device])

Return the source and destination node ID(s) given the edge ID(s).
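As a minimal illustration of this lookup, the sketch below resolves edge IDs against a plain edge list in which edge i is (src[i], dst[i]). toy_find_edges is a hypothetical helper for a single edge type, not the library method.

```python
from typing import List, Tuple


def toy_find_edges(src: List[int], dst: List[int],
                   eid: List[int]) -> Tuple[List[int], List[int]]:
    """Look up the endpoints of each requested edge ID, in request order."""
    return [src[e] for e in eid], [dst[e] for e in eid]


# Edge 0 is 0->1 and edge 1 is 1->2.
src, dst = [0, 1], [1, 2]
found_src, found_dst = toy_find_edges(src, dst, eid=[1, 0])
```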

get_corresponding_canonical_etype(etype)

get_edge_id_offset(canonical_etype)

Return the integer offset for edge id of type canonical_etype

get_edge_storage(key[, etype])

Get storage object of edge feature of type etype and name key

get_node_id_offset(ntype)

Return the integer offset for node id of type ntype
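One way to picture a per-type ID offset is a single contiguous ID range shared by all node types. The sketch below assumes, without confirmation from this page, that types are laid out in sorted order so that a type's offset equals the number of nodes in all preceding types. The helper names are illustrative only.

```python
from typing import Dict


def node_id_offsets(num_nodes_dict: Dict[str, int]) -> Dict[str, int]:
    """Assumed layout: types in sorted order, each starting where the last ended."""
    offsets, running = {}, 0
    for ntype in sorted(num_nodes_dict):
        offsets[ntype] = running
        running += num_nodes_dict[ntype]
    return offsets


def to_global(local_id: int, ntype: str, offsets: Dict[str, int]) -> int:
    # Type-local ID -> ID in the single contiguous range.
    return local_id + offsets[ntype]


def to_local(global_id: int, ntype: str, offsets: Dict[str, int]) -> int:
    # Inverse mapping for a node known to be of type `ntype`.
    return global_id - offsets[ntype]


offsets = node_id_offsets({"drug": 3, "gene": 2, "disease": 1})
global_id = to_global(2, "drug", offsets)
```

Under this assumed layout, conversions between type-local and contiguous IDs reduce to adding or subtracting the offset.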

get_node_storage(key[, ntype])

Get storage object of node feature of type ntype and name key
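The storage object is used through its fetch method, as in the example above. The sketch below is a simplified stand-in that shows the access pattern with plain lists. ToyFeatureStorage and the dictionary keyed by (type, feature name) are assumptions made for illustration, not the library's internal layout.

```python
from typing import Dict, List, Tuple


class ToyFeatureStorage:
    """Illustrative stand-in for a storage object holding one feature column."""

    def __init__(self, values: List[float]):
        self._values = values

    def fetch(self, indices: List[int]) -> List[float]:
        # Return the feature rows for the requested type-local IDs, in order.
        return [self._values[i] for i in indices]


# Hypothetical registry keyed by (node type, feature name).
node_features: Dict[Tuple[str, str], ToyFeatureStorage] = {
    ("drug", "node_feat"): ToyFeatureStorage([0.1, 0.2, 0.3]),
}
fetched = node_features[("drug", "node_feat")].fetch([2, 0])
```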

global_uniform_negative_sampling(num_samples)

Per source negative sampling as in dgl.dataloading.GlobalUniform
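The idea of uniform negative sampling can be sketched as drawing (source, destination) pairs uniformly at random and rejecting pairs that are already edges. This is a simplified pure-Python illustration under that assumption, not the GPU implementation. toy_uniform_negative_sampling and its parameters are hypothetical.

```python
import random
from typing import List, Set, Tuple


def toy_uniform_negative_sampling(num_src: int, num_dst: int,
                                  edges: Set[Tuple[int, int]],
                                  num_samples: int,
                                  seed: int = 0) -> List[Tuple[int, int]]:
    """Draw (src, dst) pairs uniformly at random that are not existing edges."""
    rng = random.Random(seed)
    negatives: List[Tuple[int, int]] = []
    max_tries = 100 * num_samples  # guard against very dense graphs
    for _ in range(max_tries):
        if len(negatives) == num_samples:
            break
        pair = (rng.randrange(num_src), rng.randrange(num_dst))
        if pair not in edges:
            negatives.append(pair)  # duplicates are allowed in this sketch
    return negatives


# Two drug -> gene edges, (0, 0) and (1, 1), among 3 drugs and 2 genes.
existing = {(0, 0), (1, 1)}
negatives = toy_uniform_negative_sampling(3, 2, existing, num_samples=4)
```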

num_edges([etype])

Return the number of edges in the graph.

num_nodes([ntype])

Return the number of nodes in the graph.

Parameters:
ntype: str, optional

The node type name. If given, it returns the number of nodes of the type. If not given (default), it returns the total number of nodes of all types.

number_of_nodes([ntype])

Return the number of nodes in the graph. Alias of num_nodes.

Parameters:
ntype: str, optional

The node type name. If given, it returns the number of nodes of the type. If not given (default), it returns the total number of nodes of all types.
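The per-type versus total behaviour of ntype can be sketched over a plain node-count dictionary. toy_num_nodes is a hypothetical helper that mirrors the documented semantics only.

```python
from typing import Dict, Optional


def toy_num_nodes(num_nodes_dict: Dict[str, int],
                  ntype: Optional[str] = None) -> int:
    """Count nodes of one type, or of all types when ntype is omitted."""
    if ntype is None:
        return sum(num_nodes_dict.values())
    return num_nodes_dict[ntype]


num_nodes_dict = {"drug": 3, "gene": 2, "disease": 1}
total = toy_num_nodes(num_nodes_dict)
drugs = toy_num_nodes(num_nodes_dict, "drug")
```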

sample_neighbors(nodes, fanout[, edge_dir, ...])

Return a DGLGraph which is a subgraph induced by sampling neighboring edges of the given nodes. See dgl.sampling.sample_neighbors for detailed semantics.

Parameters:
nodes: Tensor or dict[str, Tensor]

Node IDs to sample neighbors from. This argument can take a single ID tensor or a dictionary of node types and ID tensors. If a single tensor is given, the graph must only have one type of nodes.

fanout: int or dict[etype, int]

The number of edges to be sampled for each node on each edge type. This argument can take a single int or a dictionary of edge types and ints. If a single int is given, DGL will sample this number of edges for each node for every edge type. If -1 is given for a single edge type, all the neighboring edges with that edge type will be selected.

edge_dir: 'in' or 'out'

The direction of edges to import.

prob: str, optional

Feature name used as the (un-normalized) probabilities associated with each neighboring edge of a node. The feature must have only one element for each edge. The features must be non-negative floats, and the sum of the features of inbound/outbound edges for every node must be positive (though they don't have to sum up to one). Otherwise, the result will be undefined. If prob is not None, GPU sampling is not supported.

exclude_edges: tensor or dict

Edge IDs to exclude during sampling neighbors for the seed nodes. This argument can take a single ID tensor or a dictionary of edge types and ID tensors. If a single tensor is given, the graph must only have one type of nodes.

replace: bool, optional

If True, sample with replacement.

output_device: Framework-specific device context object, optional

The output device. Default is the same as the input graph.

Returns:
DGLGraph

A sampled subgraph with the same nodes as the original graph, but only the sampled neighboring edges. The induced edge IDs will be in edata[dgl.EID].
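The roles of fanout and edge_dir can be illustrated with a small pure-Python sketch on a homogeneous edge list. It samples without replacement and ignores prob and exclude_edges. toy_sample_neighbors is a hypothetical helper, not the GPU sampler used by cugraph_dgl.

```python
import random
from typing import List


def toy_sample_neighbors(src: List[int], dst: List[int], nodes: List[int],
                         fanout: int, edge_dir: str = "in",
                         seed: int = 0) -> List[int]:
    """Return IDs of sampled edges: up to `fanout` per seed node, no replacement."""
    rng = random.Random(seed)
    sampled: List[int] = []
    # 'in' looks at edges pointing to the seed, 'out' at edges leaving it.
    ends = dst if edge_dir == "in" else src
    for node in nodes:
        candidates = [e for e, end in enumerate(ends) if end == node]
        if fanout == -1 or fanout >= len(candidates):
            sampled.extend(candidates)  # take every neighboring edge
        else:
            sampled.extend(rng.sample(candidates, fanout))
    return sampled


# Edges: 0: 0->2, 1: 1->2, 2: 3->2, 3: 2->0
src, dst = [0, 1, 3, 2], [2, 2, 2, 0]
one_in = toy_sample_neighbors(src, dst, nodes=[2], fanout=1)
all_in = toy_sample_neighbors(src, dst, nodes=[2], fanout=-1)
all_out = toy_sample_neighbors(src, dst, nodes=[2], fanout=-1, edge_dir="out")
```

The returned edge IDs play the role of the induced edge IDs that the real method stores in edata[dgl.EID].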

subgraph(nodes[, relabel_nodes, output_device])

Return a subgraph induced on given nodes. This has the same semantics as dgl.node_subgraph.

Parameters:
nodes: nodes or dict[str, nodes]

The nodes to form the subgraph. The allowed nodes formats are:

* Int Tensor: Each element is a node ID. The tensor must have the same device type and ID data type as the graph's.
* iterable[int]: Each element is a node ID.
* Bool Tensor: Each \(i^{th}\) element is a bool flag indicating whether node \(i\) is in the subgraph.

If the graph is homogeneous, directly pass the above formats. Otherwise, the argument must be a dictionary with keys being node types and values being the node IDs in the above formats.

relabel_nodes: bool, optional

If True, the extracted subgraph will only have the nodes in the specified node set and it will relabel the nodes in order.

output_device: Framework-specific device context object, optional

The output device. Default is the same as the input graph.

Returns:
DGLGraph

The subgraph.
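A node-induced subgraph keeps exactly those edges whose two endpoints are both in the selected node set. The sketch below shows this on a homogeneous edge list with relabeling in ascending node order. toy_node_subgraph is a hypothetical name used for illustration.

```python
from typing import Dict, List, Tuple


def toy_node_subgraph(src: List[int], dst: List[int], nodes: List[int]
                      ) -> Tuple[List[int], List[int], List[int]]:
    """Toy node-induced subgraph with nodes relabeled in ascending order."""
    kept = sorted(set(nodes))
    new_id: Dict[int, int] = {old: new for new, old in enumerate(kept)}
    sub_src, sub_dst, edge_ids = [], [], []
    for e, (s, d) in enumerate(zip(src, dst)):
        # An edge survives only if both of its endpoints were selected.
        if s in new_id and d in new_id:
            sub_src.append(new_id[s])
            sub_dst.append(new_id[d])
            edge_ids.append(e)
    return sub_src, sub_dst, edge_ids


# Edges: 0: 0->1, 1: 1->2, 2: 2->3, 3: 3->0
src, dst = [0, 1, 2, 3], [1, 2, 3, 0]
sub_src, sub_dst, kept_edges = toy_node_subgraph(src, dst, nodes=[1, 2, 3])
```

Edges 0 and 3 are dropped here because they touch node 0, which is outside the selected set.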

Attributes

canonical_etypes

device

Get the device of the graph.

Returns:
device context

The device of the graph, which should be a framework-specific device object (e.g., torch.device).

etypes

Return all the edge type names in the graph.

ntypes

Return all the node type names in the graph.

num_canonical_edges_dict

total_number_of_edges

total_number_of_nodes