cugraph_dgl.cugraph_storage.CuGraphStorage

class cugraph_dgl.cugraph_storage.CuGraphStorage(data_dict: Dict[Tuple[str, str, str], Union[DataFrame, DataFrame]], num_nodes_dict: Dict[str, int], single_gpu: bool = True, device_id: int = 0, idtype=torch.int64)

Duck-typed version of the DGLHeteroGraph class made for cuGraph for storing graph structure and node/edge feature data.

This object is a wrapper around cugraph’s Multi GPU MultiGraph and returns samples that conform to DGLHeteroGraph. See: https://docs.rapids.ai/api/cugraph/nightly/api_docs/cugraph_dgl.html

Attributes:

canonical_etypes
device: Get the device of the graph.
etypes: Return all the edge type names in the graph.
ntypes: Return all the node type names in the graph.
num_canonical_edges_dict
total_number_of_edges
total_number_of_nodes

Methods

`add_edge_data`(feat_obj, canonical_etype, ...)	Add edge features
`add_node_data`(feat_obj, ntype, feat_name)	Add node features
`edge_subgraph`(edges[, relabel_nodes, ...])	Return a subgraph induced on given edges.
`find_edges`(eid[, etype, output_device])	Return the source and destination node ID(s) given the edge ID(s).
`get_edge_id_offset`(canonical_etype)	Return the integer offset for edge IDs of type `canonical_etype`
`get_edge_storage`(key[, etype])	Get storage object of edge feature of type `etype` and name `key`
`get_node_id_offset`(ntype)	Return the integer offset for node id of type ntype
`get_node_storage`(key[, ntype])	Get storage object of node feature of type `ntype` and name `key`
`global_uniform_negative_sampling`(num_samples)	Per source negative sampling as in `dgl.dataloading.GlobalUniform`
`num_edges`([etype])	Return the number of edges in the graph.
`num_nodes`([ntype])	Return the number of nodes in the graph.
`number_of_nodes`([ntype])	Return the number of nodes in the graph. Alias of `num_nodes`.
`sample_neighbors`(nodes, fanout[, edge_dir, ...])	Return a DGLGraph which is a subgraph induced by sampling neighboring edges of the given nodes.
`subgraph`(nodes[, relabel_nodes, output_device])	Return a subgraph induced on given nodes.

cugraph_e_id_to_dgl_id
cugraph_n_id_to_dgl_id
dgl_e_id_to_cugraph_id
dgl_n_id_to_cugraph_id
get_corresponding_canonical_etype
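
The offset and ID-conversion methods above are listed without much detail. The sketch below illustrates the general idea of translating per-type IDs into one contiguous ID space, in plain Python and without cuGraph. The helper names (`build_offsets`, `to_global_id`, `to_local_id`) are hypothetical, and the layout (types sorted alphabetically and packed back to back) is an assumption made for illustration, not a statement of how `CuGraphStorage` computes its offsets.

```python
from itertools import accumulate


def build_offsets(counts):
    """Map each type name to the first global ID reserved for that type.

    Assumption: types are ordered alphabetically and packed contiguously.
    """
    names = sorted(counts)
    running_totals = list(accumulate(counts[name] for name in names))
    starts = [0] + running_totals[:-1]
    return dict(zip(names, starts))


def to_global_id(local_id, type_name, offsets):
    """Per-type ID -> ID in the single contiguous ID space."""
    return local_id + offsets[type_name]


def to_local_id(global_id, type_name, offsets):
    """ID in the contiguous ID space -> per-type ID."""
    return global_id - offsets[type_name]


num_nodes_dict = {"drug": 3, "gene": 2, "disease": 1}
node_offsets = build_offsets(num_nodes_dict)
print(node_offsets)                           # {'disease': 0, 'drug': 1, 'gene': 4}
print(to_global_id(2, "drug", node_offsets))  # 3
```

Under this layout the two conversions are inverses of each other for a fixed type, which is the property the paired `dgl_*_to_cugraph_id` and `cugraph_*_to_dgl_id` method names suggest.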

__init__(data_dict: Dict[Tuple[str, str, str], Union[DataFrame, DataFrame]], num_nodes_dict: Dict[str, int], single_gpu: bool = True, device_id: int = 0, idtype=torch.int64)

Constructor for creating an instance of CuGraphStorage.

See also cugraph_dgl.cugraph_storage_from_heterograph to convert from DGLHeteroGraph to CuGraphStorage

Parameters:

data_dict:

The dictionary data for constructing a heterogeneous graph. The keys are string triplets (src_type, edge_type, dst_type) specifying the source node, edge, and destination node types. Each value is a two-column dataframe of the form (𝑈, 𝑉), where (𝑈[𝑖], 𝑉[𝑖]) forms the edge with ID 𝑖.

num_nodes_dict: dict[str, int]

The number of nodes for some node types, given as a dictionary mapping a node type T to the number of T-typed nodes.

single_gpu: bool

Whether to create the cugraph Property Graph on a single GPU (True) or on multiple GPUs (False).

device_id: int

If specified, must be the integer ID of the GPU device on which the results are created.

idtype: framework-specific data type

The data type for storing the structure-related graph information. For PyTorch, this can be torch.int32 or torch.int64. Defaults to torch.int64 if PyTorch is installed.
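
The relationship between `data_dict` and `num_nodes_dict` described above can be checked before constructing the graph. The following is a minimal sketch in plain Python: `check_edge_lists` is a hypothetical helper, not part of cugraph_dgl, and it assumes each value is a pair of plain lists `(U, V)` standing in for the two dataframe columns. Node types missing from `num_nodes_dict` are skipped, since the description says counts are given for "some" node types.

```python
def check_edge_lists(data_dict, num_nodes_dict):
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = []
    for key, (src, dst) in data_dict.items():
        # Keys must be (src_type, edge_type, dst_type) triplets.
        if not (isinstance(key, tuple) and len(key) == 3):
            problems.append(f"{key!r}: key must be (src_type, edge_type, dst_type)")
            continue
        src_type, _, dst_type = key
        # (U[i], V[i]) forms edge i, so both columns need the same length.
        if len(src) != len(dst):
            problems.append(f"{key!r}: U and V differ in length")
        for ids, ntype in ((src, src_type), (dst, dst_type)):
            limit = num_nodes_dict.get(ntype)
            if limit is None:
                continue  # no count supplied for this node type
            if any(i < 0 or i >= limit for i in ids):
                problems.append(f"{key!r}: {ntype!r} ID outside [0, {limit})")
    return problems


num_nodes_dict = {"drug": 3, "gene": 2, "disease": 1}
good = {
    ("drug", "interacts", "drug"): ([0, 1], [1, 2]),
    ("drug", "interacts", "gene"): ([0, 1], [0, 1]),
    ("drug", "treats", "disease"): ([1], [0]),
}
bad = {("drug", "treats", "disease"): ([1], [5])}

print(check_edge_lists(good, num_nodes_dict))  # []
print(check_edge_lists(bad, num_nodes_dict))
```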

Examples

The following example uses CuGraphStorage:

>>> from cugraph_dgl.cugraph_storage import CuGraphStorage
>>> import cudf
>>> import torch
>>> num_nodes_dict = {"drug": 3, "gene": 2, "disease": 1}
>>> drug_interacts_drug_df = cudf.DataFrame({"src": [0, 1], "dst": [1, 2]})
>>> drug_interacts_gene = cudf.DataFrame({"src": [0, 1], "dst": [0, 1]})
>>> drug_treats_disease = cudf.DataFrame({"src": [1], "dst": [0]})
>>> data_dict = {("drug", "interacts", "drug"): drug_interacts_drug_df,
...              ("drug", "interacts", "gene"): drug_interacts_gene,
...              ("drug", "treats", "disease"): drug_treats_disease}
>>> gs = CuGraphStorage(data_dict=data_dict, num_nodes_dict=num_nodes_dict)
>>> gs.add_node_data(ntype='drug', feat_name='node_feat',
...                  feat_obj=torch.as_tensor([0.1, 0.2, 0.3]))
>>> gs.add_edge_data(canonical_etype=("drug", "interacts", "drug"),
...                  feat_name='edge_feat',
...                  feat_obj=torch.as_tensor([0.2, 0.4]))
>>> gs.ntypes
['disease', 'drug', 'gene']
>>> gs.etypes
['interacts', 'interacts', 'treats']
>>> gs.canonical_etypes
[('drug', 'interacts', 'drug'),
 ('drug', 'interacts', 'gene'),
 ('drug', 'treats', 'disease')]

>>> gs.sample_neighbors({'disease': [0]},
...                     1)
Graph(num_nodes={'disease': 1, 'drug': 3, 'gene': 2},
num_edges={('drug', 'interacts', 'drug'): 0,
           ('drug', 'interacts', 'gene'): 0,
           ('drug', 'treats', 'disease'): 1},
metagraph=[('drug', 'drug', 'interacts'),
           ('drug', 'gene', 'interacts'),
           ('drug', 'disease', 'treats')])

>>> gs.get_node_storage(key='node_feat',
...                     ntype='drug').fetch([0, 1, 2])
tensor([0.1000, 0.2000, 0.3000], device='cuda:0',
 dtype=torch.float64)

>>> es = gs.get_edge_storage(key='edge_feat',
...                          etype=('drug', 'interacts', 'drug'))
>>> es.fetch([0, 1])
tensor([0.2000, 0.4000], device='cuda:0', dtype=torch.float64)
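
In the example, `gs.etypes` lists 'interacts' twice because two canonical edge types share that name, which is why the edge-storage call passes the full triplet. The sketch below shows one way a short edge type name could be resolved to a canonical triplet, and why an ambiguous name has to be rejected. `resolve_etype` is a hypothetical helper written for illustration; it is not claimed to match the behavior of `get_corresponding_canonical_etype`.

```python
def resolve_etype(etype, canonical_etypes):
    """Resolve a short edge type name or a full triplet to a canonical triplet."""
    if isinstance(etype, tuple):
        if etype not in canonical_etypes:
            raise KeyError(f"unknown canonical edge type {etype!r}")
        return etype
    # A short name is usable only if exactly one triplet carries it.
    matches = [c for c in canonical_etypes if c[1] == etype]
    if len(matches) != 1:
        raise ValueError(
            f"edge type {etype!r} matches {len(matches)} canonical types; "
            "pass the full (src_type, edge_type, dst_type) triplet"
        )
    return matches[0]


canonical_etypes = [
    ("drug", "interacts", "drug"),
    ("drug", "interacts", "gene"),
    ("drug", "treats", "disease"),
]

print(resolve_etype("treats", canonical_etypes))
# ('drug', 'treats', 'disease')
```

Calling `resolve_etype("interacts", canonical_etypes)` raises `ValueError` in this sketch, because the name alone does not identify a single relation.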

Methods

`__init__`(data_dict, num_nodes_dict[, ...])	Constructor for creating an instance of CuGraphStorage
`add_edge_data`(feat_obj, canonical_etype, ...)	Add edge features
`add_node_data`(feat_obj, ntype, feat_name)	Add node features
`cugraph_e_id_to_dgl_id`(index_t, canonical_etype)
`cugraph_n_id_to_dgl_id`(index_t, ntype)
`dgl_e_id_to_cugraph_id`(index_t, canonical_etype)
`dgl_n_id_to_cugraph_id`(index_t, ntype)
`edge_subgraph`(edges[, relabel_nodes, ...])	Return a subgraph induced on given edges. This has the same semantics as `dgl.edge_subgraph`.
    Parameters:
    edges : edges or dict[(str, str, str), edges]
        The edges to form the subgraph. The allowed edge formats are:
        * Int Tensor: Each element is an edge ID. The tensor must have the same device type and ID data type as the graph's.
        * iterable[int]: Each element is an edge ID.
        * Bool Tensor: Each \(i^{th}\) element is a bool flag indicating whether edge \(i\) is in the subgraph.
        If the graph is homogeneous, one can directly pass the above formats. Otherwise, the argument must be a dictionary with keys being edge types and values being the edge IDs in the above formats.
    relabel_nodes : bool, optional
        If True, the extracted subgraph will only have the nodes in the specified node set and it will relabel the nodes in order.
    output_device : Framework-specific device context object, optional
        The output device. Default is the same as the input graph.
    Returns:
    DGLGraph
        The subgraph.
`find_edges`(eid[, etype, output_device])	Return the source and destination node ID(s) given the edge ID(s).
`get_corresponding_canonical_etype`(etype)
`get_edge_id_offset`(canonical_etype)	Return the integer offset for edge IDs of type `canonical_etype`
`get_edge_storage`(key[, etype])	Get storage object of edge feature of type `etype` and name `key`
`get_node_id_offset`(ntype)	Return the integer offset for node id of type ntype
`get_node_storage`(key[, ntype])	Get storage object of node feature of type `ntype` and name `key`
`global_uniform_negative_sampling`(num_samples)	Per source negative sampling as in `dgl.dataloading.GlobalUniform`
`num_edges`([etype])	Return the number of edges in the graph.
`num_nodes`([ntype])	Return the number of nodes in the graph.
    Parameters:
    ntype : str, optional
        The node type name. If given, it returns the number of nodes of the type. If not given (default), it returns the total number of nodes of all types.
`number_of_nodes`([ntype])	Return the number of nodes in the graph. Alias of `num_nodes`.
    Parameters:
    ntype : str, optional
        The node type name. If given, it returns the number of nodes of the type. If not given (default), it returns the total number of nodes of all types.
`sample_neighbors`(nodes, fanout[, edge_dir, ...])	Return a DGLGraph which is a subgraph induced by sampling neighboring edges of the given nodes. See `dgl.sampling.sample_neighbors` for detailed semantics.
    Parameters:
    nodes : Tensor or dict[str, Tensor]
        Node IDs to sample neighbors from. This argument can take a single ID tensor or a dictionary of node types and ID tensors. If a single tensor is given, the graph must only have one type of nodes.
    fanout : int or dict[etype, int]
        The number of edges to be sampled for each node on each edge type. This argument can take a single int or a dictionary of edge types and ints. If a single int is given, DGL will sample this number of edges for each node for every edge type. If -1 is given for a single edge type, all the neighboring edges with that edge type will be selected.
    edge_dir : 'in' or 'out'
        The direction of edges to sample.
    prob : str, optional
        Feature name used as the (un-normalized) probabilities associated with each neighboring edge of a node. The feature must have only one element for each edge. The features must be non-negative floats, and the sum of the features of inbound/outbound edges for every node must be positive (though they don't have to sum up to one). Otherwise, the result will be undefined. If `prob` is not None, GPU sampling is not supported.
    exclude_edges : tensor or dict
        Edge IDs to exclude during sampling neighbors for the seed nodes. This argument can take a single ID tensor or a dictionary of edge types and ID tensors. If a single tensor is given, the graph must only have one type of nodes.
    replace : bool, optional
        If True, sample with replacement.
    output_device : Framework-specific device context object, optional
        The output device. Default is the same as the input graph.
    Returns:
    DGLGraph
        A sampled subgraph with the same nodes as the original graph, but only the sampled neighboring edges. The induced edge IDs will be in `edata[dgl.EID]`.
`subgraph`(nodes[, relabel_nodes, output_device])	Return a subgraph induced on given nodes. This has the same semantics as `dgl.node_subgraph`.
    Parameters:
    nodes : nodes or dict[str, nodes]
        The nodes to form the subgraph. The allowed node formats are:
        * Int Tensor: Each element is a node ID. The tensor must have the same device type and ID data type as the graph's.
        * iterable[int]: Each element is a node ID.
        * Bool Tensor: Each \(i^{th}\) element is a bool flag indicating whether node \(i\) is in the subgraph.
        If the graph is homogeneous, directly pass the above formats. Otherwise, the argument must be a dictionary with keys being node types and values being the node IDs in the above formats.
    relabel_nodes : bool, optional
        If True, the extracted subgraph will only have the nodes in the specified node set and it will relabel the nodes in order.
    output_device : Framework-specific device context object, optional
        The output device. Default is the same as the input graph.
    Returns:
    DGLGraph
        The subgraph.
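
The `fanout` and `replace` semantics documented for `sample_neighbors` can be illustrated on a small homogeneous edge list. This is a plain-Python sketch, not the cuGraph sampler: `sample_in_edges` is a hypothetical helper, it handles only a single node and edge type, it assumes `edge_dir='in'`, and it samples uniformly (no `prob` or `exclude_edges`).

```python
import random


def sample_in_edges(edges, seeds, fanout, replace=False, rng=None):
    """Return IDs of sampled inbound edges for each seed node.

    edges: list of (src, dst) pairs; the list index is the edge ID.
    fanout: edges to sample per seed; -1 selects every inbound edge.
    """
    rng = rng or random.Random()
    seed_set = set(seeds)
    inbound = {}
    for eid, (_, dst) in enumerate(edges):
        if dst in seed_set:
            inbound.setdefault(dst, []).append(eid)

    picked = []
    for node in seeds:
        candidates = inbound.get(node, [])
        if not candidates:
            continue  # a seed without inbound edges contributes nothing
        if fanout == -1 or (not replace and len(candidates) <= fanout):
            picked.extend(candidates)
        elif replace:
            picked.extend(rng.choices(candidates, k=fanout))
        else:
            picked.extend(rng.sample(candidates, fanout))
    return picked


edges = [(0, 2), (1, 2), (3, 2), (0, 1)]
print(sample_in_edges(edges, [2], fanout=-1))  # [0, 1, 2]
print(sample_in_edges(edges, [1], fanout=5))   # [3]
```

The returned edge IDs play the role that `edata[dgl.EID]` plays in the real return value: they identify which edges of the original graph survive in the sampled subgraph.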

Attributes

`canonical_etypes`
`device`	Get the device of the graph.
    Returns:
    device context
        The device of the graph, which should be a framework-specific device object (e.g., `torch.device`).
`etypes`	Return all the edge type names in the graph.
`ntypes`	Return all the node type names in the graph.
`num_canonical_edges_dict`
`total_number_of_edges`
`total_number_of_nodes`
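
The `subgraph` entry under Methods accepts node selections as integer IDs or as a boolean mask, and `relabel_nodes` controls whether the kept nodes are renumbered. The sketch below shows these ideas on a homogeneous edge list in plain Python. `normalize_ids` and `node_subgraph` are hypothetical helpers, and renumbering the kept nodes in ascending ID order is an assumption made for illustration.

```python
def normalize_ids(selection, total):
    """Turn a boolean mask or an iterable of ints into a sorted list of unique IDs."""
    selection = list(selection)
    if selection and all(isinstance(x, bool) for x in selection):
        if len(selection) != total:
            raise ValueError("a boolean mask needs one flag per node")
        return [i for i, keep in enumerate(selection) if keep]
    return sorted(set(selection))


def node_subgraph(edges, nodes, num_nodes, relabel_nodes=True):
    """Return (subgraph edges, original edge IDs) induced on the given nodes."""
    kept = normalize_ids(nodes, num_nodes)
    kept_set = set(kept)
    # An edge is induced only if both of its endpoints are kept.
    induced = [(eid, u, v) for eid, (u, v) in enumerate(edges)
               if u in kept_set and v in kept_set]
    if relabel_nodes:
        new_id = {old: new for new, old in enumerate(kept)}
        sub_edges = [(new_id[u], new_id[v]) for _, u, v in induced]
    else:
        sub_edges = [(u, v) for _, u, v in induced]
    return sub_edges, [eid for eid, _, _ in induced]


edges = [(0, 1), (1, 2), (2, 3)]
print(node_subgraph(edges, [1, 2, 3], num_nodes=4))
# ([(0, 1), (1, 2)], [1, 2])
print(node_subgraph(edges, [False, True, True, True], num_nodes=4, relabel_nodes=False))
# ([(1, 2), (2, 3)], [1, 2])
```

Both calls select the same nodes; only the node numbering of the result differs.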