cugraph_dgl.cugraph_storage.CuGraphStorage
- class cugraph_dgl.cugraph_storage.CuGraphStorage(data_dict: Dict[Tuple[str, str, str], Union[DataFrame, DataFrame]], num_nodes_dict: Dict[str, int], single_gpu: bool = True, device_id: int = 0, idtype=torch.int64)
Duck-typed version of the DGLHeteroGraph class made for cuGraph for storing graph structure and node/edge feature data.
This object is a wrapper around cugraph’s Multi GPU MultiGraph and returns samples that conform with DGLHeteroGraph. See: https://docs.rapids.ai/api/cugraph/nightly/api_docs/cugraph_dgl.html
- Attributes:
- canonical_etypes
- device
  Get the device of the graph.
- etypes
  Return all the edge type names in the graph.
- ntypes
  Return all the node type names in the graph.
- num_canonical_edges_dict
- total_number_of_edges
- total_number_of_nodes
Methods
- add_edge_data(feat_obj, canonical_etype, ...)
  Add edge features.
- add_node_data(feat_obj, ntype, feat_name)
  Add node features.
- edge_subgraph(edges[, relabel_nodes, ...])
  Return a subgraph induced on given edges. This has the same semantics as dgl.edge_subgraph.
  Parameters
  ----------
  edges : edges or dict[(str, str, str), edges]
      The edges to form the subgraph. The allowed edges formats are:
      * Int Tensor: Each element is an edge ID. The tensor must have the same device type and ID data type as the graph's.
      * iterable[int]: Each element is an edge ID.
      * Bool Tensor: Each \(i^{th}\) element is a bool flag indicating whether edge \(i\) is in the subgraph.
      If the graph is homogeneous, one can directly pass the above formats. Otherwise, the argument must be a dictionary with keys being edge types and values being the edge IDs in the above formats.
  relabel_nodes : bool, optional
      If True, the extracted subgraph will only have the nodes in the specified node set and it will relabel the nodes in order.
  output_device : Framework-specific device context object, optional
      The output device. Default is the same as the input graph.
  Returns
  -------
  DGLGraph
      The subgraph.
- find_edges(eid[, etype, output_device])
  Return the source and destination node ID(s) given the edge ID(s).
- get_edge_id_offset(canonical_etype)
  Return the integer offset for edge IDs of type canonical_etype.
- get_edge_storage(key[, etype])
  Get storage object of edge feature of type etype and name key.
- get_node_id_offset(ntype)
  Return the integer offset for node IDs of type ntype.
- get_node_storage(key[, ntype])
  Get storage object of node feature of type ntype and name key.
- global_uniform_negative_sampling(num_samples)
  Per source negative sampling as in dgl.dataloading.GlobalUniform.
- num_edges([etype])
  Return the number of edges in the graph.
- num_nodes([ntype])
  Return the number of nodes in the graph.
  Parameters
  ----------
  ntype : str, optional
      The node type name. If given, it returns the number of nodes of the type. If not given (default), it returns the total number of nodes of all types.
- number_of_nodes([ntype])
  Return the number of nodes in the graph. Alias of num_nodes.
  Parameters
  ----------
  ntype : str, optional
      The node type name. If given, it returns the number of nodes of the type. If not given (default), it returns the total number of nodes of all types.
- sample_neighbors(nodes, fanout[, edge_dir, ...])
  Return a DGLGraph which is a subgraph induced by sampling neighboring edges of the given nodes. See dgl.sampling.sample_neighbors for detailed semantics.
  Parameters
  ----------
  nodes : Tensor or dict[str, Tensor]
      Node IDs to sample neighbors from. This argument can take a single ID tensor or a dictionary of node types and ID tensors. If a single tensor is given, the graph must only have one type of nodes.
  fanout : int or dict[etype, int]
      The number of edges to be sampled for each node on each edge type. This argument can take a single int or a dictionary of edge types and ints. If a single int is given, DGL will sample this number of edges for each node for every edge type. If -1 is given for a single edge type, all the neighboring edges with that edge type will be selected.
  edge_dir : 'in' or 'out'
      The direction of edges to import.
  prob : str, optional
      Feature name used as the (un-normalized) probabilities associated with each neighboring edge of a node. The feature must have only one element for each edge. The features must be non-negative floats, and the sum of the features of inbound/outbound edges for every node must be positive (though they don't have to sum up to one). Otherwise, the result will be undefined. If prob is not None, GPU sampling is not supported.
  exclude_edges : tensor or dict
      Edge IDs to exclude during sampling neighbors for the seed nodes. This argument can take a single ID tensor or a dictionary of edge types and ID tensors. If a single tensor is given, the graph must only have one type of nodes.
  replace : bool, optional
      If True, sample with replacement.
  output_device : Framework-specific device context object, optional
      The output device. Default is the same as the input graph.
  Returns
  -------
  DGLGraph
      A sampled subgraph with the same nodes as the original graph, but only the sampled neighboring edges. The induced edge IDs will be in edata[dgl.EID].
- subgraph(nodes[, relabel_nodes, output_device])
  Return a subgraph induced on given nodes. This has the same semantics as dgl.node_subgraph.
  Parameters
  ----------
  nodes : nodes or dict[str, nodes]
      The nodes to form the subgraph. The allowed nodes formats are:
      * Int Tensor: Each element is a node ID. The tensor must have the same device type and ID data type as the graph's.
      * iterable[int]: Each element is a node ID.
      * Bool Tensor: Each \(i^{th}\) element is a bool flag indicating whether node \(i\) is in the subgraph.
      If the graph is homogeneous, directly pass the above formats. Otherwise, the argument must be a dictionary with keys being node types and values being the node IDs in the above formats.
  relabel_nodes : bool, optional
      If True, the extracted subgraph will only have the nodes in the specified node set and it will relabel the nodes in order.
  output_device : Framework-specific device context object, optional
      The output device. Default is the same as the input graph.
  Returns
  -------
  DGLGraph
      The subgraph.
- cugraph_e_id_to_dgl_id
- cugraph_n_id_to_dgl_id
- dgl_e_id_to_cugraph_id
- dgl_n_id_to_cugraph_id
- get_corresponding_canonical_etype
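The fanout semantics described for sample_neighbors can be illustrated without a GPU. The following is a minimal pure-Python sketch, not the cuGraph or DGL implementation: the helper name sample_in_neighbors and the edge-list representation are hypothetical, and it only covers a single edge type with edge_dir='in', uniform sampling, and no replacement.

```python
import random


def sample_in_neighbors(edges, seeds, fanout, rng=None):
    """Hypothetical helper: keep up to `fanout` inbound edges per seed node.

    edges  : list of (src, dst) pairs; the list index plays the role of the edge ID
    seeds  : iterable of node IDs to sample neighbors for
    fanout : number of edges to keep per seed; -1 keeps every inbound edge
    """
    if rng is None:
        rng = random.Random()
    sampled_eids = []
    for seed in seeds:
        # IDs of all edges that point into this seed (edge_dir='in').
        in_eids = [eid for eid, (_, dst) in enumerate(edges) if dst == seed]
        if fanout == -1 or len(in_eids) <= fanout:
            chosen = in_eids  # fewer candidates than fanout: keep them all
        else:
            chosen = rng.sample(in_eids, fanout)  # uniform, without replacement
        sampled_eids.extend(chosen)
    return sorted(sampled_eids)


edges = [(0, 2), (1, 2), (3, 2), (0, 1)]
all_in = sample_in_neighbors(edges, seeds=[2], fanout=-1)
one_in = sample_in_neighbors(edges, seeds=[2], fanout=1, rng=random.Random(0))
print(all_in)       # IDs of every edge whose destination is node 2
print(len(one_in))  # a single sampled edge
```

The returned edge IDs correspond loosely to what the docstring says ends up in edata[dgl.EID]; the real method returns a DGLGraph rather than a list.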
- __init__(data_dict: Dict[Tuple[str, str, str], Union[DataFrame, DataFrame]], num_nodes_dict: Dict[str, int], single_gpu: bool = True, device_id: int = 0, idtype=torch.int64)
Constructor for creating a CuGraphStorage instance.
See also
cugraph_dgl.cugraph_storage_from_heterograph to convert from DGLHeteroGraph to CuGraphStorage.
- Parameters:
- data_dict:
The dictionary data for constructing a heterogeneous graph. The keys are string triplets (src_type, edge_type, dst_type), specifying the source node, edge, and destination node types. Each value is a dataframe with 2 columns in the form (U, V), where (U[i], V[i]) forms the edge with ID i.
- num_nodes_dict: dict[str, int]
The number of nodes for some node types, which is a dictionary mapping a node type T to the number of T-typed nodes.
- single_gpu: bool
Whether to create the cugraph Property Graph on a single GPU (True) or on multiple GPUs (False).
- device_id: int
If specified, must be the integer ID of the GPU device on which the results are created.
- idtype: Framework-specific data type object
The data type for storing the structure-related graph information. This can be torch.int32 or torch.int64 for PyTorch. Defaults to torch.int64 if PyTorch is installed.
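The relationship between data_dict and num_nodes_dict can be checked before any GPU object is built. The sketch below is illustrative only: check_graph_inputs is a hypothetical helper, not part of cugraph_dgl, and it uses plain dicts of lists in place of dataframes. It assumes the two columns are named "src" and "dst" and that node IDs are zero-based and smaller than the node count of their type. Node types missing from num_nodes_dict are skipped, since the parameter description says it covers "some node types".

```python
def check_graph_inputs(data_dict, num_nodes_dict):
    """Hypothetical pre-flight check; returns a list of problem descriptions."""
    problems = []
    for canonical_etype, table in data_dict.items():
        if len(canonical_etype) != 3:
            problems.append(f"{canonical_etype!r}: key is not a (src_type, edge_type, dst_type) triplet")
            continue
        src_type, _, dst_type = canonical_etype
        # Assumed column names; a real dataframe would be read the same way.
        src_ids, dst_ids = list(table["src"]), list(table["dst"])
        if len(src_ids) != len(dst_ids):
            problems.append(f"{canonical_etype!r}: src and dst columns differ in length")
        for ntype, ids in ((src_type, src_ids), (dst_type, dst_ids)):
            limit = num_nodes_dict.get(ntype)
            if limit is None:
                continue  # no declared count for this type, nothing to compare against
            if any(i < 0 or i >= limit for i in ids):
                problems.append(f"{canonical_etype!r}: {ntype} ID outside [0, {limit})")
    return problems


num_nodes_dict = {"drug": 3, "gene": 2, "disease": 1}
good = {
    ("drug", "interacts", "drug"): {"src": [0, 1], "dst": [1, 2]},
    ("drug", "interacts", "gene"): {"src": [0, 1], "dst": [0, 1]},
    ("drug", "treats", "disease"): {"src": [1], "dst": [0]},
}
bad = {("drug", "treats", "disease"): {"src": [1], "dst": [1]}}  # disease ID 1 does not exist

good_problems = check_graph_inputs(good, num_nodes_dict)
bad_problems = check_graph_inputs(bad, num_nodes_dict)
print(good_problems)
print(bad_problems)
```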
Examples
- The following example uses CuGraphStorage:
>>> from cugraph_dgl.cugraph_storage import CuGraphStorage
>>> import cudf
>>> import torch
>>> num_nodes_dict={"drug": 3, "gene": 2, "disease": 1}
>>> drug_interacts_drug_df = cudf.DataFrame({"src": [0, 1], "dst": [1, 2]})
>>> drug_interacts_gene = cudf.DataFrame({"src": [0, 1], "dst": [0, 1]})
>>> drug_treats_disease = cudf.DataFrame({"src": [1], "dst": [0]})
>>> data_dict = {("drug", "interacts", "drug"):drug_interacts_drug_df,
...     ("drug", "interacts", "gene"):drug_interacts_gene,
...     ("drug", "treats", "disease"):drug_treats_disease }
>>> gs = CuGraphStorage(data_dict=data_dict, num_nodes_dict=num_nodes_dict)
>>> gs.add_node_data(ntype='drug', feat_name='node_feat',
...     feat_obj=torch.as_tensor([0.1, 0.2, 0.3]))
>>> gs.add_edge_data(canonical_etype=("drug", "interacts", "drug"),
...     feat_name='edge_feat', feat_obj=torch.as_tensor([0.2, 0.4]))
>>> gs.ntypes
['disease', 'drug', 'gene']
>>> gs.etypes
['interacts', 'interacts', 'treats']
>>> gs.canonical_etypes
[('drug', 'interacts', 'drug'), ('drug', 'interacts', 'gene'), ('drug', 'treats', 'disease')]
>>> gs.sample_neighbors({'disease':[0]}, 1)
Graph(num_nodes={'disease': 1, 'drug': 3, 'gene': 2},
      num_edges={('drug', 'interacts', 'drug'): 0, ('drug', 'interacts', 'gene'): 0, ('drug', 'treats', 'disease'): 1},
      metagraph=[('drug', 'drug', 'interacts'), ('drug', 'gene', 'interacts'), ('drug', 'disease', 'treats')])
>>> gs.get_node_storage(key='node_feat', ntype='drug').fetch([0,1,2])
tensor([0.1000, 0.2000, 0.3000], device='cuda:0', dtype=torch.float64)
>>> es = gs.get_edge_storage(key='edge_feat', etype=('drug', 'interacts', 'drug'))
>>> es.fetch([0,1])
tensor([0.2000, 0.4000], device='cuda:0', dtype=torch.float64)
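The ntypes, etypes and canonical_etypes values printed in the example can be reproduced from the data_dict keys alone. The sketch below is a plain-Python illustration, not the library's code: type_names is a hypothetical helper, and the sorted ordering is an assumption inferred from the example output rather than documented behavior.

```python
def type_names(canonical_keys):
    """Hypothetical helper: derive type-name lists from (src, etype, dst) triplets."""
    canonical_etypes = sorted(canonical_keys)
    # One edge-type name per canonical triplet, so names may repeat.
    etypes = [etype for _, etype, _ in canonical_etypes]
    # Every type that appears as a source or a destination, deduplicated.
    ntypes = sorted({t for src, _, dst in canonical_etypes for t in (src, dst)})
    return ntypes, etypes, canonical_etypes


keys = [
    ("drug", "interacts", "drug"),
    ("drug", "interacts", "gene"),
    ("drug", "treats", "disease"),
]
ntypes, etypes, canonical_etypes = type_names(keys)
print(ntypes)
print(etypes)
print(canonical_etypes)
```

Because 'interacts' appears twice in etypes, a bare edge type name does not identify a single relation in this graph, which is why the example passes the full triplet to add_edge_data and get_edge_storage.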
Methods
- __init__(data_dict, num_nodes_dict[, ...])
  Constructor for creating a CuGraphStorage instance.
- add_edge_data(feat_obj, canonical_etype, ...)
  Add edge features.
- add_node_data(feat_obj, ntype, feat_name)
  Add node features.
- cugraph_e_id_to_dgl_id(index_t, canonical_etype)
- cugraph_n_id_to_dgl_id(index_t, ntype)
- dgl_e_id_to_cugraph_id(index_t, canonical_etype)
- dgl_n_id_to_cugraph_id(index_t, ntype)
- edge_subgraph(edges[, relabel_nodes, ...])
  Return a subgraph induced on given edges. This has the same semantics as dgl.edge_subgraph.
- find_edges(eid[, etype, output_device])
  Return the source and destination node ID(s) given the edge ID(s).
- get_corresponding_canonical_etype(etype)
- get_edge_id_offset(canonical_etype)
  Return the integer offset for edge IDs of type canonical_etype.
- get_edge_storage(key[, etype])
  Get storage object of edge feature of type etype and name key.
- get_node_id_offset(ntype)
  Return the integer offset for node IDs of type ntype.
- get_node_storage(key[, ntype])
  Get storage object of node feature of type ntype and name key.
- global_uniform_negative_sampling(num_samples)
  Per source negative sampling as in dgl.dataloading.GlobalUniform.
- num_edges([etype])
  Return the number of edges in the graph.
- num_nodes([ntype])
  Return the number of nodes in the graph.
- number_of_nodes([ntype])
  Return the number of nodes in the graph. Alias of num_nodes.
- sample_neighbors(nodes, fanout[, edge_dir, ...])
  Return a DGLGraph which is a subgraph induced by sampling neighboring edges of the given nodes. See dgl.sampling.sample_neighbors for detailed semantics.
- subgraph(nodes[, relabel_nodes, output_device])
  Return a subgraph induced on given nodes. This has the same semantics as dgl.node_subgraph.
Attributes
- canonical_etypes
- device
  Get the device of the graph.
  Returns
  -------
  device context
      The device of the graph, which should be a framework-specific device object (e.g., torch.device).
- etypes
  Return all the edge type names in the graph.
- ntypes
  Return all the node type names in the graph.
- num_canonical_edges_dict
- total_number_of_edges
- total_number_of_nodes
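The per-type ID offsets returned by get_node_id_offset, and the conversions between per-type and graph-wide IDs, follow a common pattern: lay the types out one after another in a single contiguous ID range. The sketch below shows that pattern in plain Python. It is an illustration under stated assumptions, not the library's implementation: id_offsets, to_global_id and to_typed_id are hypothetical names, and ordering the types by sorted name is an assumption.

```python
from itertools import accumulate


def id_offsets(counts):
    """Hypothetical helper: first graph-wide ID for each type, in sorted-name order."""
    names = sorted(counts)
    running = list(accumulate(counts[name] for name in names))
    starts = [0] + running[:-1]  # each type starts where the previous one ended
    return dict(zip(names, starts))


def to_global_id(ntype, local_id, counts):
    """Map a per-type ID to a graph-wide ID."""
    return id_offsets(counts)[ntype] + local_id


def to_typed_id(global_id, counts):
    """Map a graph-wide ID back to (type name, per-type ID)."""
    offsets = id_offsets(counts)
    for name in sorted(counts):
        start = offsets[name]
        if start <= global_id < start + counts[name]:
            return name, global_id - start
    raise ValueError(f"ID {global_id} is outside the graph")


num_nodes_dict = {"drug": 3, "gene": 2, "disease": 1}
offsets = id_offsets(num_nodes_dict)
total_nodes = sum(num_nodes_dict.values())
print(offsets)
print(total_nodes)
print(to_global_id("drug", 2, num_nodes_dict))
print(to_typed_id(3, num_nodes_dict))
```

Under this layout the total node count is the sum of the per-type counts, and the same scheme applies to edge IDs keyed by canonical edge type.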