cugraph_pyg.loader.dask_node_loader.BulkSampleLoader
- class cugraph_pyg.loader.dask_node_loader.BulkSampleLoader(feature_store: DaskGraphStore, graph_store: DaskGraphStore, input_nodes: Tensor | None | str | Tuple[str, Tensor | None] = None, batch_size: int = 0, *, shuffle: bool = False, drop_last: bool = True, edge_types: Sequence[Tuple[str]] = None, directory: str | TemporaryDirectory = None, input_files: List[str] = None, starting_batch_id: int = 0, batches_per_partition: int = 100, num_neighbors: List[int] | Dict[Tuple[str, str, str], List[int]] = None, replace: bool = True, compression: str = 'COO', **kwargs)
Iterator that executes sampling using Dask and cuGraph and loads sampled minibatches from disk.
- __init__(feature_store: DaskGraphStore, graph_store: DaskGraphStore, input_nodes: Tensor | None | str | Tuple[str, Tensor | None] = None, batch_size: int = 0, *, shuffle: bool = False, drop_last: bool = True, edge_types: Sequence[Tuple[str]] = None, directory: str | TemporaryDirectory = None, input_files: List[str] = None, starting_batch_id: int = 0, batches_per_partition: int = 100, num_neighbors: List[int] | Dict[Tuple[str, str, str], List[int]] = None, replace: bool = True, compression: str = 'COO', **kwargs)
Executes a bulk sampling job immediately upon creation. Allows iteration over the returned results.
- Parameters:
- feature_store: DaskGraphStore
The feature store containing features for the graph.
- graph_store: DaskGraphStore
The graph store containing the graph structure.
- input_nodes: InputNodes
The input nodes associated with this sampler. If None, this loader will load batches from disk rather than performing sampling in memory.
- batch_size: int
The number of input nodes per sampling batch. Generally required unless loading already-sampled data from disk.
- shuffle: bool (optional, default=False)
Whether to shuffle the input indices before batching. If False, batches are created in the original order.
- edge_types: Sequence[Tuple[str]] (optional, default=None)
The desired edge types for the subgraph. Defaults to all edges in the graph.
- directory: str (optional, default=new tempdir)
The path of the directory to write samples to. Defaults to a newly generated temporary directory.
- input_files: List[str] (optional, default=None)
The input files to read from the directory containing samples. This argument is only used when loading already-sampled batches from disk.
- starting_batch_id: int (optional, default=0)
The ID assigned to the first batch. Defaults to 0.
- batches_per_partition: int (optional, default=100)
The number of batches in each output partition. Defaults to 100. Gets passed to the bulk sampler if there is one; otherwise, this argument is used to determine which files to read.
- num_neighbors: Union[List[int], Dict[Tuple[str, str, str], List[int]]] (required)
The number of neighbors to sample for each node in each iteration. If an entry is set to -1, all neighbors will be included. In heterogeneous graphs, may also take in a dictionary denoting the number of neighbors to sample for each individual edge type.
Note: cuGraph currently supports only a single num_neighbors specification (a list) shared by all edge types. Passing in a dictionary will result in an exception.
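The interaction between batch_size, shuffle, drop_last, starting_batch_id, and batches_per_partition can be sketched in plain Python. This is only an illustration under stated assumptions, not the cuGraph implementation: `plan_batches` and its `seed` argument are hypothetical names, drop_last is assumed to have the usual PyTorch DataLoader meaning (discard a final incomplete batch), and batch IDs are assumed to be consecutive starting from starting_batch_id.

```python
import random
from typing import List, Optional, Sequence, Tuple


def plan_batches(
    input_nodes: Sequence[int],
    batch_size: int,
    *,
    shuffle: bool = False,
    drop_last: bool = True,
    starting_batch_id: int = 0,
    batches_per_partition: int = 100,
    seed: Optional[int] = None,  # hypothetical, only for reproducible shuffling
) -> List[Tuple[int, int, List[List[int]]]]:
    """Return (first_batch_id, last_batch_id, batches) for each partition.

    Hypothetical helper: it mirrors the documented parameters but is not
    part of cugraph_pyg.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive when sampling")
    if batches_per_partition <= 0:
        raise ValueError("batches_per_partition must be positive")

    nodes = list(input_nodes)
    if shuffle:
        random.Random(seed).shuffle(nodes)

    # Split the (possibly shuffled) seed nodes into consecutive batches.
    batches = [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]

    # Assumed semantics: drop a trailing batch that is smaller than batch_size.
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()

    # Group consecutive batches into partitions and record their ID range.
    partitions = []
    for start in range(0, len(batches), batches_per_partition):
        chunk = batches[start:start + batches_per_partition]
        first_id = starting_batch_id + start
        partitions.append((first_id, first_id + len(chunk) - 1, chunk))
    return partitions
```

With ten seed nodes, batch_size=3, and batches_per_partition=2, this sketch produces three full batches in two partitions; setting drop_last=False keeps the fourth, single-node batch.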
Methods
- __init__(feature_store, graph_store[, ...]): Executes a bulk sampling job immediately upon creation.
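A minimal usage sketch follows, using only the keyword arguments from the signature above. The helper names `check_num_neighbors` and `make_bulk_loader`, the batch size, and the fanout values are illustrative assumptions; the choice of `ValueError` is also an assumption, since the documentation only states that a dictionary results in an exception. The import is deferred so the block can be defined on a machine without cuGraph installed.

```python
from typing import Dict, List, Tuple, Union

NumNeighbors = Union[List[int], Dict[Tuple[str, str, str], List[int]]]


def check_num_neighbors(num_neighbors: NumNeighbors) -> List[int]:
    """Hypothetical pre-flight check mirroring the documented constraints."""
    if num_neighbors is None:
        raise ValueError("num_neighbors is required when sampling")
    if isinstance(num_neighbors, dict):
        # Per the note above, per-edge-type dictionaries are not supported
        # by cuGraph; the exception type used here is an assumption.
        raise ValueError("cuGraph expects a single list for num_neighbors")
    # An entry of -1 means "include all neighbors" at that hop.
    return list(num_neighbors)


def make_bulk_loader(feature_store, graph_store, seeds, directory=None):
    """Build a loader; sampling starts as soon as it is constructed."""
    # Deferred import: requires cugraph_pyg and a GPU-enabled environment.
    from cugraph_pyg.loader.dask_node_loader import BulkSampleLoader

    return BulkSampleLoader(
        feature_store,
        graph_store,
        input_nodes=seeds,
        batch_size=1024,          # illustrative value
        shuffle=True,
        directory=directory,      # None falls back to a temporary directory
        batches_per_partition=100,
        num_neighbors=check_num_neighbors([10, 25]),  # illustrative fanouts
    )
```

Because the constructor runs the sampling job immediately, the returned object can be iterated directly, for example `for batch in make_bulk_loader(fs, gs, seeds): ...`, where `fs`, `gs`, and `seeds` are placeholders for an existing feature store, graph store, and seed-node tensor.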