cugraph_pyg.sampler.distributed_sampler.DistributedNeighborSampler

class cugraph_pyg.sampler.distributed_sampler.DistributedNeighborSampler(graph: SGGraph | MGGraph, *, local_seeds_per_call: int | None = None, retain_original_seeds: bool = False, fanout: List[int] = [-1], prior_sources_behavior: str = 'exclude', deduplicate_sources: bool = True, compression: str = 'COO', compress_per_hop: bool = False, with_replacement: bool = False, biased: bool = False, heterogeneous: bool = False, temporal: bool = False, temporal_comparison: str | None = None, vertex_type_offsets: Tensor | ndarray | Series | None = None, num_edge_types: int = 1)

__init__(graph: SGGraph | MGGraph, *, local_seeds_per_call: int | None = None, retain_original_seeds: bool = False, fanout: List[int] = [-1], prior_sources_behavior: str = 'exclude', deduplicate_sources: bool = True, compression: str = 'COO', compress_per_hop: bool = False, with_replacement: bool = False, biased: bool = False, heterogeneous: bool = False, temporal: bool = False, temporal_comparison: str | None = None, vertex_type_offsets: Tensor | ndarray | Series | None = None, num_edge_types: int = 1)
Parameters:
graph: SGGraph or MGGraph (required)

The pylibcugraph graph object that will be sampled.

local_seeds_per_call: int (optional, default=None)

The number of seeds on this rank that the sampler processes in a single sampling call. Batches are split into multiple sampling calls based on this parameter, which must be the same across all ranks. The total number of seeds processed per sampling call is this value multiplied by the world size. Subclasses should generally calculate the appropriate number of seeds.

retain_original_seeds: bool (optional, default=False)

Whether to retain the original seeds even if they do not appear in the output minibatch. This will affect the output renumber map and CSR/CSC graph if applicable.
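
The arithmetic described for local_seeds_per_call can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper name sampling_call_plan is hypothetical, and the ceiling division used to count calls is an assumption about how leftover seeds are handled.

```python
import math


def sampling_call_plan(num_local_seeds, local_seeds_per_call, world_size):
    """Hypothetical helper (not part of cugraph_pyg).

    Returns a tuple of:
      - the number of sampling calls needed on one rank, assuming any
        leftover seeds go into one final, smaller call (an assumption)
      - the total number of seeds handled per call across all ranks,
        which is local_seeds_per_call times the world size
    """
    if local_seeds_per_call <= 0:
        raise ValueError("local_seeds_per_call must be positive")
    if world_size <= 0:
        raise ValueError("world_size must be positive")

    num_calls = math.ceil(num_local_seeds / local_seeds_per_call)
    global_seeds_per_call = local_seeds_per_call * world_size
    return num_calls, global_seeds_per_call


# 10,000 seeds on this rank, 4,096 seeds per call, 4 ranks in total.
num_calls, global_seeds_per_call = sampling_call_plan(10_000, 4_096, 4)
print(num_calls, global_seeds_per_call)  # 3 16384
```

Because every rank must use the same local_seeds_per_call, the per-call total is the same on every rank.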

Methods

__init__(graph, *[, local_seeds_per_call, ...])

get_start_batch_offset(local_num_batches[, ...])

Gets the starting batch offset to ensure each rank's set of batch ids is disjoint.

sample_batches(seeds, seed_times, ...[, ...])

Performs sampling for a single call group of seeds and their associated batch ids.

sample_from_edges(edges, *[, batch_size, ...])

Performs sampling starting from seed edges.

sample_from_nodes(nodes, *[, batch_size, ...])

Performs node-based sampling.
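
One common way to give each rank a disjoint set of batch ids, as get_start_batch_offset is described as doing, is an exclusive prefix sum over the per-rank batch counts. The sketch below is a single-process illustration of that idea under stated assumptions: the function start_batch_offsets is hypothetical, it is not the library's method, and a real multi-GPU run would need to exchange the counts between ranks first.

```python
def start_batch_offsets(local_num_batches_per_rank):
    """Hypothetical helper (not part of cugraph_pyg).

    Given the number of batches on each rank, in rank order, returns the
    first batch id for each rank so that the id ranges do not overlap.
    This is an exclusive prefix sum of the counts.
    """
    offsets = []
    running_total = 0
    for count in local_num_batches_per_rank:
        if count < 0:
            raise ValueError("batch counts must be non-negative")
        offsets.append(running_total)
        running_total += count
    return offsets


# Ranks 0, 1 and 2 hold 3, 5 and 2 batches respectively.
offsets = start_batch_offsets([3, 5, 2])
print(offsets)  # [0, 3, 8]

# Rank 1 would then own batch ids 3..7, which overlap with no other rank.
rank_1_ids = list(range(offsets[1], offsets[1] + 5))
print(rank_1_ids)  # [3, 4, 5, 6, 7]
```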

Attributes

BASE_VERTICES_PER_BYTE

UNKNOWN_VERTICES_DEFAULT

is_multi_gpu