cugraph.uniform_neighbor_sample

cugraph.uniform_neighbor_sample(G: Graph, start_list: Sequence, fanout_vals: List[int], *, with_replacement: bool = True, with_edge_properties: bool = False, with_batch_ids: bool = False, random_state: int = None, return_offsets: bool = False, return_hops: bool = True, include_hop_column: bool = True, prior_sources_behavior: str = None, deduplicate_sources: bool = False, renumber: bool = False, retain_seeds: bool = False, label_offsets: Sequence = None, use_legacy_names: bool = True, compress_per_hop: bool = False, compression: str = 'COO') -> Union[cudf.DataFrame, Tuple[cudf.DataFrame, cudf.DataFrame]]

Performs neighborhood sampling: starting from the given vertices, samples nodes from a graph based on each current node’s neighbors, using the fanout value that corresponds to each hop.
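The hop-by-hop behavior described above can be illustrated with a small CPU-only sketch. This is not cuGraph's implementation: the function name `uniform_neighbor_sample_sketch`, the adjacency-dict input, and the tuple output are all assumptions made for illustration, and the real function operates on a `cugraph.Graph` on the GPU.

```python
import random


def uniform_neighbor_sample_sketch(adj, start_list, fanout_vals,
                                   with_replacement=True, seed=None):
    """Toy illustration of per-hop uniform neighbor sampling.

    adj maps each vertex to a list of its out-neighbors (a hypothetical
    input format chosen for this sketch).
    Returns a list of (source, destination, hop_id) tuples.
    """
    rng = random.Random(seed)
    edges = []
    frontier = list(start_list)
    for hop, fanout in enumerate(fanout_vals):
        next_frontier = []
        for src in frontier:
            neighbors = adj.get(src, [])
            if not neighbors:
                continue  # a vertex without out-neighbors yields no edges
            if with_replacement:
                # The same neighbor may be drawn more than once.
                picked = [rng.choice(neighbors) for _ in range(fanout)]
            else:
                # At most one draw per neighbor, capped by the degree.
                picked = rng.sample(neighbors, min(fanout, len(neighbors)))
            for dst in picked:
                edges.append((src, dst, hop))
                next_frontier.append(dst)
        # Sampled destinations become the sources of the next hop.
        frontier = next_frontier
    return edges


adj = {0: [1, 2, 3], 1: [2], 2: [0, 3], 3: []}
edges = uniform_neighbor_sample_sketch(
    adj, start_list=[0], fanout_vals=[2, 2], with_replacement=False, seed=42
)
```

In this sketch, `fanout_vals=[2, 2]` draws up to two neighbors of each start vertex at hop 0, then up to two neighbors of each sampled destination at hop 1.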

Parameters:
G: cugraph.Graph

cuGraph graph, which contains connectivity information as a dask cudf edge list dataframe

start_list: list or cudf.Series (int32)

A list of starting vertices for sampling

fanout_vals: list (int32)

List of branching out (fan-out) degrees per starting vertex for each hop level.

with_replacement: bool, optional (default=True)

Flag to specify whether the random sampling is done with replacement.

with_edge_properties: bool, optional (default=False)

Deprecated. Flag to specify whether to return edge properties (weight, edge id, edge type, batch id, hop id) with the sampled edges.

with_batch_ids: bool, optional (default=False)

Flag to specify whether batch ids are present in the start_list. Assumes they are the last column in the start_list dataframe.

random_state: int, optional

Random seed to use when making sampling calls.

return_offsets: bool, optional (default=False)

Whether to return the sampling results with batch ids included as one dataframe, or to instead return two dataframes, one with sampling results and one with batch ids and their start offsets.

return_hops: bool, optional (default=True)

Whether to return the sampling results with hop ids corresponding to the hop where the edge appeared.

include_hop_column: bool, optional (default=True)

Deprecated. Defaults to True. If True, will include the hop column even if return_offsets is True. This option will be removed in release 23.12.

prior_sources_behavior: str, optional (default=None)

Options are “carryover” and “exclude”. The default (None) leaves the source list as-is. “carryover” carries over sources from previous hops to the current hop. “exclude” prevents sources from previous hops from reappearing as sources in future hops.

deduplicate_sources: bool, optional (default=False)

Whether to first deduplicate the list of possible sources from the previous destinations before performing the next hop.

renumber: bool, optional (default=False)

Whether to renumber on a per-batch basis. If True, will return the renumber map and renumber map offsets as an additional dataframe.

retain_seeds: bool, optional (default=False)

If True, will retain the original seeds (original source vertices) in the output even if they do not have outgoing neighbors.

label_offsets: integer sequence, optional (default=None)

Offsets of each label within the start vertex list. Used only, and required, if retain_seeds is True.

use_legacy_names: bool, optional (default=True)

Deprecated. Will be removed in release 23.12 in favor of always using the new names “majors” and “minors”. Whether to use the legacy column names: if True, the columns are named “sources” and “destinations”; if False, they are named “majors” and “minors”.

compress_per_hop: bool, optional (default=False)

Whether to compress globally (default), or to produce a separate compressed edgelist per hop.

compression: str, optional (default='COO')

Sets the compression type for the output minibatches. Valid options are COO (default), CSR, CSC, DCSR, and DCSC.
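The interaction between `prior_sources_behavior` and `deduplicate_sources` can be sketched as a rule for building the source list of the next hop. The helper below, `next_hop_sources_sketch`, is hypothetical and CPU-only. It follows the parameter descriptions above, but the order in which cuGraph applies the two options internally is an assumption here.

```python
def next_hop_sources_sketch(prior_sources, destinations,
                            prior_sources_behavior=None,
                            deduplicate_sources=False):
    """Build the source list for the next hop (illustrative only).

    prior_sources: vertices already used as sources in previous hops.
    destinations: vertices sampled as destinations in the latest hop.
    """
    sources = list(destinations)  # default: leave the list as-is
    if prior_sources_behavior == "carryover":
        # Earlier sources are sampled from again in the current hop.
        sources = list(prior_sources) + sources
    elif prior_sources_behavior == "exclude":
        # Earlier sources may not reappear as sources.
        seen = set(prior_sources)
        sources = [v for v in sources if v not in seen]
    elif prior_sources_behavior is not None:
        raise ValueError("expected 'carryover', 'exclude', or None")
    if deduplicate_sources:
        # Order-preserving removal of repeated vertices (assumed to run
        # after the carryover/exclude step in this sketch).
        sources = list(dict.fromkeys(sources))
    return sources


prior = [0]
dests = [1, 2, 2, 0]
as_is = next_hop_sources_sketch(prior, dests)
deduped = next_hop_sources_sketch(prior, dests, deduplicate_sources=True)
carried = next_hop_sources_sketch(prior, dests, "carryover", True)
excluded = next_hop_sources_sketch(prior, dests, "exclude")
```

With the inputs shown, the default keeps the duplicate `2` and the revisited vertex `0`, while "exclude" drops `0` because it was already a source in a previous hop.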

Returns:
result: cudf.DataFrame or Tuple[cudf.DataFrame, cudf.DataFrame]

GPU data frame containing multiple cudf.Series

If with_edge_properties=False:
df[‘sources’]: cudf.Series

Contains the source vertices from the sampling result

df[‘destinations’]: cudf.Series

Contains the destination vertices from the sampling result

df[‘indices’]: cudf.Series

Contains the indices (edge weights) from the sampling result for path reconstruction

If with_edge_properties=True:
If return_offsets=False:
df[‘sources’]: cudf.Series

Contains the source vertices from the sampling result

df[‘destinations’]: cudf.Series

Contains the destination vertices from the sampling result

df[‘edge_weight’]: cudf.Series

Contains the edge weights from the sampling result

df[‘edge_id’]: cudf.Series

Contains the edge ids from the sampling result

df[‘edge_type’]: cudf.Series

Contains the edge types from the sampling result

df[‘batch_id’]: cudf.Series

Contains the batch ids from the sampling result

df[‘hop_id’]: cudf.Series

Contains the hop ids from the sampling result

If renumber=True (adds the following dataframe):
renumber_df[‘map’]: cudf.Series

Contains the renumber maps for each batch

renumber_df[‘offsets’]: cudf.Series

Contains the batch offsets for the renumber maps

If return_offsets=True:
df[‘sources’]: cudf.Series

Contains the source vertices from the sampling result

df[‘destinations’]: cudf.Series

Contains the destination vertices from the sampling result

df[‘edge_weight’]: cudf.Series

Contains the edge weights from the sampling result

df[‘edge_id’]: cudf.Series

Contains the edge ids from the sampling result

df[‘edge_type’]: cudf.Series

Contains the edge types from the sampling result

df[‘hop_id’]: cudf.Series

Contains the hop ids from the sampling result

offsets_df[‘batch_id’]: cudf.Series

Contains the batch ids from the sampling result

offsets_df[‘offsets’]: cudf.Series

Contains the offsets of each batch in the sampling result

If renumber=True (adds the following dataframe):
renumber_df[‘map’]: cudf.Series

Contains the renumber maps for each batch

renumber_df[‘offsets’]: cudf.Series

Contains the batch offsets for the renumber maps
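To show how the offsets and renumber-map outputs might be consumed, here is a minimal sketch using plain Python lists in place of cudf columns. Both helpers are hypothetical. The sketch assumes each entry in `offsets` is the starting row of a batch, that the last batch runs to the end of the table, and that renumbered id `k` in a batch corresponds to `map[offset_of_batch + k]`. Verify these assumptions against the actual output of your cuGraph version.

```python
def split_by_offsets_sketch(rows, batch_ids, offsets):
    """Group result rows by batch, given each batch's starting row."""
    bounds = list(offsets) + [len(rows)]  # assumed end of the last batch
    return {
        batch: rows[bounds[i]:bounds[i + 1]]
        for i, batch in enumerate(batch_ids)
    }


def original_ids_sketch(local_ids, renumber_map, map_offsets, batch_index):
    """Translate per-batch renumbered ids back to original vertex ids."""
    start = map_offsets[batch_index]
    return [renumber_map[start + k] for k in local_ids]


# Renumbered (source, destination) edges for two batches.
rows = [(0, 1), (0, 2), (1, 2), (0, 1), (1, 0)]
batch_ids = [7, 8]
offsets = [0, 3]             # batch 7 starts at row 0, batch 8 at row 3
per_batch = split_by_offsets_sketch(rows, batch_ids, offsets)

renumber_map = [10, 11, 12, 20, 21]
map_offsets = [0, 3]         # batch 7 uses map[0:3], batch 8 uses map[3:5]
batch0_ids = original_ids_sketch([0, 1, 2], renumber_map, map_offsets, 0)
batch1_ids = original_ids_sketch([0, 1], renumber_map, map_offsets, 1)
```

Keeping the offsets in a separate dataframe avoids repeating the batch id on every edge row, at the cost of this extra lookup step.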