Sampling#

template<typename vertex_t, typename edge_t, typename weight_t, typename index_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<weight_t>, rmm::device_uvector<index_t>> random_walks(raft::handle_t const &handle, graph_view_t<vertex_t, edge_t, false, multi_gpu> const &graph_view, std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view, vertex_t const *ptr_d_start, index_t num_paths, index_t max_depth, bool use_padding, std::unique_ptr<sampling_params_t> sampling_strategy)#

returns random walks (RW) from starting sources, where each path is of given maximum length. Uniform distribution is assumed for the random engine.

.*

Deprecated:

This algorithm will be deprecated once all of the functionality is migrated to the newer APIS: uniform_random_walks(), biased_random_walks(), and node2vec_random_walks().

Template Parameters:

graph_t – Type of graph/view (typically, graph_view_t).

index_t – Type used to store indexing and sizes.

graph_t – Type of graph/view (typically, graph_view_t).

index_t – Type used to store indexing and sizes.

Parameters:

handle – RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and handles to various CUDA libraries) to run graph algorithms.

graph – Graph (view )object to generate RW on.

ptr_d_start – Device pointer to set of starting vertex indices for the RW.

num_paths – = number(paths).

max_depth – maximum length of RWs.

use_padding – (optional) specifies if return uses padded format (true), or coalesced (compressed) format; when padding is used the output is a matrix of vertex paths and a matrix of edges paths (weights); in this case the matrices are stored in row major order; the vertex path matrix is padded with num_vertices values and the weight matrix is padded with 0 values;

selector_type – identifier for sampling strategy: uniform, biased, etc.; defaults to uniform = 0;

handle – RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and handles to various CUDA libraries) to run graph algorithms.

graph – Graph (view )object to generate RW on.

ptr_d_start – Device pointer to set of starting vertex indices for the RW.

num_paths – = number(paths).

max_depth – maximum length of RWs.

use_padding – (optional) specifies if return uses padded format (true), or coalesced (compressed) format; when padding is used the output is a matrix of vertex paths and a matrix of edges paths (weights); in this case the matrices are stored in row major order; the vertex path matrix is padded with num_vertices values and the weight matrix is padded with 0 values;

sampling_strategy – pointer for sampling strategy: uniform, biased, etc.; possible values{0==uniform, 1==biased, 2==node2vec}; defaults to nullptr == uniform;

Returns:

std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<weight_t>, rmm::device_uvector<index_t>> Triplet of either padded or coalesced RW paths; in the coalesced case (default), the return consists of corresponding vertex and edge weights for each, and corresponding path sizes. This is meant to minimize the number of DF’s to be passed to the Python layer. The meaning of “coalesced” here is that a 2D array of paths of different sizes is represented as a 1D contiguous array. In the padded case the return is a matrix of num_paths x max_depth vertex paths; and num_paths x (max_depth-1) edge (weight) paths, with an empty array of sizes. Note: if the graph is un-weighted the edge (weight) paths consists of weight_t{1} entries;

Returns:

std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<weight_t>, rmm::device_uvector<index_t>> Triplet of either padded or coalesced RW paths; in the coalesced case (default), the return consists of corresponding vertex and edge weights for each, and corresponding path sizes. This is meant to minimize the number of DF’s to be passed to the Python layer. The meaning of “coalesced” here is that a 2D array of paths of different sizes is represented as a 1D contiguous array. In the padded case the return is a matrix of num_paths x max_depth vertex paths; and num_paths x (max_depth-1) edge (weight) paths, with an empty array of sizes. Note: if the graph is un-weighted the edge (weight) paths consists of weight_t{1} entries;

template<typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, std::optional<rmm::device_uvector<weight_t>>> uniform_random_walks(raft::handle_t const &handle, raft::random::RngState &rng_state, graph_view_t<vertex_t, edge_t, false, multi_gpu> const &graph_view, std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view, raft::device_span<vertex_t const> start_vertices, size_t max_length)#

returns uniform random walks from starting sources, where each path is of given maximum length.

.*

start_vertices can contain duplicates, in which case different random walks will be generated for each instance.

If edge_weight_view.has_value() is true, the return contains edge weights. If edge_weight_view.has_value() is false, the returned value will be std::nullopt.

Template Parameters:

vertex_t – Type of vertex identifiers. Needs to be an integral type.

edge_t – Type of edge identifiers. Needs to be an integral type.

weight_t – Type of edge weights. Needs to be a floating point type.

multi_gpu – Flag indicating whether template instantiation should target single-GPU (false)

Parameters:

handle – RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and handles to various CUDA libraries) to run graph algorithms.

graph_view – graph view to operate on

edge_weight_view – Optional view object holding edge weights for graph_view.

start_vertices – Device span defining the starting vertices

max_length – maximum length of random walk

seed – (optional, defaults to system time), seed for random number generation

Returns:

tuple containing device vectors of vertices and the edge weights (if edge_weight_view.has_value()
is true)

For each input selector there will be (max_length+1) elements in the vertex vector with the starting vertex followed by the subsequent vertices in the random walk. If a path terminates before max_length, the vertices will be populated with
invalid_vertex_id
(-1 for signed vertex_t, std::numeric_limits<vertex_t>::max() for an unsigned vertex_t type)

For each input selector there will be max_length elements in the weights vector with the edge weight for the edge in the path. If a path terminates before max_length the subsequent edge weights will be set to weight_t{0}.

template<typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, std::optional<rmm::device_uvector<weight_t>>> biased_random_walks(raft::handle_t const &handle, raft::random::RngState &rng_state, graph_view_t<vertex_t, edge_t, false, multi_gpu> const &graph_view, edge_property_view_t<edge_t, weight_t const*> edge_weight_view, raft::device_span<vertex_t const> start_vertices, size_t max_length)#

returns biased random walks from starting sources, where each path is of given maximum length.

.*

The next vertex is biased based on the edge weights. The probability of traversing a departing edge will be the edge weight divided by the sum of the departing edge weights.

start_vertices can contain duplicates, in which case different random walks will be generated for each instance.

Throws:

cugraph::logic_error – if the graph is unweighted

Template Parameters:

vertex_t – Type of vertex identifiers. Needs to be an integral type.

edge_t – Type of edge identifiers. Needs to be an integral type.

weight_t – Type of edge weights. Needs to be a floating point type.

multi_gpu – Flag indicating whether template instantiation should target single-GPU (false)

Parameters:

handle – RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and handles to various CUDA libraries) to run graph algorithms.

graph_view – graph view to operate on

edge_weight_view – View object holding edge weights for graph_view.

start_vertices – Device span defining the starting vertices

max_length – maximum length of random walk

seed – (optional, defaults to system time), seed for random number generation

Returns:

tuple containing device vectors of vertices and the edge weights

For each input selector there will be (max_length+1) elements in the vertex vector with the starting vertex followed by the subsequent vertices in the random walk. If a path terminates before max_length, the vertices will be populated with
invalid_vertex_id
(-1 for signed vertex_t, std::numeric_limits<vertex_t>::max() for an unsigned vertex_t type)

For each input selector there will be max_length elements in the weights vector with the edge weight for the edge in the path. If a path terminates before max_length the subsequent edge weights will be set to weight_t{0}.

template<typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, std::optional<rmm::device_uvector<weight_t>>> node2vec_random_walks(raft::handle_t const &handle, raft::random::RngState &rng_state, graph_view_t<vertex_t, edge_t, false, multi_gpu> const &graph_view, std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view, raft::device_span<vertex_t const> start_vertices, size_t max_length, weight_t p, weight_t q)#

returns biased random walks with node2vec biases from starting sources, where each path is of given maximum length.

.*

start_vertices can contain duplicates, in which case different random walks will be generated for each instance.

If the edge_weight_view.has_value() = true, the return contains edge weights and the node2vec computation will utilize the edge weights. If edge_weight_view.has_value() == false, then the return will not contain edge weights and the node2vec computation will assume an edge weight of 1 for all edges.

Template Parameters:

vertex_t – Type of vertex identifiers. Needs to be an integral type.

edge_t – Type of edge identifiers. Needs to be an integral type.

weight_t – Type of edge weights. Needs to be a floating point type.

multi_gpu – Flag indicating whether template instantiation should target single-GPU (false)

Parameters:

handle – RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and handles to various CUDA libraries) to run graph algorithms.

graph_view – graph view to operate on

edge_weight_view – Optional view object holding edge weights for graph_view. If edge_weight_view.has_value() == false, edge weights are assumed to be 1.0.

start_vertices – Device span defining the starting vertices

max_length – maximum length of random walk

p – node2vec return parameter

q – node2vec in-out parameter

seed – (optional, defaults to system time), seed for random number generation

Returns:

tuple containing device vectors of vertices and the edge weights

For each input selector there will be (max_length+1) elements in the vertex vector with the starting vertex followed by the subsequent vertices in the random walk. If a path terminates before max_length, the vertices will be populated with
invalid_vertex_id
(-1 for signed vertex_t, std::numeric_limits<vertex_t>::max() for an unsigned vertex_t type)

For each input selector there will be max_length elements in the weights vector with the edge weight for the edge in the path. If a path terminates before max_length the subsequent edge weights will be set to weight_t{0}.

Sampling#

This Page