Sampling#

Uniform Random Walks#

cugraph_error_code_t cugraph_uniform_random_walks(const cugraph_resource_handle_t *handle, cugraph_graph_t *graph, const cugraph_type_erased_device_array_view_t *start_vertices, size_t max_length, cugraph_random_walk_result_t **result, cugraph_error_t **error)#

Compute uniform random walks.

Parameters:
  • handle[in] Handle for accessing resources

  • graph[in] Pointer to graph. NOTE: Graph might be modified if the storage needs to be transposed

  • start_vertices[in] Array of source vertices

  • max_length[in] Maximum length of the generated path

  • result[out] Output from the uniform random walks call

  • error[out] Pointer to an error object storing details of any error. Will be populated if error code is not CUGRAPH_SUCCESS

Returns:

error code
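
To make the shape of the output concrete, the following is a minimal CPU sketch of a uniform random walk over a CSR graph. It is not the cuGraph implementation and does not call the C API. The function name, the `lcg_next` helper, the row width of `max_length + 1`, and the `-1` padding after a vertex with no outgoing edges are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tiny linear congruential generator so the sketch is deterministic.
   Hypothetical helper, unrelated to cugraph_rng_state_t. */
static uint32_t lcg_next(uint32_t *state)
{
  *state = *state * 1664525u + 1013904223u;
  return *state >> 8;
}

/* Walk from each start vertex over a CSR graph (offsets, indices).
   paths must hold num_starts * (max_length + 1) entries, row major:
   row i is the walk that begins at starts[i]. */
void uniform_random_walks_cpu(const int32_t *offsets, const int32_t *indices,
                              const int32_t *starts, size_t num_starts,
                              size_t max_length, uint32_t seed, int32_t *paths)
{
  assert(offsets && indices && starts && paths);
  size_t row_len = max_length + 1;
  for (size_t i = 0; i < num_starts; ++i) {
    int32_t v = starts[i];
    paths[i * row_len] = v;
    for (size_t step = 1; step < row_len; ++step) {
      if (v >= 0) {
        int32_t degree = offsets[v + 1] - offsets[v];
        if (degree > 0) {
          /* Every outgoing edge is equally likely. */
          uint32_t pick = lcg_next(&seed) % (uint32_t)degree;
          v = indices[offsets[v] + (int32_t)pick];
        } else {
          v = -1; /* no outgoing edge: the walk ends here (sketch's choice) */
        }
      }
      paths[i * row_len + step] = v; /* stays -1 once the walk has ended */
    }
  }
}
```

Each row of `paths` then holds one walk, which is the row-major layout that cugraph_random_walk_result_get_paths describes.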

Biased Random Walks#

cugraph_error_code_t cugraph_biased_random_walks(const cugraph_resource_handle_t *handle, cugraph_graph_t *graph, const cugraph_type_erased_device_array_view_t *start_vertices, size_t max_length, cugraph_random_walk_result_t **result, cugraph_error_t **error)#

Compute biased random walks.

Parameters:
  • handle[in] Handle for accessing resources

  • graph[in] Pointer to graph. NOTE: Graph might be modified if the storage needs to be transposed

  • start_vertices[in] Array of source vertices

  • max_length[in] Maximum length of the generated path

  • result[out] Output from the biased random walks call

  • error[out] Pointer to an error object storing details of any error. Will be populated if error code is not CUGRAPH_SUCCESS

Returns:

error code
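
As an illustration of what "biased" can mean at a single step, the sketch below picks one outgoing edge with probability proportional to a per-edge weight. That the bias is the edge weight is an assumption here, since the description above does not say so. The function `pick_weighted` is hypothetical and is not part of the C API.

```c
#include <assert.h>
#include <stddef.h>

/* Return index i with probability weights[i] / sum(weights),
   given a uniform draw u in [0, 1). Weights are assumed non-negative. */
size_t pick_weighted(const double *weights, size_t n, double u)
{
  assert(n > 0 && u >= 0.0 && u < 1.0);
  double total = 0.0;
  for (size_t i = 0; i < n; ++i) total += weights[i];

  /* Walk the running sum until it passes the scaled draw. */
  double target = u * total;
  double running = 0.0;
  for (size_t i = 0; i < n; ++i) {
    running += weights[i];
    if (target < running) return i;
  }
  return n - 1; /* guard against floating-point round-off */
}
```

With weights `{1.0, 3.0}`, the second edge is chosen for three quarters of the possible draws.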

Random Walks via Node2Vec#

cugraph_error_code_t cugraph_node2vec_random_walks(const cugraph_resource_handle_t *handle, cugraph_graph_t *graph, const cugraph_type_erased_device_array_view_t *start_vertices, size_t max_length, double p, double q, cugraph_random_walk_result_t **result, cugraph_error_t **error)#

Compute random walks using the node2vec framework.

Parameters:
  • handle[in] Handle for accessing resources

  • graph[in] Pointer to graph. NOTE: Graph might be modified if the storage needs to be transposed

  • start_vertices[in] Array of source vertices

  • max_length[in] Maximum length of the generated path

  • p[in] The return parameter

  • q[in] The in/out parameter

  • result[out] Output from the node2vec call

  • error[out] Pointer to an error object storing details of any error. Will be populated if error code is not CUGRAPH_SUCCESS

Returns:

error code
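
The roles of p and q can be sketched with the usual node2vec second-order bias. The walk has just moved from `prev` to the current vertex and is considering `next`. The code below is a CPU illustration over a host CSR graph, not the cuGraph implementation; `has_edge` and `node2vec_bias` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Linear scan of u's adjacency list in a CSR graph. */
static bool has_edge(const int32_t *offsets, const int32_t *indices,
                     int32_t u, int32_t v)
{
  for (int32_t e = offsets[u]; e < offsets[u + 1]; ++e) {
    if (indices[e] == v) return true;
  }
  return false;
}

/* Unnormalized weight for stepping to `next`, given that the walk
   arrived at the current vertex from `prev`. */
double node2vec_bias(const int32_t *offsets, const int32_t *indices,
                     int32_t prev, int32_t next, double edge_weight,
                     double p, double q)
{
  assert(p > 0.0 && q > 0.0);
  if (next == prev) return edge_weight / p;     /* return to where we came from */
  if (has_edge(offsets, indices, prev, next)) {
    return edge_weight;                         /* stay close to prev */
  }
  return edge_weight / q;                       /* move outward, away from prev */
}
```

A large p makes immediate backtracking less likely, and a small q pushes the walk away from the previous vertex's neighborhood.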

Node2Vec#

cugraph_error_code_t cugraph_node2vec(const cugraph_resource_handle_t *handle, cugraph_graph_t *graph, const cugraph_type_erased_device_array_view_t *sources, size_t max_depth, bool_t compress_result, double p, double q, cugraph_random_walk_result_t **result, cugraph_error_t **error)#

Compute random walks using the node2vec framework.

Deprecated:

This call should be replaced with cugraph_node2vec_random_walks

Parameters:
  • handle[in] Handle for accessing resources

  • graph[in] Pointer to graph. NOTE: Graph might be modified if the storage needs to be transposed

  • sources[in] Array of source vertices

  • max_depth[in] Maximum length of the generated path

  • compress_result[in] If true, return the paths as a compressed sparse row matrix, otherwise return as a dense matrix

  • p[in] The return parameter

  • q[in] The in/out parameter

  • result[out] Output from the node2vec call

  • error[out] Pointer to an error object storing details of any error. Will be populated if error code is not CUGRAPH_SUCCESS

Returns:

error code

Uniform Neighbor Sampling#

cugraph_error_code_t cugraph_uniform_neighbor_sample(const cugraph_resource_handle_t *handle, cugraph_graph_t *graph, const cugraph_type_erased_device_array_view_t *start_vertices, const cugraph_type_erased_device_array_view_t *start_vertex_labels, const cugraph_type_erased_device_array_view_t *label_list, const cugraph_type_erased_device_array_view_t *label_to_comm_rank, const cugraph_type_erased_device_array_view_t *label_offsets, const cugraph_type_erased_host_array_view_t *fan_out, cugraph_rng_state_t *rng_state, const cugraph_sampling_options_t *options, bool_t do_expensive_check, cugraph_sample_result_t **result, cugraph_error_t **error)#

Uniform Neighborhood Sampling.

Deprecated:

This API will be deleted; use cugraph_homogeneous_uniform_neighbor_sample instead.

Returns a sample of the neighborhood around the specified start vertices. Optionally, each start vertex can be associated with a label, which lets the caller submit multiple batches of sampling requests in a single function call and should improve GPU utilization.

If start_vertex_labels is NULL, all start vertices are considered part of the same batch and the returned data will not have a label column.
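
The label_list and label_to_comm_rank parameters described below pair each label with a destination rank by position. A minimal host-side sketch of that lookup follows; it works on host copies of the two arrays, and `rank_for_label` is a hypothetical helper, not part of the C API. Using a binary search is one plausible way to exploit the requirement that label_list be sorted in ascending order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return label_to_comm_rank[i] for the i where label_list[i] == label,
   or -1 if the label is not present. label_list must be ascending. */
int32_t rank_for_label(const int32_t *label_list,
                       const int32_t *label_to_comm_rank,
                       size_t num_labels, int32_t label)
{
  assert(label_list && label_to_comm_rank);
  size_t lo = 0;
  size_t hi = num_labels;
  /* Find the first position whose label is not less than the target. */
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (label_list[mid] < label) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  if (lo < num_labels && label_list[lo] == label) {
    return label_to_comm_rank[lo];
  }
  return -1;
}
```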

Parameters:
  • handle[in] Handle for accessing resources

  • graph[in] Pointer to graph. NOTE: Graph might be modified if the storage needs to be transposed

  • start_vertices[in] Device array of start vertices for the sampling

  • start_vertex_labels[in] Device array of start vertex labels for the sampling. The label associated with each start vertex will be included in the output for results derived from that start vertex. Only labels of type INT32 are supported. If start_vertex_labels is NULL, the returned data will not be labeled.

  • label_list[in] Device array of the labels included in start_vertex_labels. If label_to_comm_rank is not specified this parameter is ignored. If specified, label_list must be sorted in ascending order.

  • label_to_comm_rank[in] Device array identifying the comm rank to which the output for each label should be shuffled. If specified, all data from label_list[i] will be shuffled to rank label_to_comm_rank[i]. This cannot be specified unless start_vertex_labels is also specified. If not specified, the output data will not be shuffled between ranks.

  • label_offsets[in] Device array of the offsets for each label in the seed list. This parameter is only used with the retain_seeds option.

  • fan_out[in] Host array defining the fan out at each step in the sampling algorithm. Only fan_out values of type INT32 are supported.

  • rng_state[inout] State of the random number generator, updated with each call

  • options[in] Opaque pointer defining the sampling options.

  • do_expensive_check[in] A flag to run expensive checks for input arguments (if set to true)

  • result[out] Output from the uniform_neighbor_sample call

  • error[out] Pointer to an error object storing details of any error. Will be populated if error code is not CUGRAPH_SUCCESS

Returns:

error code
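
The fan_out array bounds how much a sampling request can grow. The sketch below computes an upper bound on the number of sampled edges, assuming every fan_out value is positive and that each hop expands only the vertices reached by the previous hop. `max_sampled_edges` is a hypothetical helper for sizing estimates, not part of the C API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Upper bound on the number of sampled edges: at hop h, every frontier
   vertex contributes at most fan_out[h] edges, and each of those edges
   adds at most one vertex to the next frontier. */
size_t max_sampled_edges(size_t num_seeds, const int32_t *fan_out,
                         size_t num_hops)
{
  size_t frontier = num_seeds;
  size_t total = 0;
  for (size_t h = 0; h < num_hops; ++h) {
    assert(fan_out[h] > 0); /* the sketch only covers positive fan-out values */
    frontier *= (size_t)fan_out[h];
    total += frontier;
  }
  return total;
}
```

For example, 2 start vertices with a fan out of 3 and then 2 give at most 6 edges in the first hop and 12 in the second, 18 in total. Real results are usually smaller because vertices can have fewer neighbors than the fan out.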

Sampling Support Functions#

cugraph_type_erased_device_array_view_t *cugraph_lookup_result_get_srcs(const cugraph_lookup_result_t *result)#

Get the edge sources from the lookup result.

Parameters:

result[in] The result from src-dst lookup using edge ids and type(s)

Returns:

type erased array pointing to the edge sources

cugraph_type_erased_device_array_view_t *cugraph_lookup_result_get_dsts(const cugraph_lookup_result_t *result)#

Get the edge destinations from the lookup result.

Parameters:

result[in] The result from src-dst lookup using edge ids and type(s)

Returns:

type erased array pointing to the edge destinations

void cugraph_lookup_result_free(cugraph_lookup_result_t *result)#

Free a src-dst lookup result.

Parameters:

result[in] The result from src-dst lookup using edge ids and type(s)

void cugraph_lookup_container_free(cugraph_lookup_container_t *container)#

Free a sampling lookup map.

Parameters:

container[in] The sampling lookup map (a.k.a. container).

size_t cugraph_random_walk_result_get_max_path_length(cugraph_random_walk_result_t *result)#

Get the max path length from random walk result.

Parameters:

result[in] The result from random walks

Returns:

maximum path length

cugraph_type_erased_device_array_view_t *cugraph_random_walk_result_get_paths(cugraph_random_walk_result_t *result)#

Get the matrix (row major order) of vertices in the paths.

Parameters:

result[in] The result from a random walk algorithm

Returns:

type erased array pointing to the path matrix in device memory
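
Row-major order means that the entries of one walk are contiguous. Once the matrix has been copied to host memory, an entry can be addressed as sketched below. `path_vertex` is a hypothetical helper, `row_len` is the number of entries stored per walk, and the 32-bit vertex type is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Vertex at position `step` of walk number `walk` in a row-major
   matrix with row_len entries per walk. */
int32_t path_vertex(const int32_t *paths, size_t row_len,
                    size_t walk, size_t step)
{
  assert(paths && step < row_len);
  return paths[walk * row_len + step];
}
```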

cugraph_type_erased_device_array_view_t *cugraph_random_walk_result_get_weights(cugraph_random_walk_result_t *result)#

Get the matrix (row major order) of edge weights in the paths.

Parameters:

result[in] The result from a random walk algorithm

Returns:

type erased array pointing to the path edge weights in device memory

cugraph_type_erased_device_array_view_t *cugraph_random_walk_result_get_path_sizes(cugraph_random_walk_result_t *result)#

If the random walk result is compressed, get the path sizes.

Deprecated:

This call will no longer be relevant once the new node2vec calls are used.

Parameters:

result[in] The result from a random walk algorithm

Returns:

type erased array pointing to the path sizes in device memory
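
If a compressed result stores the paths back to back, the path sizes can be turned into offsets with an exclusive prefix sum, so that path i occupies positions `offsets[i]` up to (but not including) `offsets[i + 1]`. The back-to-back layout is an assumption of this sketch, which works on a host copy of the sizes; `path_offsets_from_sizes` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exclusive prefix sum of the path sizes.
   offsets must have room for num_paths + 1 entries. */
void path_offsets_from_sizes(const int32_t *sizes, size_t num_paths,
                             size_t *offsets)
{
  size_t running = 0;
  for (size_t i = 0; i < num_paths; ++i) {
    assert(sizes[i] >= 0);
    offsets[i] = running;
    running += (size_t)sizes[i];
  }
  offsets[num_paths] = running; /* total number of stored vertices */
}
```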

void cugraph_random_walk_result_free(cugraph_random_walk_result_t *result)#

Free random walks result.

Parameters:

result[in] The result from random walks

cugraph_error_code_t cugraph_sampling_options_create(cugraph_sampling_options_t **options, cugraph_error_t **error)#

Create sampling options object.

All sampling options are initially set to FALSE.

Parameters:
  • options[out] Opaque pointer to the sampling options

  • error[out] Pointer to an error object storing details of any error. Will be populated if error code is not CUGRAPH_SUCCESS

void cugraph_sampling_set_retain_seeds(cugraph_sampling_options_t *options, bool_t value)#

Set flag to retain seeds (original sources)

Parameters:
  • options – Opaque pointer to the sampling options

  • value – Boolean value to assign to the option

void cugraph_sampling_set_renumber_results(cugraph_sampling_options_t *options, bool_t value)#

Set flag to renumber results.

Parameters:
  • options – Opaque pointer to the sampling options

  • value – Boolean value to assign to the option

void cugraph_sampling_set_compress_per_hop(cugraph_sampling_options_t *options, bool_t value)#

Set whether to compress per-hop (True) or globally (False)

Parameters:
  • options – Opaque pointer to the sampling options

  • value – Boolean value to assign to the option

void cugraph_sampling_set_with_replacement(cugraph_sampling_options_t *options, bool_t value)#

Set flag to sample with replacement.

Parameters:
  • options – Opaque pointer to the sampling options

  • value – Boolean value to assign to the option

void cugraph_sampling_set_return_hops(cugraph_sampling_options_t *options, bool_t value)#

Set flag to return hops.

Parameters:
  • options – Opaque pointer to the sampling options

  • value – Boolean value to assign to the option

void cugraph_sampling_set_compression_type(cugraph_sampling_options_t *options, cugraph_compression_type_t value)#

Set compression type.

Parameters:
  • options – Opaque pointer to the sampling options

  • value – Enum defining the compression type

void cugraph_sampling_set_prior_sources_behavior(cugraph_sampling_options_t *options, cugraph_prior_sources_behavior_t value)#

Set prior sources behavior.

Parameters:
  • options – Opaque pointer to the sampling options

  • value – Enum defining prior sources behavior

void cugraph_sampling_set_dedupe_sources(cugraph_sampling_options_t *options, bool_t value)#

Set flag to dedupe sources prior to sampling.

Parameters:
  • options – Opaque pointer to the sampling options

  • value – Boolean value to assign to the option

void cugraph_sampling_options_free(cugraph_sampling_options_t *options)#

Free sampling options object.

Parameters:

options[in] Opaque pointer to the sampling options object

cugraph_type_erased_device_array_view_t *cugraph_sample_result_get_majors(const cugraph_sample_result_t *result)#

Get the major vertices from the sampling algorithm result.

Parameters:

result[in] The result from a sampling algorithm

Returns:

type erased array pointing to the major vertices in device memory

cugraph_type_erased_device_array_view_t *cugraph_sample_result_get_minors(const cugraph_sample_result_t *result)#

Get the minor vertices from the sampling algorithm result.

Parameters:

result[in] The result from a sampling algorithm

Returns:

type erased array pointing to the minor vertices in device memory

cugraph_type_erased_device_array_view_t *cugraph_sample_result_get_major_offsets(const cugraph_sample_result_t *result)#

Get the major offsets from the sampling algorithm result.

Parameters:

result[in] The result from a sampling algorithm

Returns:

type erased array pointing to the major offsets in device memory

cugraph_type_erased_device_array_view_t *cugraph_sample_result_get_start_labels(const cugraph_sample_result_t *result)#

Get the start labels from the sampling algorithm result.

Parameters:

result[in] The result from a sampling algorithm

Returns:

type erased array pointing to the start labels

cugraph_type_erased_device_array_view_t *cugraph_sample_result_get_edge_id(const cugraph_sample_result_t *result)#

Get the edge_id from the sampling algorithm result.

Parameters:

result[in] The result from a sampling algorithm

Returns:

type erased array pointing to the edge_id

cugraph_type_erased_device_array_view_t *cugraph_sample_result_get_edge_type(const cugraph_sample_result_t *result)#

Get the edge_type from the sampling algorithm result.

Parameters:

result[in] The result from a sampling algorithm

Returns:

type erased array pointing to the edge_type

cugraph_type_erased_device_array_view_t *cugraph_sample_result_get_edge_weight(const cugraph_sample_result_t *result)#

Get the edge_weight from the sampling algorithm result.

Parameters:

result[in] The result from a sampling algorithm

Returns:

type erased array pointing to the edge_weight

cugraph_type_erased_device_array_view_t *cugraph_sample_result_get_hop(const cugraph_sample_result_t *result)#

Get the hop from the sampling algorithm result.

Parameters:

result[in] The result from a sampling algorithm

Returns:

type erased array pointing to the hop

cugraph_type_erased_device_array_view_t *cugraph_sample_result_get_label_hop_offsets(const cugraph_sample_result_t *result)#

Get the label-hop offsets from the sampling algorithm result.

Parameters:

result[in] The result from a sampling algorithm

Returns:

type erased array pointing to the label-hop offsets

cugraph_type_erased_device_array_view_t *cugraph_sample_result_get_index(const cugraph_sample_result_t *result)#

Get the index from the sampling algorithm result.

Parameters:

result[in] The result from a sampling algorithm

Returns:

type erased array pointing to the index

cugraph_type_erased_device_array_view_t *cugraph_sample_result_get_renumber_map(const cugraph_sample_result_t *result)#

Get the renumber map.

Parameters:

result[in] The result from a sampling algorithm

Returns:

type erased array pointing to the renumber map

cugraph_type_erased_device_array_view_t *cugraph_sample_result_get_renumber_map_offsets(const cugraph_sample_result_t *result)#

Get the renumber map offsets.

Parameters:

result[in] The result from a sampling algorithm

Returns:

type erased array pointing to the renumber map offsets

cugraph_type_erased_device_array_view_t *cugraph_sample_result_get_edge_renumber_map(const cugraph_sample_result_t *result)#

Get the edge renumber map.

Parameters:

result[in] The result from a sampling algorithm

Returns:

type erased array pointing to the edge renumber map

cugraph_type_erased_device_array_view_t *cugraph_sample_result_get_edge_renumber_map_offsets(const cugraph_sample_result_t *result)#

Get the edge renumber map offsets.

Parameters:

result[in] The result from a sampling algorithm

Returns:

type erased array pointing to the edge renumber map offsets

void cugraph_sample_result_free(cugraph_sample_result_t *result)#

Free a sampling result.

Parameters:

result[in] The result from a sampling algorithm

cugraph_error_code_t cugraph_test_sample_result_create(const cugraph_resource_handle_t *handle, const cugraph_type_erased_device_array_view_t *srcs, const cugraph_type_erased_device_array_view_t *dsts, const cugraph_type_erased_device_array_view_t *edge_id, const cugraph_type_erased_device_array_view_t *edge_type, const cugraph_type_erased_device_array_view_t *wgt, const cugraph_type_erased_device_array_view_t *hop, const cugraph_type_erased_device_array_view_t *label, cugraph_sample_result_t **result, cugraph_error_t **error)#

Create a sampling result (testing API)

Parameters:
  • handle[in] Handle for accessing resources

  • srcs[in] Device array view to populate srcs

  • dsts[in] Device array view to populate dsts

  • edge_id[in] Device array view to populate edge_id (can be NULL)

  • edge_type[in] Device array view to populate edge_type (can be NULL)

  • wgt[in] Device array view to populate wgt (can be NULL)

  • hop[in] Device array view to populate hop

  • label[in] Device array view to populate label (can be NULL)

  • result[out] Pointer to the location to store the cugraph_sample_result_t*

  • error[out] Pointer to an error object storing details of any error. Will be populated if error code is not CUGRAPH_SUCCESS

Returns:

error code

cugraph_error_code_t cugraph_test_uniform_neighborhood_sample_result_create(const cugraph_resource_handle_t *handle, const cugraph_type_erased_device_array_view_t *srcs, const cugraph_type_erased_device_array_view_t *dsts, const cugraph_type_erased_device_array_view_t *edge_id, const cugraph_type_erased_device_array_view_t *edge_type, const cugraph_type_erased_device_array_view_t *weight, const cugraph_type_erased_device_array_view_t *hop, const cugraph_type_erased_device_array_view_t *label, cugraph_sample_result_t **result, cugraph_error_t **error)#

Create a sampling result (testing API)

Parameters:
  • handle[in] Handle for accessing resources

  • srcs[in] Device array view to populate srcs

  • dsts[in] Device array view to populate dsts

  • edge_id[in] Device array view to populate edge_id

  • edge_type[in] Device array view to populate edge_type

  • weight[in] Device array view to populate weight

  • hop[in] Device array view to populate hop

  • label[in] Device array view to populate label

  • result[out] Pointer to the location to store the cugraph_sample_result_t*

  • error[out] Pointer to an error object storing details of any error. Will be populated if error code is not CUGRAPH_SUCCESS

Returns:

error code

cugraph_error_code_t cugraph_select_random_vertices(const cugraph_resource_handle_t *handle, const cugraph_graph_t *graph, cugraph_rng_state_t *rng_state, size_t num_vertices, cugraph_type_erased_device_array_t **vertices, cugraph_error_t **error)#

Select random vertices from the graph.

Parameters:
  • handle[in] Handle for accessing resources

  • graph[in] Pointer to graph

  • rng_state[inout] State of the random number generator, updated with each call

  • num_vertices[in] Number of vertices to sample

  • vertices[out] Device array containing the selected vertices

  • error[out] Pointer to an error object storing details of any error. Will be populated if error code is not CUGRAPH_SUCCESS

Returns:

error code

cugraph_error_code_t cugraph_negative_sampling(const cugraph_resource_handle_t *handle, cugraph_rng_state_t *rng_state, cugraph_graph_t *graph, const cugraph_type_erased_device_array_view_t *vertices, const cugraph_type_erased_device_array_view_t *src_biases, const cugraph_type_erased_device_array_view_t *dst_biases, size_t num_samples, bool_t remove_duplicates, bool_t remove_existing_edges, bool_t exact_number_of_samples, bool_t do_expensive_check, cugraph_coo_t **result, cugraph_error_t **error)#

Perform negative sampling.

Negative sampling generates a COO structure defining edges according to the specified parameters

Parameters:
  • handle[in] Handle for accessing resources

  • rng_state[inout] State of the random number generator, updated with each call

  • graph[in] Pointer to graph

  • vertices[in] Vertex ids for the biases. If src_biases and dst_biases are not specified, this is ignored. If vertices is specified, then vertices[i] is the vertex id of src_biases[i] and dst_biases[i]. If vertices is not specified, then i is the vertex id of src_biases[i] and dst_biases[i].

  • src_biases[in] Bias for selecting source vertices. If NULL, sampling is uniform. If provided, the probability of selecting vertex i is src_biases[i] / (sum of all source biases).

  • dst_biases[in] Bias for selecting destination vertices. If NULL, sampling is uniform. If provided, the probability of selecting vertex i is dst_biases[i] / (sum of all destination biases).

  • num_samples[in] Number of negative samples to generate

  • remove_duplicates[in] If true, remove duplicates from sampled edges

  • remove_existing_edges[in] If true, remove sampled edges that actually exist in the graph

  • exact_number_of_samples[in] If true, the result contains exactly num_samples samples. If false, the code generates num_samples samples and then applies any filtering specified, so the result may contain fewer.

  • do_expensive_check[in] A flag to run expensive checks for input arguments (if set to true)

  • result[out] Opaque pointer to generated coo list

  • error[out] Pointer to an error object storing details of any error. Will be populated if error code is not CUGRAPH_SUCCESS

Returns:

error code
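
The bias rule for src_biases and dst_biases can be written out as a short host-side sketch. It turns a bias array into selection probabilities and falls back to a uniform distribution when the array is NULL. `bias_to_probability` is a hypothetical helper working on host data, not part of the C API.

```c
#include <assert.h>
#include <stddef.h>

/* prob[i] = bias[i] / (sum of all biases), or 1/n for every vertex
   when bias is NULL (uniform sampling). */
void bias_to_probability(const double *bias, size_t n, double *prob)
{
  assert(n > 0 && prob);
  if (bias == NULL) {
    for (size_t i = 0; i < n; ++i) prob[i] = 1.0 / (double)n;
    return;
  }
  double total = 0.0;
  for (size_t i = 0; i < n; ++i) {
    assert(bias[i] >= 0.0);
    total += bias[i];
  }
  assert(total > 0.0); /* at least one vertex must be selectable */
  for (size_t i = 0; i < n; ++i) prob[i] = bias[i] / total;
}
```

With biases `{1.0, 1.0, 2.0}`, the third vertex is selected half of the time and the other two a quarter of the time each.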