Similarity
- template<typename VT, typename ET, typename WT>
void jaccard(legacy::GraphCSRView<VT, ET, WT> const &graph, WT const *weights, WT *result)
Compute the Jaccard similarity coefficient for all vertices.
Computes the Jaccard similarity coefficient for every pair of vertices in the graph which are connected by an edge.
- Throws:
cugraph::logic_error – when an error occurs.
- Template Parameters:
VT – Type of vertex identifiers. Supported value: int (signed, 32-bit).
ET – Type of edge identifiers. Supported value: int (signed, 32-bit).
WT – Type of edge weights. Supported values: float or double.
- Parameters:
graph – [in] The input graph object.
weights – [in] Device pointer to input vertex weights for weighted Jaccard; may be NULL for unweighted Jaccard.
result – [out] Device pointer to result values, memory needs to be pre-allocated by caller
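As a point of comparison for the unweighted case, the following is a minimal host-side sketch of the per-edge Jaccard coefficient, |N(u) ∩ N(v)| / |N(u) ∪ N(v)|. It is not the cuGraph implementation: the function name jaccard_per_edge_host, the use of std::vector in place of device memory, the assumption that each adjacency list is sorted, and the one-value-per-stored-edge result layout are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, not part of cuGraph.
// offsets has num_vertices + 1 entries; indices stores the neighbor lists,
// each assumed to be sorted in ascending order.
// Returns one coefficient per stored edge, in CSR order (assumed layout).
std::vector<float> jaccard_per_edge_host(std::vector<int> const& offsets,
                                         std::vector<int> const& indices)
{
  assert(!offsets.empty());
  assert(static_cast<std::size_t>(offsets.back()) == indices.size());

  std::vector<float> result(indices.size(), 0.0f);
  int const num_vertices = static_cast<int>(offsets.size()) - 1;

  for (int u = 0; u < num_vertices; ++u) {
    for (int e = offsets[u]; e < offsets[u + 1]; ++e) {
      int const v = indices[e];

      // Two-pointer intersection of the sorted neighbor lists of u and v.
      int i      = offsets[u];
      int j      = offsets[v];
      int common = 0;
      while (i < offsets[u + 1] && j < offsets[v + 1]) {
        if (indices[i] < indices[j]) {
          ++i;
        } else if (indices[j] < indices[i]) {
          ++j;
        } else {
          ++common;
          ++i;
          ++j;
        }
      }

      int const degree_u   = offsets[u + 1] - offsets[u];
      int const degree_v   = offsets[v + 1] - offsets[v];
      int const union_size = degree_u + degree_v - common;
      result[e] = union_size > 0 ? static_cast<float>(common) / static_cast<float>(union_size)
                                 : 0.0f;
    }
  }
  return result;
}
```

For a triangle 0-1-2 with an extra vertex 3 attached to vertex 2, the edge (0, 2) has one common neighbor and a union of four vertices, so its coefficient is 0.25.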
- template<typename VT, typename ET, typename WT>
void jaccard_list(legacy::GraphCSRView<VT, ET, WT> const &graph, WT const *weights, ET num_pairs, VT const *first, VT const *second, WT *result)
Compute the Jaccard similarity coefficient for selected vertex pairs.
Computes the Jaccard similarity coefficient for each pair of specified vertices. Vertices are specified as pairs where pair[n] = (first[n], second[n]).
- Throws:
cugraph::logic_error – when an error occurs.
- Template Parameters:
VT – Type of vertex identifiers. Supported value: int (signed, 32-bit).
ET – Type of edge identifiers. Supported value: int (signed, 32-bit).
WT – Type of edge weights. Supported values: float or double.
- Parameters:
graph – [in] The input graph object
weights – [in] The input vertex weights for weighted Jaccard, may be NULL for unweighted Jaccard.
num_pairs – [in] The number of vertex ID pairs specified
first – [in] Device pointer to first vertex ID of each pair
second – [in] Device pointer to second vertex ID of each pair
result – [out] Device pointer to result values, memory needs to be pre-allocated by caller
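The pair layout pair[n] = (first[n], second[n]) can be illustrated with a small host-side sketch for the unweighted case. The names count_common_neighbors and jaccard_list_host are hypothetical, std::vector stands in for the device pointers, and sorted adjacency lists are assumed; this is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a cuGraph function): counts the entries shared by
// the sorted neighbor lists of u and v in a CSR graph.
int count_common_neighbors(std::vector<int> const& offsets,
                           std::vector<int> const& indices,
                           int u,
                           int v)
{
  int i      = offsets[u];
  int j      = offsets[v];
  int common = 0;
  while (i < offsets[u + 1] && j < offsets[v + 1]) {
    if (indices[i] < indices[j]) {
      ++i;
    } else if (indices[j] < indices[i]) {
      ++j;
    } else {
      ++common;
      ++i;
      ++j;
    }
  }
  return common;
}

// Hypothetical host-side reference: result[n] is the unweighted Jaccard
// coefficient of the pair (first[n], second[n]).
std::vector<float> jaccard_list_host(std::vector<int> const& offsets,
                                     std::vector<int> const& indices,
                                     std::vector<int> const& first,
                                     std::vector<int> const& second)
{
  assert(first.size() == second.size());
  std::vector<float> result(first.size(), 0.0f);
  for (std::size_t n = 0; n < first.size(); ++n) {
    int const u          = first[n];
    int const v          = second[n];
    int const common     = count_common_neighbors(offsets, indices, u, v);
    int const degree_u   = offsets[u + 1] - offsets[u];
    int const degree_v   = offsets[v + 1] - offsets[v];
    int const union_size = degree_u + degree_v - common;
    result[n] = union_size > 0 ? static_cast<float>(common) / static_cast<float>(union_size)
                               : 0.0f;
  }
  return result;
}
```

Unlike the all-edges variant, the pairs do not need to be connected by an edge; two vertices that share a neighbor but are not adjacent still receive a nonzero score.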
- template<typename VT, typename ET, typename WT>
void overlap(legacy::GraphCSRView<VT, ET, WT> const &graph, WT const *weights, WT *result)
Compute overlap coefficient for all vertices in the graph.
Computes the Overlap Coefficient for every pair of vertices in the graph which are connected by an edge.
- Throws:
cugraph::logic_error – when an error occurs.
- Template Parameters:
VT – Type of vertex identifiers. Supported value: int (signed, 32-bit).
ET – Type of edge identifiers. Supported value: int (signed, 32-bit).
WT – Type of edge weights. Supported values: float or double.
- Parameters:
graph – [in] The input graph object.
weights – [in] Device pointer to input vertex weights for weighted overlap; may be NULL for unweighted overlap.
result – [out] Device pointer to result values, memory needs to be pre-allocated by caller
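For reference, the unweighted overlap coefficient of two vertices is |N(u) ∩ N(v)| / min(|N(u)|, |N(v)|). The sketch below computes it on the host for two sorted neighbor lists; overlap_coefficient_host is a hypothetical name, and returning 0 when either list is empty is an assumption made here, not documented library behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical host-side reference, not part of cuGraph.
// a and b are the neighbor lists of two vertices, sorted in ascending order.
float overlap_coefficient_host(std::vector<int> const& a, std::vector<int> const& b)
{
  assert(std::is_sorted(a.begin(), a.end()));
  assert(std::is_sorted(b.begin(), b.end()));

  // Assumption: a vertex without neighbors yields a score of 0.
  if (a.empty() || b.empty()) { return 0.0f; }

  std::vector<int> common;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(common));

  std::size_t const smaller = std::min(a.size(), b.size());
  return static_cast<float>(common.size()) / static_cast<float>(smaller);
}
```

Because the denominator is the smaller neighborhood, a vertex whose neighbors are all contained in another vertex's neighborhood scores 1 regardless of how large the other neighborhood is.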
- template<typename VT, typename ET, typename WT>
void overlap_list(legacy::GraphCSRView<VT, ET, WT> const &graph, WT const *weights, ET num_pairs, VT const *first, VT const *second, WT *result)
Compute overlap coefficient for selected pairs of vertices.
Computes the overlap coefficient for each pair of specified vertices. Vertices are specified as pairs where pair[n] = (first[n], second[n]).
- Throws:
cugraph::logic_error – when an error occurs.
- Template Parameters:
VT – Type of vertex identifiers. Supported value: int (signed, 32-bit).
ET – Type of edge identifiers. Supported value: int (signed, 32-bit).
WT – Type of edge weights. Supported values: float or double.
- Parameters:
graph – [in] The input graph object.
weights – [in] Device pointer to input vertex weights for weighted overlap; may be NULL for unweighted overlap.
num_pairs – [in] The number of vertex ID pairs specified
first – [in] Device pointer to first vertex ID of each pair
second – [in] Device pointer to second vertex ID of each pair
result – [out] Device pointer to result values, memory needs to be pre-allocated by caller
- template<typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
rmm::device_uvector<weight_t> jaccard_coefficients(raft::handle_t const &handle, graph_view_t<vertex_t, edge_t, false, multi_gpu> const &graph_view, std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view, std::tuple<raft::device_span<vertex_t const>, raft::device_span<vertex_t const>> vertex_pairs, bool do_expensive_check = false)
Compute Jaccard similarity coefficient.
Similarity is computed for every pair of vertices specified. Note that similarity algorithms expect a symmetric graph.
- Throws:
cugraph::logic_error – when an error occurs.
- Template Parameters:
vertex_t – Type of vertex identifiers. Needs to be an integral type.
edge_t – Type of edge identifiers. Needs to be an integral type.
weight_t – Type of edge weights. Needs to be a floating point type.
multi_gpu – Flag indicating whether template instantiation should target single-GPU (false) or multi-GPU (true).
- Parameters:
handle – RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and handles to various CUDA libraries) to run graph algorithms.
graph_view – Graph view object.
edge_weight_view – Optional view object holding edge weights for graph_view. If edge_weight_view.has_value() == true, use the weights associated with the graph. If false, assume a weight of 1 for all edges.
vertex_pairs – Tuple of device spans defining the vertex pairs to compute similarity for. In a multi-GPU context each vertex pair should be local to this GPU.
do_expensive_check – A flag to run expensive checks for input arguments (if set to true).
- Returns:
Similarity coefficient for the corresponding vertex_pairs.
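The has_value() rule for edge_weight_view can be sketched on the host with std::optional. Here a std::optional<std::vector<float>> stands in for the optional edge property view, and both edge_weight_or_one and weighted_degree_host are hypothetical helpers written for illustration, not cuGraph functions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for the optional edge weight view: returns the stored
// weight when weights are present, otherwise the implicit weight of 1.
float edge_weight_or_one(std::optional<std::vector<float>> const& edge_weights,
                         std::size_t edge_index)
{
  return edge_weights.has_value() ? (*edge_weights)[edge_index] : 1.0f;
}

// Hypothetical helper: sum of the weights of the edges stored for vertex v in
// a CSR graph. Without weights this is simply the degree of v.
float weighted_degree_host(std::vector<int> const& offsets,
                           std::optional<std::vector<float>> const& edge_weights,
                           int v)
{
  assert(v >= 0 && static_cast<std::size_t>(v) + 1 < offsets.size());
  float sum = 0.0f;
  for (int e = offsets[v]; e < offsets[v + 1]; ++e) {
    sum += edge_weight_or_one(edge_weights, static_cast<std::size_t>(e));
  }
  return sum;
}
```

Passing std::nullopt therefore reproduces the unweighted behavior without allocating an array of ones.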
- template<typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
rmm::device_uvector<weight_t> cosine_similarity_coefficients(raft::handle_t const &handle, graph_view_t<vertex_t, edge_t, false, multi_gpu> const &graph_view, std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view, std::tuple<raft::device_span<vertex_t const>, raft::device_span<vertex_t const>> vertex_pairs, bool do_expensive_check = false)
Compute Cosine similarity coefficient.
Similarity is computed for every pair of vertices specified. Note that similarity algorithms expect a symmetric graph.
- Throws:
cugraph::logic_error – when an error occurs.
- Template Parameters:
vertex_t – Type of vertex identifiers. Needs to be an integral type.
edge_t – Type of edge identifiers. Needs to be an integral type.
weight_t – Type of edge weights. Needs to be a floating point type.
multi_gpu – Flag indicating whether template instantiation should target single-GPU (false) or multi-GPU (true).
- Parameters:
handle – RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and handles to various CUDA libraries) to run graph algorithms.
graph_view – Graph view object.
edge_weight_view – Optional view object holding edge weights for graph_view. If edge_weight_view.has_value() == true, use the weights associated with the graph. If false, assume a weight of 1 for all edges.
vertex_pairs – Tuple of device spans defining the vertex pairs to compute similarity for. In a multi-GPU context each vertex pair should be local to this GPU.
do_expensive_check – A flag to run expensive checks for input arguments (if set to true).
- Returns:
Similarity coefficient for the corresponding vertex_pairs.
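This page does not state the formula used for the cosine coefficient. A common set-based definition for unweighted graphs is |N(u) ∩ N(v)| / sqrt(|N(u)| · |N(v)|); the host-side sketch below implements that definition. Whether cuGraph uses exactly this formula is an assumption, and cosine_similarity_host is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <iterator>
#include <vector>

// Hypothetical host-side reference using the set-based cosine definition
// (an assumption, not confirmed as the cuGraph formula).
// a and b are sorted neighbor lists of two vertices.
float cosine_similarity_host(std::vector<int> const& a, std::vector<int> const& b)
{
  assert(std::is_sorted(a.begin(), a.end()));
  assert(std::is_sorted(b.begin(), b.end()));

  // Assumption: a vertex without neighbors yields a score of 0.
  if (a.empty() || b.empty()) { return 0.0f; }

  std::vector<int> common;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(common));

  double const norm = std::sqrt(static_cast<double>(a.size()) * static_cast<double>(b.size()));
  return static_cast<float>(static_cast<double>(common.size()) / norm);
}
```

Under this definition two vertices with identical neighborhoods score 1, and vertices with no common neighbor score 0.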
- template<typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
rmm::device_uvector<weight_t> sorensen_coefficients(raft::handle_t const &handle, graph_view_t<vertex_t, edge_t, false, multi_gpu> const &graph_view, std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view, std::tuple<raft::device_span<vertex_t const>, raft::device_span<vertex_t const>> vertex_pairs, bool do_expensive_check = false)
Compute Sorensen similarity coefficient.
Similarity is computed for every pair of vertices specified. Note that similarity algorithms expect a symmetric graph.
- Throws:
cugraph::logic_error – when an error occurs.
- Template Parameters:
vertex_t – Type of vertex identifiers. Needs to be an integral type.
edge_t – Type of edge identifiers. Needs to be an integral type.
weight_t – Type of edge weights. Needs to be a floating point type.
multi_gpu – Flag indicating whether template instantiation should target single-GPU (false) or multi-GPU (true).
- Parameters:
handle – RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and handles to various CUDA libraries) to run graph algorithms.
graph_view – Graph view object.
edge_weight_view – Optional view object holding edge weights for graph_view. If edge_weight_view.has_value() == true, use the weights associated with the graph. If false, assume a weight of 1 for all edges.
vertex_pairs – Tuple of device spans defining the vertex pairs to compute similarity for. In a multi-GPU context each vertex pair should be local to this GPU.
do_expensive_check – A flag to run expensive checks for input arguments (if set to true).
- Returns:
Similarity coefficient for the corresponding vertex_pairs.
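For the unweighted case, the Sorensen (Sørensen-Dice) coefficient of two vertices is 2 · |N(u) ∩ N(v)| / (|N(u)| + |N(v)|). The following host-side sketch computes it for two sorted neighbor lists; sorensen_coefficient_host is a hypothetical name and the zero result for two empty lists is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical host-side reference, not part of cuGraph.
// a and b are sorted neighbor lists of two vertices.
float sorensen_coefficient_host(std::vector<int> const& a, std::vector<int> const& b)
{
  assert(std::is_sorted(a.begin(), a.end()));
  assert(std::is_sorted(b.begin(), b.end()));

  std::size_t const total = a.size() + b.size();
  // Assumption: two vertices without neighbors yield a score of 0.
  if (total == 0) { return 0.0f; }

  std::vector<int> common;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(common));

  return 2.0f * static_cast<float>(common.size()) / static_cast<float>(total);
}
```

The Sorensen score S and the Jaccard score J of the same pair are related by S = 2J / (1 + J), so both rank vertex pairs in the same order.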
- template<typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
rmm::device_uvector<weight_t> overlap_coefficients(raft::handle_t const &handle, graph_view_t<vertex_t, edge_t, false, multi_gpu> const &graph_view, std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view, std::tuple<raft::device_span<vertex_t const>, raft::device_span<vertex_t const>> vertex_pairs, bool do_expensive_check = false)
Compute overlap similarity coefficient.
Similarity is computed for every pair of vertices specified. Note that similarity algorithms expect a symmetric graph.
- Throws:
cugraph::logic_error – when an error occurs.
- Template Parameters:
vertex_t – Type of vertex identifiers. Needs to be an integral type.
edge_t – Type of edge identifiers. Needs to be an integral type.
weight_t – Type of edge weights. Needs to be a floating point type.
multi_gpu – Flag indicating whether template instantiation should target single-GPU (false) or multi-GPU (true).
- Parameters:
handle – RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and handles to various CUDA libraries) to run graph algorithms.
graph_view – Graph view object.
edge_weight_view – Optional view object holding edge weights for graph_view. If edge_weight_view.has_value() == true, use the weights associated with the graph. If false, assume a weight of 1 for all edges.
vertex_pairs – Tuple of device spans defining the vertex pairs to compute similarity for. In a multi-GPU context each vertex pair should be local to this GPU.
do_expensive_check – A flag to run expensive checks for input arguments (if set to true).
- Returns:
Similarity coefficient for the corresponding vertex_pairs.
- template<typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<vertex_t>, rmm::device_uvector<weight_t>> jaccard_all_pairs_coefficients(raft::handle_t const &handle, graph_view_t<vertex_t, edge_t, false, multi_gpu> const &graph_view, std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view, std::optional<raft::device_span<vertex_t const>> vertices, std::optional<size_t> topk, bool do_expensive_check = false)
Compute Jaccard all pairs similarity coefficient.
Similarity is computed for all pairs of vertices. Note that in a sparse graph, many of the vertex pairs will have a score of zero. We actually compute similarity only for vertices that are two hop neighbors within the graph, since vertices that are not two hop neighbors will have a score of 0.
If vertices is specified, similarity is computed on the two hop neighbors of the vertices. If vertices is not specified, similarity is computed on all two hop neighbors in the graph.
If topk is specified, only the top topk scoring vertex pairs will be returned; if not specified, scores for all computed vertex pairs will be returned.
Note that the list of two hop neighbors in the entire graph might be a large number of vertex pairs. If the graph is dense enough it could be as large as the number of vertices squared, which might run out of memory.
- Throws:
cugraph::logic_error – when an error occurs.
- Template Parameters:
vertex_t – Type of vertex identifiers. Needs to be an integral type.
edge_t – Type of edge identifiers. Needs to be an integral type.
weight_t – Type of edge weights. Needs to be a floating point type.
multi_gpu – Flag indicating whether template instantiation should target single-GPU (false) or multi-GPU (true).
- Parameters:
handle – RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and handles to various CUDA libraries) to run graph algorithms.
graph_view – Graph view object.
edge_weight_view – Optional view object holding edge weights for graph_view. If edge_weight_view.has_value() == true, use the weights associated with the graph. If false, assume a weight of 1 for all edges.
vertices – Optional device span defining the seed vertices. In a multi-GPU context the vertices should be local to this GPU.
topk – Optional number of top scoring vertex pairs to return.
do_expensive_check – A flag to run expensive checks for input arguments (if set to true).
- Returns:
Tuple containing three device vectors (v1, v2, score) of the same length. Corresponding elements in the vectors identify a result: v1 identifies a vertex in the graph, v2 identifies one of v1’s two hop neighbors, and score is the similarity score between v1 and v2. If topk was specified then the vectors will be no longer than topk elements. In a multi-GPU context, if topk is specified all results will be returned on GPU rank 0; otherwise they will be returned on the local GPU for vertex v1.
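The two hop enumeration and the topk cut-off described above can be sketched on the host as follows. This is an illustration under stated assumptions, not the cuGraph implementation: ScoredPair, count_common and jaccard_all_pairs_host are hypothetical names, adjacency lists are assumed sorted, the graph is assumed symmetric and unweighted, and excluding the pair (v1, v1) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <optional>
#include <set>
#include <vector>

// Hypothetical result record mirroring the (v1, v2, score) output vectors.
struct ScoredPair {
  int v1;
  int v2;
  float score;
};

// Hypothetical helper: size of the intersection of two sorted neighbor lists.
int count_common(std::vector<int> const& offsets, std::vector<int> const& indices, int u, int v)
{
  int i = offsets[u], j = offsets[v], common = 0;
  while (i < offsets[u + 1] && j < offsets[v + 1]) {
    if (indices[i] < indices[j]) {
      ++i;
    } else if (indices[j] < indices[i]) {
      ++j;
    } else {
      ++common;
      ++i;
      ++j;
    }
  }
  return common;
}

// Hypothetical host-side reference for unweighted all-pairs Jaccard.
std::vector<ScoredPair> jaccard_all_pairs_host(std::vector<int> const& offsets,
                                               std::vector<int> const& indices,
                                               std::optional<std::vector<int>> const& vertices,
                                               std::optional<std::size_t> topk)
{
  assert(!offsets.empty());
  int const num_vertices = static_cast<int>(offsets.size()) - 1;

  // Without seed vertices, every vertex in the graph is a seed.
  std::vector<int> seeds;
  if (vertices.has_value()) {
    seeds = *vertices;
  } else {
    seeds.resize(static_cast<std::size_t>(num_vertices));
    std::iota(seeds.begin(), seeds.end(), 0);
  }

  std::vector<ScoredPair> out;
  for (int v1 : seeds) {
    // Collect the distinct two hop neighbors of v1 (assumption: v1 itself is skipped).
    std::set<int> two_hop;
    for (int e = offsets[v1]; e < offsets[v1 + 1]; ++e) {
      int const mid = indices[e];
      for (int f = offsets[mid]; f < offsets[mid + 1]; ++f) {
        if (indices[f] != v1) { two_hop.insert(indices[f]); }
      }
    }
    for (int v2 : two_hop) {
      int const common     = count_common(offsets, indices, v1, v2);
      int const union_size = (offsets[v1 + 1] - offsets[v1]) + (offsets[v2 + 1] - offsets[v2]) - common;
      out.push_back({v1, v2, static_cast<float>(common) / static_cast<float>(union_size)});
    }
  }

  // Keep only the topk highest scoring pairs when topk is given.
  if (topk.has_value() && out.size() > *topk) {
    auto const middle = out.begin() + static_cast<std::ptrdiff_t>(*topk);
    std::partial_sort(out.begin(), middle, out.end(),
                      [](ScoredPair const& a, ScoredPair const& b) { return a.score > b.score; });
    out.erase(middle, out.end());
  }
  return out;
}
```

Every two hop neighbor shares at least one neighbor with its seed, so the union in the denominator is never empty. The candidate list is fully materialized before the topk cut, which is the memory concern noted above for dense graphs.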
- template<typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<vertex_t>, rmm::device_uvector<weight_t>> cosine_similarity_all_pairs_coefficients(raft::handle_t const &handle, graph_view_t<vertex_t, edge_t, false, multi_gpu> const &graph_view, std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view, std::optional<raft::device_span<vertex_t const>> vertices, std::optional<size_t> topk, bool do_expensive_check = false)
Compute Cosine all pairs similarity coefficient.
Similarity is computed for all pairs of vertices. Note that in a sparse graph, many of the vertex pairs will have a score of zero. We actually compute similarity only for vertices that are two hop neighbors within the graph, since vertices that are not two hop neighbors will have a score of 0.
If vertices is specified, similarity is computed on the two hop neighbors of the vertices. If vertices is not specified, similarity is computed on all two hop neighbors in the graph.
If topk is specified, only the top topk scoring vertex pairs will be returned; if not specified, scores for all computed vertex pairs will be returned.
Note that the list of two hop neighbors in the entire graph might be a large number of vertex pairs. If the graph is dense enough it could be as large as the number of vertices squared, which might run out of memory.
- Throws:
cugraph::logic_error – when an error occurs.
- Template Parameters:
vertex_t – Type of vertex identifiers. Needs to be an integral type.
edge_t – Type of edge identifiers. Needs to be an integral type.
weight_t – Type of edge weights. Needs to be a floating point type.
multi_gpu – Flag indicating whether template instantiation should target single-GPU (false) or multi-GPU (true).
- Parameters:
handle – RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and handles to various CUDA libraries) to run graph algorithms.
graph_view – Graph view object.
edge_weight_view – Optional view object holding edge weights for graph_view. If edge_weight_view.has_value() == true, use the weights associated with the graph. If false, assume a weight of 1 for all edges.
vertices – Optional device span defining the seed vertices. In a multi-GPU context the vertices should be local to this GPU.
topk – Optional number of top scoring vertex pairs to return.
do_expensive_check – A flag to run expensive checks for input arguments (if set to true).
- Returns:
Tuple containing three device vectors (v1, v2, score) of the same length. Corresponding elements in the vectors identify a result: v1 identifies a vertex in the graph, v2 identifies one of v1’s two hop neighbors, and score is the similarity score between v1 and v2. If topk was specified then the vectors will be no longer than topk elements. In a multi-GPU context, if topk is specified all results will be returned on GPU rank 0; otherwise they will be returned on the local GPU for vertex v1.
- template<typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<vertex_t>, rmm::device_uvector<weight_t>> sorensen_all_pairs_coefficients(raft::handle_t const &handle, graph_view_t<vertex_t, edge_t, false, multi_gpu> const &graph_view, std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view, std::optional<raft::device_span<vertex_t const>> vertices, std::optional<size_t> topk, bool do_expensive_check = false)
Compute Sorensen all pairs similarity coefficient.
Similarity is computed for all pairs of vertices. Note that in a sparse graph, many of the vertex pairs will have a score of zero. We actually compute similarity only for vertices that are two hop neighbors within the graph, since vertices that are not two hop neighbors will have a score of 0.
If vertices is specified, similarity is computed on the two hop neighbors of the vertices. If vertices is not specified, similarity is computed on all two hop neighbors in the graph.
If topk is specified, only the top topk scoring vertex pairs will be returned; if not specified, scores for all computed vertex pairs will be returned.
Note that the list of two hop neighbors in the entire graph might be a large number of vertex pairs. If the graph is dense enough it could be as large as the number of vertices squared, which might run out of memory.
- Throws:
cugraph::logic_error – when an error occurs.
- Template Parameters:
vertex_t – Type of vertex identifiers. Needs to be an integral type.
edge_t – Type of edge identifiers. Needs to be an integral type.
weight_t – Type of edge weights. Needs to be a floating point type.
multi_gpu – Flag indicating whether template instantiation should target single-GPU (false) or multi-GPU (true).
- Parameters:
handle – RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and handles to various CUDA libraries) to run graph algorithms.
graph_view – Graph view object.
edge_weight_view – Optional view object holding edge weights for graph_view. If edge_weight_view.has_value() == true, use the weights associated with the graph. If false, assume a weight of 1 for all edges.
vertices – Optional device span defining the seed vertices.
topk – Optional number of top scoring vertex pairs to return.
do_expensive_check – A flag to run expensive checks for input arguments (if set to true).
- Returns:
Tuple containing three device vectors (v1, v2, score) of the same length. Corresponding elements in the vectors identify a result: v1 identifies a vertex in the graph, v2 identifies one of v1’s two hop neighbors, and score is the similarity score between v1 and v2. If topk was specified then the vectors will be no longer than topk elements. In a multi-GPU context, if topk is specified all results will be returned on GPU rank 0; otherwise they will be returned on the local GPU for vertex v1.
- template<typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<vertex_t>, rmm::device_uvector<weight_t>> overlap_all_pairs_coefficients(raft::handle_t const &handle, graph_view_t<vertex_t, edge_t, false, multi_gpu> const &graph_view, std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view, std::optional<raft::device_span<vertex_t const>> vertices, std::optional<size_t> topk, bool do_expensive_check = false)
Compute overlap all pairs similarity coefficient.
Similarity is computed for all pairs of vertices. Note that in a sparse graph, many of the vertex pairs will have a score of zero. We actually compute similarity only for vertices that are two hop neighbors within the graph, since vertices that are not two hop neighbors will have a score of 0.
If vertices is specified, similarity is computed on the two hop neighbors of the vertices. If vertices is not specified, similarity is computed on all two hop neighbors in the graph.
If topk is specified, only the top topk scoring vertex pairs will be returned; if not specified, scores for all computed vertex pairs will be returned.
Note that the list of two hop neighbors in the entire graph might be a large number of vertex pairs. If the graph is dense enough it could be as large as the number of vertices squared, which might run out of memory.
- Throws:
cugraph::logic_error – when an error occurs.
- Template Parameters:
vertex_t – Type of vertex identifiers. Needs to be an integral type.
edge_t – Type of edge identifiers. Needs to be an integral type.
weight_t – Type of edge weights. Needs to be a floating point type.
multi_gpu – Flag indicating whether template instantiation should target single-GPU (false) or multi-GPU (true).
- Parameters:
handle – RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and handles to various CUDA libraries) to run graph algorithms.
graph_view – Graph view object.
edge_weight_view – Optional view object holding edge weights for graph_view. If edge_weight_view.has_value() == true, use the weights associated with the graph. If false, assume a weight of 1 for all edges.
vertices – Optional device span defining the seed vertices.
topk – Optional number of top scoring vertex pairs to return.
do_expensive_check – A flag to run expensive checks for input arguments (if set to true).
- Returns:
Tuple containing three device vectors (v1, v2, score) of the same length. Corresponding elements in the vectors identify a result: v1 identifies a vertex in the graph, v2 identifies one of v1’s two hop neighbors, and score is the similarity score between v1 and v2. If topk was specified then the vectors will be no longer than topk elements. In a multi-GPU context, if topk is specified all results will be returned on GPU rank 0; otherwise they will be returned on the local GPU for vertex v1.