pylibcugraphops.pytorch.operators.mha_gat_n2n
- pylibcugraphops.pytorch.operators.mha_gat_n2n(feat: Union[Tensor, Tuple[Tensor]], attn_weights: Tensor, graph: CSC, num_heads: int = 1, activation: str = 'LeakyReLU', negative_slope: float = 0.2, concat_heads: bool = True, edge_feat: Optional[Tensor] = None) → Tensor
PyTorch autograd function for a multi-head attention layer (GAT-like) without using cuDNN (mha_gat), in a node-to-node reduction (n2n).
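To make the node-to-node reduction concrete, here is a minimal pure-PyTorch sketch of the GAT-style computation this operator fuses, for the non-bipartite case with no edge features and activation="LeakyReLU". It assumes the standard GAT formulation with attn_weights split into a source half followed by a destination half; the fused kernel's exact internal ordering and numerics may differ.

```python
import torch
import torch.nn.functional as F

def mha_gat_n2n_reference(feat, attn_weights, offsets, indices,
                          num_heads, negative_slope=0.2, concat_heads=True):
    # Sketch only: non-bipartite graph, edge_feat=None, activation="LeakyReLU".
    # Assumes attn_weights = [w_src, w_dst]; the fused kernel may differ.
    n_dst = offsets.numel() - 1
    dim_head = feat.size(1) // num_heads
    h = feat.view(-1, num_heads, dim_head)               # (n_src, H, dim_head)
    w_src, w_dst = attn_weights.view(2, num_heads, dim_head)
    src_score = (h * w_src).sum(-1)                      # (n_src, H)
    dst_score = (h * w_dst).sum(-1)                      # same tensor: non-bipartite
    out = feat.new_zeros(n_dst, num_heads, dim_head)
    for d in range(n_dst):                               # CSC: incoming edges of d
        nbrs = indices[int(offsets[d]):int(offsets[d + 1])]
        e = F.leaky_relu(src_score[nbrs] + dst_score[d], negative_slope)
        alpha = torch.softmax(e, dim=0)                  # softmax over neighbors
        out[d] = (alpha.unsqueeze(-1) * h[nbrs]).sum(0)  # weighted feature sum
    return out.flatten(1) if concat_heads else out.mean(1)
```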
- Parameters:
  - feat : torch.Tensor | Tuple[torch.Tensor, torch.Tensor]
    The input node features. Depending on whether the graph is bipartite or not, this has to be a tuple of two tensors or a single tensor, respectively. For bipartite graphs, the rows of each tensor consist of concatenated features from the different heads after the linear transformation. Shapes: (n_src_nodes, dim_in) and (n_dst_nodes, dim_in), where dim_in = dim_head * num_heads, with dim_head being the feature dimension per head. For non-bipartite graphs, the rows of the single tensor consist of concatenated features from the different heads after the linear transformation. Shape: (n_src_nodes, dim_in), where dim_in = dim_head * num_heads, with dim_head being the feature dimension per head.
  - attn_weights : torch.Tensor
    The attention weights, with dim_w = 2 * dim_in + dim_in_edges, where dim_in_edges is assumed to be 0 if edge_feat is None. Shape: (dim_w,). A worked size example follows this list.
  - graph : CSC
    The graph used for the operation.
  - num_heads : int, default=1
    Number of heads in multi-head attention.
  - activation : str, default="LeakyReLU"
    The activation function used in the attention mechanism. Choose from "ELU", "LeakyReLU", "Linear", "ReLU", "Scalar", "Sigmoid" and "Tanh".
  - negative_slope : float, default=0.2
    LeakyReLU angle of negative slope. Only effective when activation is "LeakyReLU".
  - concat_heads : bool, default=True
    Aggregated embeddings from each head are concatenated if True or averaged if False.
  - edge_feat : Optional[torch.Tensor], default=None
    Optional input edge features. Each row consists of concatenated features from the different heads after the linear transformation. Shape: (n_edges, dim_in_edge), where dim_in_edge = dim_head_edge * num_heads, with dim_head_edge being the feature dimension per head.
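The expected attn_weights length follows directly from the shape rules above; a short worked example with hypothetical sizes:

```python
num_heads, dim_head, dim_head_edge = 4, 8, 8    # hypothetical sizes
dim_in = dim_head * num_heads                   # 32
dim_in_edges = dim_head_edge * num_heads        # 32 (0 if edge_feat is None)
dim_w = 2 * dim_in + dim_in_edges               # 96 -> attn_weights.shape == (96,)
```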
- Returns:
  - output : torch.Tensor
    The aggregation output. Shape for concat_heads=True: (n_dst_nodes, dim_in); shape for concat_heads=False: (n_dst_nodes, dim_head).
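A hedged usage sketch on a tiny non-bipartite graph. The CSC constructor call shown here (offsets, indices, num_src_nodes) is an assumption; check the pylibcugraphops.pytorch.CSC documentation for the exact signature. The feature and head sizes are hypothetical.

```python
import torch
from pylibcugraphops.pytorch import CSC
from pylibcugraphops.pytorch.operators import mha_gat_n2n

num_heads, dim_head = 4, 8
dim_in = num_heads * dim_head                       # 32

# 3-node graph in CSC layout: offsets has n_dst_nodes + 1 entries,
# indices lists the source node of each incoming edge.
offsets = torch.tensor([0, 2, 3, 4], device="cuda")
indices = torch.tensor([1, 2, 0, 0], device="cuda")
graph = CSC(offsets, indices, num_src_nodes=3)      # assumed constructor signature

# Node features after an external linear projection, one block per head.
feat = torch.randn(3, dim_in, device="cuda", requires_grad=True)

# dim_w = 2 * dim_in since edge_feat is None.
attn_weights = torch.randn(2 * dim_in, device="cuda", requires_grad=True)

out = mha_gat_n2n(feat, attn_weights, graph,
                  num_heads=num_heads, activation="LeakyReLU",
                  negative_slope=0.2, concat_heads=True)
print(out.shape)       # torch.Size([3, 32]) == (n_dst_nodes, dim_in)
out.sum().backward()   # gradients flow through the fused autograd function
```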