pylibcugraphops.pytorch.operators.mha_gat_v2_n2n
- pylibcugraphops.pytorch.operators.mha_gat_v2_n2n(feat: Union[Tensor, Tuple[Tensor]], attn_weights: Tensor, graph: CSC, num_heads: int = 1, activation: str = 'LeakyReLU', negative_slope: float = 0.2, concat_heads: bool = True, edge_feat: Optional[Tensor] = None, deterministic_dgrad: Optional[bool] = False, deterministic_wgrad: Optional[bool] = False) → Tensor
PyTorch autograd function for a multi-head attention layer (GAT-like) without using cudnn (mha_gat_v2), with an activation prior to the dot product but none afterwards, in a node-to-node reduction (n2n). A minimal usage sketch is given below the parameter descriptions.
- Parameters:
  - feat : torch.Tensor | Tuple[torch.Tensor, torch.Tensor]
    The input node features. Depending on whether the graph is bipartite or not, this has to be a tuple of two tensors or a single tensor, respectively. For bipartite graphs, the rows of each tensor consist of concatenated features from different heads after the linear transformation. Shapes: (n_src_nodes, dim_in) and (n_dst_nodes, dim_in), where dim_in = dim_head * num_heads, with dim_head being the feature dimension per head. For non-bipartite graphs, the rows of the single tensor consist of concatenated features from different heads after the linear transformation. Shape: (n_src_nodes, dim_in), where dim_in = dim_head * num_heads, with dim_head being the feature dimension per head.
  - attn_weights : torch.Tensor
    The attention weights. Shape: (dim_in,).
  - graph : CSC
    The graph used for the operation.
  - num_heads : int, default=1
    Number of heads in multi-head attention.
  - activation : str, default="LeakyReLU"
    The activation function used in the attention mechanism. Choose from "ELU", "LeakyReLU", "Linear", "ReLU", "Scalar", "Sigmoid" and "Tanh".
  - negative_slope : float, default=0.2
    LeakyReLU angle of negative slope. Only effective when activation is "LeakyReLU".
  - concat_heads : bool, default=True
    Aggregated embeddings from each head are concatenated if True or averaged if False.
  - edge_feat : Optional[torch.Tensor], default=None
    Optional input edge features. Each row consists of concatenated features from different heads after the linear transformation. Shape: (n_edges, dim_in).
  - deterministic_dgrad : Optional[bool], default=False
    Optional flag indicating whether the gradient w.r.t. the features is computed deterministically using a dedicated workspace buffer.
  - deterministic_wgrad : Optional[bool], default=False
    Optional flag indicating whether the gradient w.r.t. the weights is computed deterministically using a dedicated workspace buffer.
- Returns:
  - output : torch.Tensor
    The aggregation output. Shape for concat_heads=True: (n_dst_nodes, dim_in); shape for concat_heads=False: (n_dst_nodes, dim_head).
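
Usage sketch (not part of the official reference): the example below builds a small non-bipartite graph and runs the operator on it. The toy graph and the CSC constructor arguments (offsets, indices, num_src_nodes) are assumptions for illustration; check the pylibcugraphops.pytorch.CSC documentation for the exact constructor signature on your version.

```python
import torch
from pylibcugraphops.pytorch import CSC
from pylibcugraphops.pytorch.operators import mha_gat_v2_n2n

num_heads, dim_head = 4, 16
dim_in = num_heads * dim_head  # feature dim after the linear transformation

# Toy non-bipartite graph with 3 nodes and 4 directed edges in CSC layout:
# offsets has length n_dst_nodes + 1; indices holds the source node of each edge.
offsets = torch.tensor([0, 2, 3, 4], dtype=torch.int64, device="cuda")
indices = torch.tensor([1, 2, 0, 0], dtype=torch.int64, device="cuda")
# Assumed CSC constructor arguments; verify against the CSC reference.
graph = CSC(offsets=offsets, indices=indices, num_src_nodes=3)

# Node features after a user-supplied linear projection: one row per node,
# heads concatenated along the feature dimension -> shape (n_src_nodes, dim_in).
feat = torch.randn(3, dim_in, device="cuda", requires_grad=True)

# Attention weights, one value per feature channel: shape (dim_in,).
attn_weights = torch.randn(dim_in, device="cuda", requires_grad=True)

out = mha_gat_v2_n2n(
    feat, attn_weights, graph,
    num_heads=num_heads,
    activation="LeakyReLU",
    negative_slope=0.2,
    concat_heads=True,
)
print(out.shape)  # (n_dst_nodes, dim_in) == (3, 64); (3, 16) with concat_heads=False

# The operator is an autograd function, so gradients flow back to the inputs.
out.sum().backward()
```

In a full GATv2-style layer, feat would be the output of a learnable linear layer applied to the raw node features, and attn_weights would be a learnable parameter; both are created with random values here only to keep the sketch self-contained.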