pylibcugraphops.pytorch.operators.mha_gat_n2n
- pylibcugraphops.pytorch.operators.mha_gat_n2n(feat: Union[Tensor, Tuple[Tensor, Tensor]], attn_weights: Tensor, graph: CSC, num_heads: int = 1, activation: str = 'LeakyReLU', negative_slope: float = 0.2, concat_heads: bool = True, edge_feat: Optional[Tensor] = None, deterministic_dgrad: bool = False, deterministic_wgrad: bool = False, high_precision_dgrad: bool = False, high_precision_wgrad: bool = False) → Tensor
PyTorch autograd function for a multi-head attention layer (GAT-like) without using cudnn (mha_gat) in a node-to-node reduction (n2n).
- Parameters:
- feat : torch.Tensor | Tuple[torch.Tensor, torch.Tensor]
  The input node features. This must be a tuple of two tensors if the graph is bipartite, or a single tensor otherwise. In both cases, each row consists of the concatenated per-head features after the linear transformation, with dim_in = dim_head * num_heads and dim_head being the feature dimension per head. Shapes: (n_src_nodes, dim_in) and (n_dst_nodes, dim_in) for bipartite graphs; (n_src_nodes, dim_in) for non-bipartite graphs.
- attn_weights : torch.Tensor
  The attention weights with dim_w = 2 * dim_in + dim_in_edges, where dim_in_edges is assumed to be 0 if edge_feat is None. Shape: (dim_w,).
- graph : CSC
  The graph used for the operation.
- num_heads : int, default=1
  Number of heads in multi-head attention.
- activation : str, default="LeakyReLU"
  The activation function used in the attention mechanism. Choose from "ELU", "LeakyReLU", "Linear", "ReLU", "Scalar", "Sigmoid" and "Tanh".
- negative_slope : float, default=0.2
  LeakyReLU angle of negative slope. Only effective when activation is "LeakyReLU".
- concat_heads : bool, default=True
  Aggregated embeddings from each head are concatenated if True or averaged if False.
- edge_feat : Optional[torch.Tensor], default=None
  Optional input edge features. Each row consists of concatenated features from different heads after the linear transformation. Shape: (n_edges, dim_in_edge), where dim_in_edge = dim_head_edge * num_heads, with dim_head_edge being the feature dimension per head.
- deterministic_dgrad : Optional[bool], default=False
  Optional flag indicating whether the feature gradients are computed deterministically using a dedicated workspace buffer.
- deterministic_wgrad : Optional[bool], default=False
  Optional flag indicating whether the weight gradients are computed deterministically using a dedicated workspace buffer.
- high_precision_dgrad : Optional[bool], default=False
  Optional flag indicating whether gradients for inputs in half precision are kept in single precision as long as possible and only cast to the corresponding input type at the very end.
- high_precision_wgrad : Optional[bool], default=False
  Optional flag indicating whether gradients for weights in half precision are kept in single precision as long as possible and only cast to the corresponding input type at the very end.
- Returns:
- output : torch.Tensor
  The aggregation output. Shape for concat_heads=True: (n_dst_nodes, dim_in); shape for concat_heads=False: (n_dst_nodes, dim_head).
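- Example:
  A minimal usage sketch, not part of the original reference: it runs the operator on a tiny non-bipartite graph held in a pylibcugraphops.pytorch.CSC container. The CSC constructor arguments and index dtypes shown here are illustrative assumptions and may differ between releases.

```python
import torch
from pylibcugraphops.pytorch import CSC              # graph container; constructor args assumed
from pylibcugraphops.pytorch.operators import mha_gat_n2n

num_heads, dim_head = 4, 8
dim_in = dim_head * num_heads                         # per the shape convention above

# Tiny non-bipartite graph with 3 nodes and 4 edges in CSC layout.
offsets = torch.tensor([0, 2, 3, 4], dtype=torch.int64, device="cuda")
indices = torch.tensor([1, 2, 0, 0], dtype=torch.int64, device="cuda")
graph = CSC(offsets, indices, num_src_nodes=3)        # hypothetical argument names

# Node features after an external linear projection: (n_src_nodes, dim_in).
feat = torch.randn(3, dim_in, device="cuda", requires_grad=True)

# Attention weights: dim_w = 2 * dim_in since edge_feat is None here.
attn_weights = torch.randn(2 * dim_in, device="cuda", requires_grad=True)

out = mha_gat_n2n(feat, attn_weights, graph,
                  num_heads=num_heads, activation="LeakyReLU",
                  negative_slope=0.2, concat_heads=True)
print(out.shape)                                      # (n_dst_nodes, dim_in) with concat_heads=True
out.sum().backward()                                  # autograd reaches feat and attn_weights
```

  With concat_heads=False the same call would instead return averaged per-head embeddings of shape (n_dst_nodes, dim_head).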