pylibcugraphops.pytorch.operators.mha_gat_v2_n2n
- pylibcugraphops.pytorch.operators.mha_gat_v2_n2n(feat: Union[Tensor, Tuple[Tensor]], attn_weights: Tensor, graph: CSC, num_heads: int = 1, activation: str = 'LeakyReLU', negative_slope: float = 0.2, concat_heads: bool = True, edge_feat: Optional[Tensor] = None, deterministic_dgrad: Optional[bool] = False, deterministic_wgrad: Optional[bool] = False) → Tensor
PyTorch autograd function for a multi-head attention layer (GAT-like) without using cudnn (mha_gat_v2), with an activation prior to the dot product but none afterwards, in a node-to-node reduction (n2n). A minimal usage sketch is given below the parameter descriptions.
- Parameters:
  - feat : torch.Tensor | Tuple[torch.Tensor, torch.Tensor]
    The input node features. Depending on whether the graph is bipartite or not, this has to be a tuple of two tensors or a single tensor, respectively. For bipartite graphs, the rows of each tensor consist of concatenated features from different heads after the linear transformation. Shapes: (n_src_nodes, dim_in) and (n_dst_nodes, dim_in), where dim_in = dim_head * num_heads, with dim_head being the feature dimension per head. For non-bipartite graphs, the rows of the single tensor consist of concatenated features from different heads after the linear transformation. Shape: (n_src_nodes, dim_in), where dim_in = dim_head * num_heads, with dim_head being the feature dimension per head.
  - attn_weights : torch.Tensor
    The attention weights. Shape: (dim_in,).
  - graph : CSC
    The graph used for the operation.
  - num_heads : int, default=1
    Number of heads in multi-head attention.
  - activation : str, default="LeakyReLU"
    The activation function used in the attention mechanism. Choose from "ELU", "LeakyReLU", "Linear", "ReLU", "Scalar", "Sigmoid" and "Tanh".
  - negative_slope : float, default=0.2
    LeakyReLU angle of negative slope. Only effective when activation is "LeakyReLU".
  - concat_heads : bool, default=True
    Aggregated embeddings from each head are concatenated if True or averaged if False.
  - edge_feat : Optional[torch.Tensor], default=None
    Optional input edge features. Each row consists of concatenated features from different heads after the linear transformation. Shape: (n_edges, dim_in).
  - deterministic_dgrad : Optional[bool], default=False
    Optional flag indicating whether the gradient w.r.t. the features is computed deterministically using a dedicated workspace buffer.
  - deterministic_wgrad : Optional[bool], default=False
    Optional flag indicating whether the gradient w.r.t. the weights is computed deterministically using a dedicated workspace buffer.
- Returns:
  - output : torch.Tensor
    The aggregation output. Shape for concat_heads=True: (n_dst_nodes, dim_in); shape for concat_heads=False: (n_dst_nodes, dim_head).
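
Usage sketch (not part of the official reference): the example below builds a small non-bipartite graph and runs the operator on it. The toy graph and the CSC constructor arguments (offsets, indices, num_src_nodes) are assumptions for illustration; check the pylibcugraphops.pytorch.CSC documentation for the exact constructor signature on your version.

```python
import torch
from pylibcugraphops.pytorch import CSC
from pylibcugraphops.pytorch.operators import mha_gat_v2_n2n

num_heads, dim_head = 4, 16
dim_in = num_heads * dim_head  # feature dim after the linear transformation

# Toy non-bipartite graph with 3 nodes and 4 directed edges in CSC layout:
# offsets has length n_dst_nodes + 1; indices holds the source node of each edge.
offsets = torch.tensor([0, 2, 3, 4], dtype=torch.int64, device="cuda")
indices = torch.tensor([1, 2, 0, 0], dtype=torch.int64, device="cuda")
# Assumed CSC constructor arguments; verify against the CSC reference.
graph = CSC(offsets=offsets, indices=indices, num_src_nodes=3)

# Node features after a user-supplied linear projection: one row per node,
# heads concatenated along the feature dimension -> shape (n_src_nodes, dim_in).
feat = torch.randn(3, dim_in, device="cuda", requires_grad=True)

# Attention weights, one value per feature channel: shape (dim_in,).
attn_weights = torch.randn(dim_in, device="cuda", requires_grad=True)

out = mha_gat_v2_n2n(
    feat, attn_weights, graph,
    num_heads=num_heads,
    activation="LeakyReLU",
    negative_slope=0.2,
    concat_heads=True,
)
print(out.shape)  # (n_dst_nodes, dim_in) == (3, 64); (3, 16) with concat_heads=False

# The operator is an autograd function, so gradients flow back to the inputs.
out.sum().backward()
```

In a full GATv2-style layer, feat would be the output of a learnable linear layer applied to the raw node features, and attn_weights would be a learnable parameter; both are created with random values here only to keep the sketch self-contained.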