pylibcugraphops.pytorch.operators.mha_gat_v2_n2n

pylibcugraphops.pytorch.operators.mha_gat_v2_n2n(feat: Union[Tensor, Tuple[Tensor]], attn_weights: Tensor, graph: CSC, num_heads: int = 1, activation: str = 'LeakyReLU', negative_slope: float = 0.2, concat_heads: bool = True, edge_feat: Optional[Tensor] = None, deterministic_dgrad: Optional[bool] = False, deterministic_wgrad: Optional[bool] = False) → Tensor

PyTorch autograd function for a multi-head attention layer (GAT-like) without using cudnn (mha_gat_v2), with an activation prior to the dot product but none afterwards, in a node-to-node reduction (n2n).

Parameters:
feat : torch.Tensor | Tuple[torch.Tensor, torch.Tensor]

The input node features. If the graph is bipartite, this must be a tuple of two tensors; otherwise, a single tensor. For bipartite graphs, the rows of each tensor consist of concatenated features from different heads after the linear transformation. Shapes: (n_src_nodes, dim_in) and (n_dst_nodes, dim_in), where dim_in = dim_head * num_heads, with dim_head being the feature dimension per head. For non-bipartite graphs, the rows of the single tensor consist of concatenated features from different heads after the linear transformation. Shape: (n_src_nodes, dim_in), where dim_in = dim_head * num_heads, with dim_head being the feature dimension per head.

attn_weights : torch.Tensor

The attention weights. Shape: (dim_in,).

graph : CSC

The graph used for the operation.

num_heads : int, default=1

Number of heads in multi-head attention.

activation : str, default="LeakyReLU"

The activation function used in the attention mechanism. Choose from "ELU", "LeakyReLU", "Linear", "ReLU", "Scalar", "Sigmoid" and "Tanh".

negative_slope : float, default=0.2

The negative slope of the LeakyReLU activation. Only effective when activation is "LeakyReLU".

concat_heads : bool, default=True

Aggregated embeddings from each head are concatenated if True or averaged if False.

edge_feat : Optional[torch.Tensor], default=None

Optional input edge features. Each row consists of concatenated features from different heads after the linear transformation. Shape: (n_edges, dim_in).

deterministic_dgrad : Optional[bool], default=False

Optional flag indicating whether the gradient w.r.t. the features is computed deterministically using a dedicated workspace buffer.

deterministic_wgrad : Optional[bool], default=False

Optional flag indicating whether the gradient w.r.t. the weights is computed deterministically using a dedicated workspace buffer.

Returns:
output : torch.Tensor

The aggregation output. Shape: (n_dst_nodes, dim_in) if concat_heads=True, otherwise (n_dst_nodes, dim_head).
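
Example:

A minimal usage sketch for a non-bipartite graph (not taken from the upstream documentation). It assumes the CSC graph can be built directly from int32 offsets and indices tensors via CSC(offsets, indices, num_src_nodes); the exact CSC constructor signature may differ.

>>> import torch
>>> from pylibcugraphops.pytorch import CSC
>>> from pylibcugraphops.pytorch.operators import mha_gat_v2_n2n
>>> num_heads, dim_head = 4, 8
>>> dim_in = num_heads * dim_head  # concatenated per-head feature dimension
>>> # Toy graph with 3 nodes and 4 edges in CSC layout (offsets/indices);
>>> # constructing CSC this way is an assumption, not the documented API.
>>> offsets = torch.tensor([0, 2, 3, 4], dtype=torch.int32, device="cuda")
>>> indices = torch.tensor([1, 2, 0, 0], dtype=torch.int32, device="cuda")
>>> graph = CSC(offsets, indices, num_src_nodes=3)
>>> # Node features after an external linear transformation, one dim_head block per head.
>>> feat = torch.randn(3, dim_in, device="cuda", requires_grad=True)
>>> attn_weights = torch.randn(dim_in, device="cuda", requires_grad=True)
>>> out = mha_gat_v2_n2n(feat, attn_weights, graph, num_heads=num_heads,
...                      activation="LeakyReLU", negative_slope=0.2,
...                      concat_heads=True)
>>> out.shape
torch.Size([3, 32])
>>> out.sum().backward()  # gradients flow back to feat and attn_weights

For a bipartite graph, feat would instead be a tuple (src_feat, dst_feat) with shapes (n_src_nodes, dim_in) and (n_dst_nodes, dim_in), as described under Parameters.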