pylibcugraphops.pytorch.operators.mha_gat_n2n

pylibcugraphops.pytorch.operators.mha_gat_n2n(feat: Union[Tensor, Tuple[Tensor]], attn_weights: Tensor, graph: CSC, num_heads: int = 1, activation: str = 'LeakyReLU', negative_slope: float = 0.2, concat_heads: bool = True, edge_feat: Optional[Tensor] = None) → Tensor

PyTorch autograd function for a GAT-like multi-head attention layer (mha_gat), implemented without cuDNN, in a node-to-node reduction (n2n).
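The reduction presumably follows the standard GAT attention scheme, sketched below with σ denoting the configurable activation (LeakyReLU by default); the exact ordering of the concatenation and other kernel details are not specified on this page, so treat this as a sketch rather than a specification:

    e_{ij} = \sigma\left( \mathrm{attn\_weights}^{\top} \, [\, h_i \,\|\, h_j \,\|\, f_{ij} \,] \right)
    \alpha_{ij} = \operatorname{softmax}_{j \in \mathcal{N}(i)}\left( e_{ij} \right)
    \mathrm{out}_i = \sum_{j \in \mathcal{N}(i)} \alpha_{ij} \, h_j

where i is a destination node, j ranges over its source neighbors, h are the already linearly transformed node features, and the optional per-head edge features f_{ij} enter only when edge_feat is given; the computation runs independently for each of the num_heads heads.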

Parameters:
feat : torch.Tensor | Tuple[torch.Tensor, torch.Tensor]

The input node features. For bipartite graphs, this must be a tuple of two tensors; for non-bipartite graphs, a single tensor. In either case, each row consists of concatenated per-head features after the linear transformation. Shapes for bipartite graphs: (n_src_nodes, dim_in) and (n_dst_nodes, dim_in); shape for non-bipartite graphs: (n_src_nodes, dim_in). Here dim_in = dim_head * num_heads, with dim_head being the feature dimension per head.

attn_weights : torch.Tensor

The attention weights, with dim_w = 2 * dim_in + dim_in_edge, where dim_in_edge is 0 if edge_feat is None. Shape: (dim_w,).

graph : CSC

The graph used for the operation.

num_heads : int, default=1

Number of heads in multi-head attention.

activation : str, default="LeakyReLU"

The activation function used in the attention mechanism. Choose from "ELU", "LeakyReLU", "Linear", "ReLU", "Scalar", "Sigmoid" and "Tanh".

negative_slope : float, default=0.2

The negative slope of the LeakyReLU activation. Only effective when activation is "LeakyReLU".

concat_heads : bool, default=True

Aggregated embeddings from each head are concatenated if True or averaged if False.

edge_feat : Optional[torch.Tensor], default=None

Optional input edge features. Each row consists of concatenated features from different heads after the linear transformation. Shape: (n_edges, dim_in_edge) where dim_in_edge = dim_head_edge * num_heads, with dim_head_edge being the feature dimension per head.

Returns:
output : torch.Tensor

The aggregation output. Shape for concat_heads=True: (n_dst_nodes, dim_in); shape for concat_heads=False: (n_dst_nodes, dim_head).
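A minimal usage sketch, assuming a CUDA device; the tensor shapes follow the parameter descriptions above. The import path and constructor arguments for CSC (offsets, indices, num_src_nodes) are an assumption here, so check the CSC reference for the exact signature:

    import torch
    from pylibcugraphops.pytorch import CSC                      # assumed import path for the graph type
    from pylibcugraphops.pytorch.operators import mha_gat_n2n

    num_heads, dim_head = 4, 8
    dim_in = num_heads * dim_head                                # 32

    # Toy non-bipartite graph with 3 nodes and 4 edges in CSC form.
    # NOTE: the constructor arguments below are an assumption, not taken from this page.
    offsets = torch.tensor([0, 2, 3, 4], dtype=torch.int32, device="cuda")
    indices = torch.tensor([1, 2, 0, 0], dtype=torch.int32, device="cuda")
    graph = CSC(offsets, indices, num_src_nodes=3)

    # Node features after an external linear projection: (n_src_nodes, dim_in).
    feat = torch.randn(3, dim_in, device="cuda", requires_grad=True)

    # Attention weights: dim_w = 2 * dim_in, since edge_feat is None here.
    attn_weights = torch.randn(2 * dim_in, device="cuda", requires_grad=True)

    out = mha_gat_n2n(feat, attn_weights, graph, num_heads=num_heads,
                      activation="LeakyReLU", negative_slope=0.2,
                      concat_heads=True)
    # concat_heads=True -> out.shape == (n_dst_nodes, dim_in) == (3, 32)

    out.sum().backward()   # autograd function: gradients flow back to feat and attn_weights

With concat_heads=False, the per-head outputs are averaged instead of concatenated and the same call returns a tensor of shape (n_dst_nodes, dim_head), i.e. (3, 8) in this sketch.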