pylibcugraphops.pytorch.operators.mha_gat_n2n

pylibcugraphops.pytorch.operators.mha_gat_n2n(feat: Union[Tensor, Tuple[Tensor, Tensor]], attn_weights: Tensor, graph: CSC, num_heads: int = 1, activation: str = 'LeakyReLU', negative_slope: float = 0.2, concat_heads: bool = True, edge_feat: Optional[Tensor] = None, deterministic_dgrad: bool = False, deterministic_wgrad: bool = False, high_precision_dgrad: bool = False, high_precision_wgrad: bool = False) → Tensor

PyTorch autograd function for a multi-head attention layer (GAT-like) without using cudnn (mha_gat) in a node-to-node reduction (n2n).

Parameters:
feat : torch.Tensor | Tuple[torch.Tensor, torch.Tensor]

The input node features. For bipartite graphs, this has to be a tuple of two tensors; for non-bipartite graphs, a single tensor. In either case, each row consists of the concatenated per-head features after the linear transformation. Shapes: (n_src_nodes, dim_in) and (n_dst_nodes, dim_in) for the bipartite case, or (n_src_nodes, dim_in) for the non-bipartite case, where dim_in = dim_head * num_heads and dim_head is the feature dimension per head (see the shape sketch after the parameter list).

attn_weights : torch.Tensor

The attention weights, with dim_w = 2 * dim_in + dim_in_edges, where dim_in_edges is assumed to be 0 if edge_feat is None, as illustrated in the sketch after the parameter list. Shape: (dim_w,).

graph : CSC

The graph used for the operation.

num_heads : int, default=1

Number of heads in multi-head attention.

activation : str, default="LeakyReLU"

The activation function used in the attention mechanism. Choose from "ELU", "LeakyReLU", "Linear", "ReLU", "Scalar", "Sigmoid" and "Tanh".

negative_slope : float, default=0.2

Negative slope of the LeakyReLU activation. Only effective when activation is "LeakyReLU".

concat_heads : bool, default=True

Aggregated embeddings from each head are concatenated if True or averaged if False.

edge_feat : Optional[torch.Tensor], default=None

Optional input edge features. Each row consists of the concatenated per-head features after the linear transformation. Shape: (n_edges, dim_in_edges), where dim_in_edges = dim_head_edge * num_heads, with dim_head_edge being the edge feature dimension per head.

deterministic_dgrad : Optional[bool], default=False

Optional flag indicating whether the feature gradients are computed deterministically using a dedicated workspace buffer.

deterministic_wgrad : Optional[bool], default=False

Optional flag indicating whether the weight gradients are computed deterministically using a dedicated workspace buffer.

high_precision_dgrad : Optional[bool], default=False

Optional flag indicating whether gradients for inputs in half precision are kept in single precision as long as possible and only cast to the corresponding input type at the very end.

high_precision_wgrad : Optional[bool], default=False

Optional flag indicating whether gradients for weights in half precision are kept in single precision as long as possible and only cast to the corresponding input type at the very end.
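
As an illustration of the shape relationships described above (all dimension values and variable names below are made up for illustration; only the shape formulas come from this page):

    import torch

    num_heads, dim_head = 4, 16
    dim_in = num_heads * dim_head                 # 64

    n_src_nodes, n_dst_nodes, n_edges = 5, 3, 8

    # Non-bipartite graph: a single feature tensor.
    feat = torch.randn(n_src_nodes, dim_in, device="cuda")

    # Bipartite graph: a tuple of (source, destination) feature tensors.
    feat_bipartite = (torch.randn(n_src_nodes, dim_in, device="cuda"),
                      torch.randn(n_dst_nodes, dim_in, device="cuda"))

    # Attention weights without edge features: dim_in_edges = 0, so dim_w = 2 * dim_in.
    attn_weights = torch.randn(2 * dim_in, device="cuda")

    # With edge features of dim_head_edge = 8 per head:
    dim_in_edges = 8 * num_heads                  # 32
    edge_feat = torch.randn(n_edges, dim_in_edges, device="cuda")
    attn_weights_with_edges = torch.randn(2 * dim_in + dim_in_edges, device="cuda")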

Returns:
output : torch.Tensor

The aggregation output. Shape: (n_dst_nodes, dim_in) for concat_heads=True, or (n_dst_nodes, dim_head) for concat_heads=False.
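
A minimal end-to-end usage sketch. The CSC constructor arguments (offsets, indices, num_src_nodes) and the index dtype are assumptions about pylibcugraphops.pytorch.CSC and are not documented on this page; treat them as illustrative only:

    import torch
    from pylibcugraphops.pytorch import CSC                     # assumed import path
    from pylibcugraphops.pytorch.operators import mha_gat_n2n

    num_heads, dim_head = 2, 8
    dim_in = num_heads * dim_head                               # 16

    # Tiny non-bipartite graph with 3 nodes and 4 edges in CSC layout:
    # node 0 receives edges from nodes 1 and 2; nodes 1 and 2 each receive one edge from node 0.
    offsets = torch.tensor([0, 2, 3, 4], dtype=torch.int64, device="cuda")
    indices = torch.tensor([1, 2, 0, 0], dtype=torch.int64, device="cuda")
    graph = CSC(offsets=offsets, indices=indices, num_src_nodes=3)  # assumed constructor signature

    # Node features after an external per-head linear projection.
    feat = torch.randn(3, dim_in, device="cuda", requires_grad=True)

    # Attention weights: 2 * dim_in, since no edge features are passed.
    attn_weights = torch.randn(2 * dim_in, device="cuda", requires_grad=True)

    out = mha_gat_n2n(feat, attn_weights, graph, num_heads=num_heads,
                      activation="LeakyReLU", negative_slope=0.2,
                      concat_heads=True)
    print(out.shape)                                            # (3, 16) with concat_heads=True

    out.sum().backward()                                        # gradients reach feat and attn_weights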