2021
DOI: 10.48550/arxiv.2106.03893
Preprint

Rethinking Graph Transformers with Spectral Attention

Abstract: In recent years, the Transformer architecture has proven to be very successful in sequence processing, but its application to other data structures, such as graphs, has remained limited due to the difficulty of properly defining positions. Here, we present the Spectral Attention Network (SAN), which uses a learned positional encoding (LPE) that can take advantage of the full Laplacian spectrum to learn the position of each node in a given graph. This LPE is then added to the node features of the graph and passed to a fully-connected Transformer.
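As a concrete illustration of the idea in the abstract, the following is a minimal sketch (not the authors' implementation) of deriving per-node positional features from the Laplacian spectrum. The function name, the NumPy-only setup, and the choice to concatenate eigenvalues as fixed features are assumptions made for brevity; SAN instead feeds (eigenvalue, eigenvector) pairs through a learned Transformer encoder to produce the LPE.

```python
import numpy as np

def laplacian_positional_encoding(adj: np.ndarray, k: int) -> np.ndarray:
    """Illustrative sketch: per-node positional features built from the
    graph Laplacian spectrum, in the spirit of SAN's LPE (hypothetical
    helper, not the paper's code)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    # Full eigendecomposition; SAN can exploit the whole spectrum, but
    # here we keep only the k lowest non-trivial frequencies for brevity.
    eigvals, eigvecs = np.linalg.eigh(lap)
    pe = eigvecs[:, 1:k + 1]      # node coordinates in the eigenbasis
    freqs = eigvals[1:k + 1]      # matching eigenvalues ("frequencies")
    # SAN learns from (eigenvalue, eigenvector) pairs; this sketch just
    # concatenates them as fixed features per node.
    return np.concatenate([pe, np.tile(freqs, (len(adj), 1))], axis=1)

# Usage: positional features for a 4-cycle graph.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
print(laplacian_positional_encoding(A, k=2).shape)  # (4, 4)
```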

Cited by 13 publications (32 citation statements). References 21 publications.
“…There have been many attempts to leverage the Transformer in the graph domain. Existing methods (Veličković et al., 2017; Kreuzer et al., 2021; Zhang et al., 2020; Ying et al., 2021) adapt the Transformer architecture to graph inputs by modifying the attention map, replacing the positional embedding, or both, which makes them closely related to our GSA. Methods such as Graph Attention Networks (GAT) (Veličković et al., 2017) and Graph Transformer (GT) constrain the self-attention mechanism to neighboring nodes.…”
Section: Transformer For Graph
confidence: 99%
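For contrast with SAN's fully-connected attention, here is a minimal sketch of the neighbor-restricted attention this statement attributes to GAT/GT-style models. The single-head, scaled dot-product scoring is an assumption made for brevity; GAT itself uses a learned additive scoring function with projected features.

```python
import numpy as np

def neighbor_masked_attention(h: np.ndarray, adj: np.ndarray) -> np.ndarray:
    """Sketch of self-attention restricted to graph neighbors
    (single head, no learned projections; illustrative only)."""
    scores = h @ h.T / np.sqrt(h.shape[1])       # dot-product scores
    mask = (adj + np.eye(len(adj))) > 0          # neighbors plus self-loops
    scores = np.where(mask, scores, -np.inf)     # block non-neighbors
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ h                           # aggregate neighbor features
```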
“…They surpass GNN baselines on graph representation tasks. The Spectral Attention Network (SAN) (Kreuzer et al., 2021) employs a learned positional encoding (LPE) of the Laplacian spectrum to learn the position of each node in a graph. Graph-BERT (Zhang et al., 2020) uses several types of relative positional encodings to embed information about the edges within a subgraph.…”
Section: Transformer For Graph
confidence: 99%
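A hedged sketch of one ingredient such relative encodings can be built from: hop distances between node pairs, computed by BFS, which a Graph-BERT-style model can then map to learned embeddings. The function name and the embedding-lookup comment are illustrative assumptions, not Graph-BERT's actual code.

```python
from collections import deque
import numpy as np

def hop_distance_matrix(adj_list: dict[int, list[int]], n: int) -> np.ndarray:
    """All-pairs shortest-path hop counts via BFS; a possible basis for a
    hop-based relative positional encoding (illustrative helper)."""
    dist = np.full((n, n), np.inf)
    for src in range(n):
        dist[src, src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj_list[u]:
                if dist[src, v] == np.inf:
                    dist[src, v] = dist[src, u] + 1
                    queue.append(v)
    return dist

# Each hop distance would then index a learned embedding table,
# e.g. pe[i, j] = embedding[min(int(dist[i, j]), max_hops)].
```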
“…The GNNs that are used by state-of-the-art device placement methods mostly follow the message-passing paradigm, which is known to have inherent limitations. For example, the expressiveness of such GNNs is bounded by the Weisfeiler-Lehman isomorphism hierarchy [23]. Also, GNNs are known to suffer from over-squashing [24], a distortion of the information propagated between distant nodes.…”
Section: A. Challenges In Device Placement
confidence: 99%
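To make the Weisfeiler-Lehman bound mentioned above concrete, here is a minimal sketch of 1-WL color refinement, the test that upper-bounds the expressiveness of message-passing GNNs: graphs it cannot distinguish, they cannot distinguish either. Function and variable names are illustrative.

```python
def wl_refinement(adj_list: dict[int, list[int]], rounds: int = 3) -> dict[int, int]:
    """1-Weisfeiler-Lehman color refinement: repeatedly relabel each node
    by its own color plus the multiset of its neighbors' colors."""
    colors = {v: 0 for v in adj_list}  # uniform initial coloring
    for _ in range(rounds):
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj_list[v])))
            for v in adj_list
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj_list}
    return colors

# Two triangles vs. a 6-cycle get identical 1-WL colorings, so message-passing
# GNNs with uniform initial node features cannot tell them apart.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(sorted(wl_refinement(two_triangles).values()) ==
      sorted(wl_refinement(six_cycle).values()))  # True
```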