2021
DOI: 10.48550/arxiv.2105.04550
Preprint

Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth

Cited by 5 publications (5 citation statements); references 0 publications.
“…(iv) Although dense connection on average brings significant accuracy improvements on OGBN-ArXiv with SGC, it sacrifices the training stability and leads to considerable performance variance, as consistently shown by Table A13. (v) Figure 1 reveals that skip connections substantially accelerate the training of deep GNNs, which is aligned with the analysis by concurrent work [71]. Formulations.…”
Section: Skip Connection Motivated by ResNets (supporting)
Confidence: 79%
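
As a rough illustration of the pattern this statement refers to (not code from the cited works), below is a minimal PyTorch sketch of a graph convolution layer with a skip connection, where the layer input is added back to its output so that deeper GNN stacks remain trainable. The class name, dense normalized adjacency, and toy dimensions are assumptions made for the example.

```python
import torch
import torch.nn as nn


class GCNLayerWithSkip(nn.Module):
    """Illustrative GCN layer with a residual (skip) connection."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x, adj_norm):
        # Standard GCN propagation: aggregate neighbor features, then transform.
        h = torch.relu(self.linear(adj_norm @ x))
        # Skip connection: add the layer input back to the transformed output.
        return h + x


# Toy usage: 4 nodes with feature dimension 8; the normalized adjacency
# (e.g. D^{-1/2}(A + I)D^{-1/2}) is assumed to be precomputed.
x = torch.randn(4, 8)
adj_norm = torch.eye(4)  # placeholder adjacency for illustration only
layer = GCNLayerWithSkip(8)
out = layer(x, adj_norm)  # shape (4, 8)
```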
“…At the same time, there are many things we do not understand: what is the effect of optimization algorithms and of batching on the connection between complexity and training behavior? Does layer normalization play a critical role and how do these results extend to architectures with strong baked in inductive biases, such as self-attention networks [72,73] and graph neural networks [74,75,66,76,77]? We believe that a firm understanding of these questions will be essential in fleshing out the interplay between training, NN complexity, and generalization.…”
Section: Discussion (mentioning)
Confidence: 99%
“…Training dynamics of NNs. Many authors have studied the training dynamics of NNs [59][60][61][62][63][64][65][66], arguing that, with correct initialization and significant overparameterization, SGD converges to a good solution that generalizes. Our work complements these studies by focusing on how the SGD trajectory can be used to infer NN complexity.…”
Section: Related Work (mentioning)
Confidence: 99%
“…Third, graph attention networks are less susceptible to the oversquashing effect [18]. Additionally, we add skip connection in SPGNN in between layers to alleviate over-smoothing issues, following the reasoning in [20].…”
Section: A Airway Labeling Framework (mentioning)
Confidence: 99%
“…Second, most approaches adopted shallow GNNs in propagating messages through local (mostly 2-hop) neighbors, causing the structure-awareness to be local. Shallow GNNs are popular because of the over-squashing effect [18] and the over-smoothing issues [19], [20] that occur when the number of layers increases.…”
Section: Introduction (mentioning)
Confidence: 99%