2021
DOI: 10.48550/arxiv.2105.04550
Preprint

Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth

Cited by 5 publications (5 citation statements); references 0 publications.
“…(iv) Although dense connection on average brings significant accuracy improvements on OGBN-ArXiv with SGC, it sacrifices the training stability and leads to considerable performance variance, as consistently shown by Table A13. (v) Figure 1 reveals that skip connections substantially accelerate the training of deep GNNs, which is aligned with the analysis by concurrent work [71]. Formulations.…”
Section: Skip Connection Motivated by ResNets (supporting)
Confidence: 79%
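
As a rough illustration of the pattern this statement refers to (not code from the cited works), below is a minimal PyTorch sketch of a graph convolution layer with a skip connection, where the layer input is added back to its output so that deeper GNN stacks remain trainable. The class name, dense normalized adjacency, and toy dimensions are assumptions made for the example.

```python
import torch
import torch.nn as nn


class GCNLayerWithSkip(nn.Module):
    """Illustrative GCN layer with a residual (skip) connection."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x, adj_norm):
        # Standard GCN propagation: aggregate neighbor features, then transform.
        h = torch.relu(self.linear(adj_norm @ x))
        # Skip connection: add the layer input back to the transformed output.
        return h + x


# Toy usage: 4 nodes with feature dimension 8; the normalized adjacency
# (e.g. D^{-1/2}(A + I)D^{-1/2}) is assumed to be precomputed.
x = torch.randn(4, 8)
adj_norm = torch.eye(4)  # placeholder adjacency for illustration only
layer = GCNLayerWithSkip(8)
out = layer(x, adj_norm)  # shape (4, 8)
```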
“…At the same time, there are many things we do not understand: what is the effect of optimization algorithms and of batching on the connection between complexity and training behavior? Does layer normalization play a critical role and how do these results extend to architectures with strong baked in inductive biases, such as self-attention networks [72,73] and graph neural networks [74,75,66,76,77]? We believe that a firm understanding of these questions will be essential in fleshing out the interplay between training, NN complexity, and generalization.…”
Section: Discussion (mentioning)
Confidence: 99%
“…Training dynamics of NNs. Many authors have studied the training dynamics of NNs [59][60][61][62][63][64][65][66], arguing that, with correct initialization and significant overparameterization, SGD converges to a good solution that generalizes. Our work complements these studies by focusing on how the SGD trajectory can be used to infer NN complexity.…”
Section: Related Work (mentioning)
Confidence: 99%
“…Third, graph attention networks are less susceptible to the oversquashing effect [18]. Additionally, we add skip connection in SPGNN in between layers to alleviate over-smoothing issues, following the reasoning in [20].…”
Section: A Airway Labeling Framework (mentioning)
Confidence: 99%
“…Second, most approaches adopted shallow GNNs in propagating messages through local (mostly 2-hop) neighbors, causing the structure-awareness to be local. Shallow GNNs are popular because of the over-squashing effect [18] and the over-smoothing issues [19], [20] that occur when the number of layers increases.…”
Section: Introduction (mentioning)
Confidence: 99%