2021
DOI: 10.48550/arxiv.2108.10521
Preprint
Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study

Abstract: Training deep graph neural networks (GNNs) is notoriously hard. Besides the standard plights in training deep architectures such as vanishing gradients and overfitting, the training of deep GNNs also uniquely suffers from over-smoothing, information squashing, and so on, which limits their potential power on large-scale graphs. Although numerous efforts are proposed to address these limitations, such as various forms of skip connections, graph normalization, and random dropping, it is difficult to disentangle …

Cited by 2 publications (3 citation statements) | References 36 publications
“…We have used the basic vanilla-GCN implementation in PyTorch provided by the authors of [1] to incorporate our proposed techniques and show their effectiveness in making the traditional GCN comparable to or better than SOTA. For our evaluation on Cora, Citeseer, Pubmed, and OGBN-ArXiv, we have closely followed the data split settings and metrics reported by the recent benchmark [49]. See details in Appendix C. For comparison with SOTA models, we have used JKNet [50], InceptionGCN [51], SGC [52], GAT [3], GCNII [24], and DAGNN [53].…”
Section: Dataset and Experimental Setup (mentioning, confidence: 99%)
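The vanilla GCN layer that this citing work builds on propagates features with symmetrically normalized adjacency: H' = ReLU(D^{-1/2}(A+I)D^{-1/2} H W). The following is a minimal NumPy sketch of that propagation rule, not the authors' PyTorch implementation; the toy graph and weights are illustrative assumptions.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One vanilla GCN step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    deg = a_hat.sum(axis=1)                      # node degrees of A+I
    d_inv_sqrt = np.diag(deg ** -0.5)            # D^-1/2
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt     # symmetric normalization
    return np.maximum(a_norm @ features @ weight, 0.0)  # ReLU

# toy 3-node path graph with 2-d features and 2 output channels
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
x = np.eye(3, 2)
w = np.ones((2, 2))
h = gcn_layer(adj, x, w)
print(h.shape)  # (3, 2)
```

Deep GNNs stack many such layers, which is where over-smoothing enters: repeated multiplication by the normalized adjacency pulls node representations toward each other.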
“…We use the Adam optimizer for our experiments, perform a grid search to tune hyperparameters for our proposed methods, and report our settings in Table 1. For all our experiments, we train our modified GCNs for 1500 epochs and 100 independent repetitions following [49], and report average node classification accuracy with standard deviations. All experiments on large graph datasets, e.g., OGBN-ArXiv, are conducted on a single 48G Quadro RTX 8000 GPU, while small graph experiments are completed on a single 16G RTX 5000 GPU.…”
Section: Dataset and Experimental Setup (mentioning, confidence: 99%)
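The evaluation protocol above (100 independent repetitions, report mean and standard deviation of accuracy) can be sketched as follows; `fake_run` is a hypothetical stand-in for a full 1500-epoch training run, used here only to show the aggregation.

```python
import random
import statistics

def aggregate_runs(run_fn, repetitions=100, seed=0):
    """Repeat an experiment and report mean and std of its accuracy."""
    rng = random.Random(seed)
    accs = [run_fn(rng) for _ in range(repetitions)]
    return statistics.mean(accs), statistics.stdev(accs)

def fake_run(rng):
    # placeholder for one full training run; returns a noisy accuracy
    return 0.80 + rng.gauss(0.0, 0.01)

mean_acc, std_acc = aggregate_runs(fake_run)
print(f"{mean_acc:.3f} ± {std_acc:.3f}")
```

Reporting mean ± std over many seeds is what makes the small accuracy differences between deep-GNN tricks in the benchmark statistically interpretable.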
“…Considering the node classification task in graph analytics, vanilla training based on cross-entropy minimization often leads to over-confident predictions on the training data and poor generalization to the testing data [51]. It is also reported that the vanilla training of GNNs is sensitive to overfitting [4,26]. These all point to optimization problems.…”
Section: Why GNNs Generalize Poorly (mentioning, confidence: 99%)
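The over-confidence mechanism mentioned in this citation statement can be seen directly from the cross-entropy objective: the loss keeps decreasing as the correct-class logit grows, so an optimizer is rewarded for pushing predicted probabilities arbitrarily close to 1 on training points. A small self-contained sketch (stdlib only, toy logits as an illustrative assumption):

```python
import math

def softmax_confidence(logits):
    """Probability assigned to the most likely class."""
    exps = [math.exp(z - max(logits)) for z in logits]
    return max(exps) / sum(exps)

def cross_entropy(logits, label):
    """Negative log-probability of the true class under softmax."""
    exps = [math.exp(z - max(logits)) for z in logits]
    probs = [e / sum(exps) for e in exps]
    return -math.log(probs[label])

# scaling the correct-class logit keeps lowering the loss while
# driving predicted confidence toward 1 (over-confidence)
for scale in (1.0, 5.0, 20.0):
    logits = [scale, 0.0, 0.0]
    print(scale,
          round(cross_entropy(logits, 0), 4),
          round(softmax_confidence(logits), 4))
```

Because the loss has no finite minimizer in logit space, regularization such as early stopping or label smoothing is typically needed to keep training-set confidence calibrated.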