2021
DOI: 10.48550/arxiv.2111.08202
Preprint

Learn Locally, Correct Globally: A Distributed Algorithm for Training Graph Neural Networks

Abstract: Despite the recent success of Graph Neural Networks (GNNs), training GNNs on large graphs remains challenging. The limited resource capacities of existing servers, the dependency between nodes in a graph, and the privacy concerns arising from centralized storage and model learning have spurred the need for an effective distributed algorithm for GNN training. However, existing distributed GNN training methods impose either excessive communication costs or large memory overheads that hinder their scalability. […]
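The local-training-plus-server-correction idea summarized in the abstract can be illustrated with a short sketch: each machine trains on its own partition while ignoring cross-partition edges, the server averages the resulting parameters, and a small correction step is then run on a globally sampled mini-batch. This is a minimal sketch under those assumptions; the toy model, helper names (`local_train`, `average_models`, `global_correction`), and hyperparameters are illustrative, not the authors' implementation.

```python
# Minimal sketch of a Learn-Locally-Correct-Globally style training round
# (illustrative only; the toy model and helper names are assumptions).
import copy
import torch
import torch.nn as nn

class ToyGNN(nn.Module):
    """Toy 2-layer GNN: propagation is a dense normalized-adjacency product."""
    def __init__(self, in_dim: int, hid_dim: int, n_classes: int):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim)
        self.lin2 = nn.Linear(hid_dim, n_classes)

    def forward(self, a_hat: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(a_hat @ self.lin1(x))
        return a_hat @ self.lin2(h)

def local_train(model, a_loc, x_loc, y_loc, steps=5, lr=1e-2):
    """Local phase: a machine trains only on its partition's subgraph, so
    edges to nodes held by other machines are simply absent from a_loc."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(a_loc, x_loc), y_loc).backward()
        opt.step()
    return model.state_dict()

def average_models(states):
    """Server averages the locally trained parameters (FedAvg-style)."""
    return {k: torch.stack([s[k] for s in states]).mean(dim=0) for k in states[0]}

def global_correction(model, a_mini, x_mini, y_mini, steps=1, lr=1e-2):
    """Correction phase: a few updates on a small globally sampled subgraph
    that still contains cross-partition edges."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(a_mini, x_mini), y_mini).backward()
        opt.step()

def training_round(global_model, partitions, global_minibatch):
    """One round: broadcast, local training per partition, average, correct."""
    states = []
    for a_p, x_p, y_p in partitions:
        local = copy.deepcopy(global_model)      # broadcast current weights
        states.append(local_train(local, a_p, x_p, y_p))
    global_model.load_state_dict(average_models(states))
    global_correction(global_model, *global_minibatch)
    return global_model
```

The first citation statement below makes the related point that if the correction step only ever sees a single small mini-batch, it may not be enough to recover what a plain GCN loses to partitioning.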

Cited by 2 publications (6 citation statements) | References 17 publications
“…LLCG performs worst, particularly on the Reddit dataset, because in LLCG's global server correction only a mini-batch is trained, which is not sufficient to correct the plain GCN. This is also why the authors of LLCG report the performance of a more complex model that mixes GCN layers and GraphSAGE layers [22]. DGL achieves good performance on some datasets (e.g., OGB-products) with a uniform node sampling strategy and real-time embedding exchange.…”
Section: Results (mentioning)
confidence: 99%
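The "complex model with mixing GCN layers and GraphSAGE layers" mentioned above could look roughly like the following PyTorch Geometric sketch; the depth, widths, and layer ordering are assumptions for illustration rather than the configuration actually reported in [22].

```python
# Hypothetical mixed GCN/GraphSAGE model (illustrative only; layer choice and
# sizes are assumptions, not the LLCG authors' reported configuration).
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, SAGEConv

class MixedGNN(torch.nn.Module):
    def __init__(self, in_dim: int, hid_dim: int, n_classes: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)       # GCN-style propagation
        self.conv2 = SAGEConv(hid_dim, n_classes)   # GraphSAGE-style aggregation

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)
```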
“…"Partition-based" generalizes the existing data parallelism techniques of classical distributed training on i.i.d data to graph data and enjoys minimal communication cost. However, directly partitioning a large graph into multiple subgraphs can result in severe information loss due to the ignorance of huge number of cross-subgraph edges and cause performance degeneration [1,14,22]. For these methods, the embedding of neighbors out of the current subgraph (second embedding set in Eq.…”
Section: Background and Problem Formulationmentioning
confidence: 99%
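The information loss described above, caused by dropping cross-subgraph edges, can be made concrete with a small helper that splits an edge list according to a node-to-partition assignment and reports which edges a purely partition-based scheme never sees; the function name and interface are invented for illustration.

```python
# Illustrative helper (not from any cited system): split an edge list by a
# node-to-partition assignment and report the cross-partition edges that are lost.
from collections import defaultdict

def split_edges_by_partition(edges, part_of):
    """edges: iterable of (u, v) pairs; part_of: dict mapping node -> partition id."""
    local_edges = defaultdict(list)   # partition id -> edges kept inside it
    dropped = []                      # cross-partition edges a naive scheme ignores
    for u, v in edges:
        if part_of[u] == part_of[v]:
            local_edges[part_of[u]].append((u, v))
        else:
            dropped.append((u, v))
    return dict(local_edges), dropped

# Example: a 4-node path graph split into two partitions loses the middle edge.
edges = [(0, 1), (1, 2), (2, 3)]
part_of = {0: 0, 1: 0, 2: 1, 3: 1}
kept, dropped = split_edges_by_partition(edges, part_of)
print(kept)     # {0: [(0, 1)], 1: [(2, 3)]}
print(dropped)  # [(1, 2)] -- the edge a partition-only method never sees
```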
“…In such a training strategy, a partition is a mini-batch, and we call it a partition-based mini-batch. PSGD-PA [82] is a straightforward implementation of the above idea with a Parameter Server. In GraphTheta [65], the partitions are obtained via a community detection algorithm.…”
Section: Partition-based Mini-batch Generation (mentioning)
confidence: 99%
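A synchronous parameter-server loop in which each worker's partition plays the role of one mini-batch might be organized roughly as below; the push/average/apply structure and all names are assumptions for illustration, not the actual PSGD-PA [82] or GraphTheta [65] implementations.

```python
# Hypothetical synchronous parameter-server loop for partition-based mini-batches.
# Each worker's partition is treated as one mini-batch; names are illustrative.
import torch
import torch.nn as nn

def worker_gradient(model, loss_fn, batch_inputs, batch_labels):
    """One worker: compute gradients of the shared model on its partition,
    which serves as the mini-batch (inputs could be (A_hat, X) for a GNN)."""
    model.zero_grad()
    loss_fn(model(*batch_inputs), batch_labels).backward()
    return [p.grad.detach().clone() for p in model.parameters()]

def server_apply(model, grads_per_worker, lr=1e-2):
    """Parameter server: average the gradients pushed by all workers and take
    one SGD step; workers would then pull the updated parameters."""
    with torch.no_grad():
        for i, p in enumerate(model.parameters()):
            p -= lr * torch.stack([g[i] for g in grads_per_worker]).mean(dim=0)

# Tiny demo with a linear model standing in for a GNN on two partitions.
model = nn.Linear(4, 3)
loss_fn = nn.CrossEntropyLoss()
parts = [((torch.randn(8, 4),), torch.randint(0, 3, (8,))) for _ in range(2)]
grads = [worker_gradient(model, loss_fn, xb, yb) for xb, yb in parts]
server_apply(model, grads)
```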