Graph Neural Networks (GNNs) have emerged as a powerful class of models for ML over graph-structured data. Yet, scalability remains a major challenge when applying GNNs to billion-edge inputs. The construction of mini-batches used for training incurs computational and data movement costs that grow exponentially with the number of GNN layers, since state-of-the-art models aggregate information from the multi-hop neighborhood of each input node. In this paper, we focus on scalable training of GNNs with an emphasis on resource efficiency. We show that out-of-core pipelined mini-batch training on a single machine outperforms resource-hungry multi-GPU solutions. We introduce Marius++, a system for training GNNs over billion-scale graphs. Marius++ provides disk-optimized training for GNNs and introduces a series of data organization and algorithmic contributions that 1) minimize the memory footprint and end-to-end time required for training and 2) ensure that models learned with disk-based training exhibit accuracy similar to those fully trained in mixed CPU/GPU settings. We evaluate Marius++ against PyTorch Geometric and Deep Graph Library using seven benchmark (model, dataset) settings and find that Marius++ with one GPU can achieve the same level of model accuracy up to 8× faster than these systems when they use up to eight GPUs. For these experiments, disk-based training allows Marius++ deployments to be up to 64× cheaper in monetary cost than those of the competing systems.
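
To make the cost argument concrete, the sketch below (not Marius++ code; the function sample_khop and the random graph are illustrative assumptions) shows why mini-batch construction grows with GNN depth: with a per-layer sampling fan-out f, an L-layer model touches on the order of f^L neighbors per seed node before saturating the graph.

```python
# Illustrative sketch of multi-hop neighbor sampling (hypothetical helper, not Marius++ code).
# With fan-out f and L layers, the sampled neighborhood grows roughly as f**L,
# which is what drives the computational and data-movement cost of mini-batch creation.
import random

def sample_khop(adj, seeds, num_layers, fanout):
    """Return the set of nodes needed to compute L-layer embeddings for `seeds`."""
    frontier = set(seeds)
    visited = set(seeds)
    for _ in range(num_layers):
        next_frontier = set()
        for v in frontier:
            neighbors = adj.get(v, [])
            # Sample up to `fanout` neighbors per frontier node.
            sampled = random.sample(neighbors, min(fanout, len(neighbors)))
            next_frontier.update(sampled)
        frontier = next_frontier - visited   # only newly discovered nodes expand next
        visited |= frontier
    return visited

# Example on a synthetic random graph: the touched set grows roughly as fanout**num_layers.
n, deg = 100_000, 20
adj = {v: random.sample(range(n), deg) for v in range(n)}
for L in (1, 2, 3):
    batch = sample_khop(adj, seeds=range(1_000), num_layers=L, fanout=10)
    print(f"layers={L}: nodes touched = {len(batch)}")
```

For disk-based training, every touched node implies feature and embedding reads, so bounding this neighborhood is central to keeping both memory footprint and I/O manageable.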