2021
DOI: 10.1109/tpds.2021.3065737
Efficient Data Loader for Fast Sampling-Based GNN Training on Large Graphs

Cited by 26 publications (13 citation statements)
References 23 publications
“…In this section, we review recent methods for improving the efficiency of GNNs, which is regarded as aligning GNN research with social values regarding environmental well-being. Generally, the efficiency improvement is evaluated with reference to time-related metrics (e.g., response latency or speedup ratio [257], [258], throughput [259], [260], communication time [261]), energy-related metrics (e.g., nodes-per-Joule [36], energy consumption [262]), or resource-related metrics (e.g., memory footprint [72], cache access performance, and peak memory usage [263]). Existing methods include scalable GNN architectures and efficient data communication, model compression methods, and efficient frameworks and accelerators.…”
Section: Environmental Well-being of GNNs
Mentioning confidence: 99%
“…Moreover, some GNNs suffer from inefficient data loading. For example, data loading occupies 74% of the whole training time for GCN [260]. A method called PaGraph [260] analyses the pipeline bottlenecks of GNNs and proposes a GPU cache policy to reduce the time spent moving data from CPU to GPU.…”
Section: Environmental Well-being of GNNs
Mentioning confidence: 99%
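To make the cache-policy idea concrete, below is a minimal sketch in the spirit of the approach the quote describes: features of the highest-degree nodes are pre-loaded onto the GPU (high-degree nodes are the ones sampling touches most often), and misses fall back to a CPU-to-GPU copy. The class name `GPUFeatureCache` and every detail here are illustrative assumptions, not PaGraph's actual implementation.

```python
import torch

class GPUFeatureCache:
    """Static GPU cache for node features (illustrative sketch, not PaGraph's API).

    Pre-loads the feature rows of the highest-degree nodes onto the GPU;
    lookups for uncached nodes fall back to a host-to-device copy.
    """

    def __init__(self, cpu_feats, degrees, capacity, device="cuda"):
        self.cpu_feats = cpu_feats                    # [N, F] features kept in host memory
        self.device = device
        hot = torch.topk(degrees, capacity).indices   # nodes most likely to be sampled
        self.gpu_feats = cpu_feats[hot].to(device)    # cached feature rows on the GPU
        # map: global node id -> row in the GPU cache (-1 means not cached)
        self.slot = torch.full((cpu_feats.size(0),), -1, dtype=torch.long)
        self.slot[hot] = torch.arange(capacity)

    def gather(self, node_ids):
        """Return device-resident features for a mini-batch of node ids (CPU LongTensor)."""
        slots = self.slot[node_ids]       # cache row per requested node, -1 on miss
        hit = slots >= 0
        feats = torch.empty(len(node_ids), self.cpu_feats.size(1), device=self.device)
        feats[hit.to(self.device)] = self.gpu_feats[slots[hit].to(self.device)]
        feats[(~hit).to(self.device)] = self.cpu_feats[node_ids[~hit]].to(self.device)
        return feats
```

A degree-ordered static cache like this needs no eviction logic during training, which is why it pairs well with sampling-based loaders: the access distribution is skewed and stable across epochs.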
“…AliGraph [14] used a novel storage layer to cache nodes and their intermediate results, reducing communication between the local processor and other processors. Similarly, Bai et al. [26] presented an efficient data loader that keeps frequently accessed nodes in a cache via a novel indexing algorithm, speeding up information exchange between processors and reducing communication time. Jiang et al. [16] assigned different sampling probabilities to nodes on the current processor and on other processors.…”
Section: Optimized Distributed Graph Representation Learning
Mentioning confidence: 99%
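The locality-biased sampling attributed to Jiang et al. [16] can be sketched as follows: neighbors stored on the current processor get a higher sampling weight than remote ones. The 4:1 weighting and the function name are assumptions for illustration, not the probabilities or interface used in the cited paper.

```python
import numpy as np

def locality_biased_sample(neighbors, is_local, k, local_weight=4.0, rng=None):
    """Sample k neighbors, favoring nodes stored on the current processor.

    neighbors:    array of candidate neighbor ids
    is_local:     boolean mask, True where the neighbor lives on this processor
    local_weight: how much more likely a local neighbor is to be drawn
                  (illustrative value; the cited paper's weighting may differ)
    """
    rng = rng or np.random.default_rng()
    weights = np.where(is_local, local_weight, 1.0)
    probs = weights / weights.sum()
    k = min(k, len(neighbors))
    return rng.choice(neighbors, size=k, replace=False, p=probs)
```

Biasing the sampler toward local nodes trades a little estimation variance for far fewer cross-processor fetches, which is the communication cost all three cited systems attack.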
“…For example, GraphSAGE [7] computes the Max of the neighboring nodes, while some other models use Sum [29]. The aggregation result is given to a linear function (Linear) and an activation function (ReLU) to obtain the intermediate embedding y^(1)_i. The intermediate embeddings are further aggregated for a few layers to obtain the output embeddings.…”
Section: Introduction
Mentioning confidence: 99%
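The layer described in the quote, y^(1)_i = ReLU(Linear(Max_{j in N(i)} x^(0)_j)), can be written as a short PyTorch module. This is a minimal sketch of the aggregation step only, assuming every node has at least one neighbor; full GraphSAGE also concatenates the node's own feature, which is omitted here.

```python
import torch
import torch.nn as nn

class MaxAggLayer(nn.Module):
    """One Max-aggregation layer: y_i = ReLU(Linear(max over neighbors of x_j))."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, neighbor_lists):
        # x: [N, in_dim] input embeddings x^(0)
        # neighbor_lists[i]: indices of N(i), assumed non-empty
        agg = torch.stack([x[nbrs].max(dim=0).values for nbrs in neighbor_lists])
        return torch.relu(self.linear(agg))   # intermediate embeddings y^(1)
```

Stacking layers of this form is what produces the multi-hop output embeddings the quote mentions: layer l consumes the y^(l-1) embeddings of each node's neighbors.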
“…The idea is to sample a subset of neighbors and estimate the aggregation results based on the sampled nodes. As shown in Figure 2c, instead of computing the accurate value of x^(1)_1 with all of x^(0)…”
Section: Introduction
Mentioning confidence: 99%
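A minimal sketch of the sampling-based estimator the quote describes, for a Sum aggregator: draw k neighbors uniformly without replacement and rescale the sampled Sum by |N(i)|/k, which makes the estimate unbiased for the full-neighborhood Sum. The function name and the uniform-sampling choice are illustrative assumptions.

```python
import numpy as np

def estimated_sum_aggregation(x, neighbors, k, rng=None):
    """Estimate the Sum aggregation over all neighbors from a sample of k.

    x:         [N, F] node feature matrix x^(0)
    neighbors: indices of N(i)
    Rescaling by len(neighbors)/k makes the sampled Sum an unbiased
    estimator of the exact Sum over the whole neighborhood.
    """
    rng = rng or np.random.default_rng()
    if len(neighbors) <= k:                  # small neighborhood: aggregate exactly
        return x[neighbors].sum(axis=0)
    sampled = rng.choice(neighbors, size=k, replace=False)
    return x[sampled].sum(axis=0) * (len(neighbors) / k)
```

Capping the neighborhood at k neighbors per node is what bounds the mini-batch size in sampling-based GNN training, and it is exactly this sampled access pattern that the cached data loaders above are built to serve quickly.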