Proceedings of the Sixteenth European Conference on Computer Systems 2021
DOI: 10.1145/3447786.3456244
Accelerating graph sampling for graph machine learning using GPUs

Abstract: Representation learning algorithms automatically learn the features of data. Several representation learning algorithms for graph data, such as DeepWalk, node2vec, and GraphSAGE, sample the graph to produce mini-batches that are suitable for training a DNN. However, sampling time can be a significant fraction of training time, and existing systems do not efficiently parallelize sampling. Sampling is an "embarrassingly parallel" problem and may appear to lend itself to GPU acceleration, but the irregularity of …
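For context, a minimal sketch of the kind of sampling the abstract refers to: DeepWalk-style fixed-length random walks over a graph in CSR form, where each walk (sample) is independent but its neighbor lookups are irregular. The function name, CSR layout, and toy graph below are illustrative assumptions, not code from the paper.

```python
import numpy as np

def random_walks(indptr, indices, start_nodes, walk_length, rng=None):
    """DeepWalk-style fixed-length random walks over a CSR graph.

    Each walk is independent ("embarrassingly parallel" across samples),
    but consecutive walkers touch unrelated parts of `indices`, which is
    the irregularity that makes naive GPU ports inefficient.
    """
    rng = rng or np.random.default_rng(0)
    walks = np.empty((len(start_nodes), walk_length + 1), dtype=np.int64)
    walks[:, 0] = start_nodes
    for step in range(walk_length):
        for i, v in enumerate(walks[:, step]):
            lo, hi = indptr[v], indptr[v + 1]
            # Stay put at sink vertices; otherwise pick a random neighbor.
            walks[i, step + 1] = v if lo == hi else indices[rng.integers(lo, hi)]
    return walks

# Toy 4-vertex undirected graph in CSR form: edges 0-1, 0-2, 1-2, 2-3.
indptr  = np.array([0, 2, 4, 7, 8])
indices = np.array([1, 2, 0, 2, 0, 1, 3, 2])
print(random_walks(indptr, indices, np.array([0, 3]), walk_length=4))
```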

Cited by 51 publications (18 citation statements)
References 26 publications
“…We compared two versions of NextDoor that parallelize sampling by sample and by transit, using several sampling algorithms implemented using NextDoor's API. Transit parallelism has been shown to be consistently faster, as shown in [19].…”
Section: Systems For Efficient Sampling
confidence: 90%
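A rough Python sketch of the two groupings this statement contrasts, for one neighbor-expansion step. The CSR layout and function names are illustrative assumptions, not NextDoor's actual API; the comments map each loop to how work would be assigned on a GPU.

```python
from collections import defaultdict
import numpy as np

def expand_sample_parallel(indptr, indices, frontiers, fanout, rng):
    """One expansion step, parallel over *samples*: each sample walks its own
    frontier and reads neighbor lists independently (irregular memory access)."""
    out = []
    for frontier in frontiers:                    # conceptually: one GPU thread block per sample
        nxt = []
        for v in frontier:
            lo, hi = indptr[v], indptr[v + 1]
            if hi > lo:
                nxt.extend(indices[rng.integers(lo, hi, size=fanout)])
        out.append(nxt)
    return out

def expand_transit_parallel(indptr, indices, frontiers, fanout, rng):
    """Same step, parallel over *transit vertices*: samples that currently need
    the same vertex's neighbors are grouped, so its adjacency list is read once
    and reused, which maps to more regular, coalesced GPU access."""
    by_transit = defaultdict(list)
    for s, frontier in enumerate(frontiers):
        for v in frontier:
            by_transit[v].append(s)
    out = [[] for _ in frontiers]
    for v, samples in by_transit.items():         # conceptually: one GPU thread block per transit vertex
        lo, hi = indptr[v], indptr[v + 1]
        if hi == lo:
            continue
        for s in samples:
            out[s].extend(indices[rng.integers(lo, hi, size=fanout)])
    return out
```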
“…The computation is irregular and is typically performed using the CPU. In our previous work, we found that graph sampling can take up to 62% of an epoch's time if the host has a single GPU (see Table 2) [19]. This bottleneck is further exacerbated if the CPU is attached to multiple GPUs consuming samples for training.…”
Section: Why Scaling Whole-graph Training Is Difficult
confidence: 99%
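To see why the bottleneck worsens with more GPUs, here is an Amdahl-style back-of-envelope calculation. It assumes, purely for illustration, that CPU sampling time stays fixed while only the GPU training portion is divided across devices; the 62% figure is the single-GPU fraction cited above.

```python
def epoch_speedup(sampling_frac, n_gpus):
    """Amdahl-style bound: the CPU sampling fraction is not parallelized,
    while the remaining (training) fraction is split across n_gpus."""
    return 1.0 / (sampling_frac + (1.0 - sampling_frac) / n_gpus)

# With sampling at 62% of a single-GPU epoch, extra GPUs give diminishing returns.
for g in (1, 2, 4, 8):
    print(f"{g} GPUs -> {epoch_speedup(0.62, g):.2f}x epoch speedup")
```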
“…The node sampling specifically extracts a set of subgraphs and the corresponding embeddings from the original (undirected) graph datasets before aggregating and transforming the feature vectors, which can significantly reduce data-processing pressure and computational complexity without loss of accuracy [27,33]. Since the sampled graph must also be self-contained, the subgraphs and embeddings must be reindexed and restructured.…”
Section: Graph Dataset Preprocessing
confidence: 99%
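A minimal sketch of the reindexing step this statement describes: relabeling a sampled vertex set to compact ids, keeping only the induced edges, and gathering the corresponding feature rows so the mini-batch is self-contained. The function and its CSR inputs are hypothetical, not a specific library's API.

```python
import numpy as np

def extract_subgraph(indptr, indices, feats, sampled_nodes):
    """Build a self-contained subgraph from a sampled vertex set:
    old vertex ids are relabeled to 0..k-1, only edges between sampled
    vertices are kept, and the matching feature rows are gathered."""
    sampled = np.asarray(sorted(set(sampled_nodes)))
    new_id = {int(v): i for i, v in enumerate(sampled)}      # old id -> compact id
    src, dst = [], []
    for v in sampled:
        for u in indices[indptr[v]:indptr[v + 1]]:
            if int(u) in new_id:                              # keep only induced edges
                src.append(new_id[int(v)])
                dst.append(new_id[int(u)])
    return (np.array(src), np.array(dst)), feats[sampled]    # relabeled edges + gathered embeddings
```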