Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Lim, Derek; Höhne, Felix; Li, Xiuyu; Huang, Sijia Linda; Gupta, Vaishnavi; Bhalerao, Omkar; Lim, Ser-Nam

doi:10.48550/arxiv.2110.14446

Cited by 3 publications

(3 citation statements)

References 28 publications

“…As described in the previous section, the performance of the GCN aggregation kernels can be influenced by the size of a feature vector per vertex and the fraction of non-zero elements in a feature vector (called feature density in this paper). Figure 3 exhibits the feature vector size (X-axis) and the corresponding feature density (Y-axis) of the 32 homogeneous graph datasets used for the prior GCN researches [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41]. As shown in the figure, the feature density of a graph dataset is relatively low as the feature vector size is large.…”

Section: Methodology a Graph Datasets (mentioning)

confidence: 99%

Analyzing GCN Aggregation on GPU

Kim, Jeong, et al. 2022

IEEE Access

Graph convolutional neural networks (GCNs) are emerging neural networks for graph structures that include large features associated with each vertex. The operations of GCN can be divided into two phases: aggregation and combination. While the combination just performs matrix multiplications using trained weights and aggregated features, the aggregation phase requires graph traversal to collect features from adjacent vertices. Even though neural network applications rely on GPU's massively parallel processing, GCN aggregation kernels exhibit rather low performance since graph processing using compressed graph structures provokes frequent irregular accesses in GPUs. In order to investigate the performance hurdles of GCN aggregation on GPU, we perform an in-depth analysis of the aggregation kernels using real GPU hardware and a cycle-accurate GPU simulator. We first analyze the characteristics of the popular graph datasets used for GCN studies. We reveal the fractions of non-zero elements in feature vectors are diverse among datasets. Based on the observation, we build two types of aggregation kernels that handle uncompressed and compressed feature vectors. Our evaluation exhibits the performance of aggregation can be significantly influenced by kernel design approaches and feature density. We also analyze the individual loads that access the data arrays of the aggregation kernels to specify critical loads. Our analysis reveals the performance of GPU memory hierarchy is influenced by access patterns and feature size of graph datasets. Based on our observations we discuss possible kernel design approaches and architectural ideas that can improve the performance of GCN aggregation.

Index terms: GCN, aggregation kernel, GPU, characteristics.
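
The two phases and the feature-density measure described in this abstract can be illustrated with a minimal CPU-side sketch. It is not the kernels from the cited paper: the function names, the toy graph, the CSR arrays `row_ptr` and `col_idx`, and the plain-sum aggregation (no degree normalization, self-loops, or nonlinearity) are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def feature_density(features: Matrix) -> float:
    """Fraction of non-zero entries across all vertex feature vectors."""
    total = sum(len(row) for row in features)
    nonzero = sum(1 for row in features for x in row if x != 0)
    return nonzero / total if total else 0.0


def aggregate(row_ptr: List[int], col_idx: List[int], features: Matrix) -> Matrix:
    """Aggregation phase: sum the feature vectors of each vertex's neighbours.

    The graph is assumed to be stored in CSR form, so the neighbours of
    vertex v are col_idx[row_ptr[v]:row_ptr[v + 1]].
    """
    num_vertices = len(row_ptr) - 1
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(num_vertices)]
    for v in range(num_vertices):
        for e in range(row_ptr[v], row_ptr[v + 1]):
            u = col_idx[e]  # data-dependent index: the irregular access
            for k in range(dim):
                out[v][k] += features[u][k]
    return out


def combine(aggregated: Matrix, weights: Matrix) -> Matrix:
    """Combination phase: dense multiply of aggregated features by weights."""
    return [
        [sum(a * w for a, w in zip(row, col)) for col in zip(*weights)]
        for row in aggregated
    ]


# Hypothetical toy graph: 0 -> {1, 2}, 1 -> {0}, 2 -> {0, 1}
row_ptr = [0, 2, 3, 5]
col_idx = [1, 2, 0, 0, 1]
features = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity, standing in for trained weights

density = feature_density(features)
aggregated = aggregate(row_ptr, col_idx, features)
hidden = combine(aggregated, weights)
```

In this sketch the inner loop reads `features[u]` at an index taken from `col_idx`, which is the kind of irregular access the abstract attributes to compressed graph structures, while `combine` is an ordinary dense matrix product.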

“…In order to benchmark these works, a number of datasets have been utilized by the various works. A few of the commonly used datasets are BookCorpus [48], WMT 2014 [49], Wikipedia [50], C4 [22], ImageNet [51], and COCO [52].…”

Section: Introductory Work (mentioning)

confidence: 99%

“…An early work building upon the transformer model was that of Shaw et al [19], which simply involved extending the self-attention mechanism of transformers to efficiently consider representations of the relative positions or distances between sequence … In order to benchmark these works, a number of datasets have been utilized by the various works. A few of the commonly used datasets are BookCorpus [48], WMT 2014 [49], Wikipedia [50], C4 [22], ImageNet [51], and COCO [52].…”

Section: Introductory Work (mentioning)

confidence: 99%

A Historical Survey of Advances in Transformer Architectures

Sajun, Zualkernan, Sankalpa 2024

Applied Sciences

In recent times, transformer-based deep learning models have risen in prominence in the field of machine learning for a variety of tasks such as computer vision and text generation. Given this increased interest, a historical outlook at the development and rapid progression of transformer-based models becomes imperative in order to gain an understanding of the rise of this key architecture. This paper presents a survey of key works related to the early development and implementation of transformer models in various domains such as generative deep learning and as backbones of large language models. Previous works are classified based on their historical approaches, followed by key works in the domain of text-based applications, image-based applications, and miscellaneous applications. A quantitative and qualitative analysis of the various approaches is presented. Additionally, recent directions of transformer-related research such as those in the biomedical and timeseries domains are discussed. Finally, future research opportunities, especially regarding the multi-modality and optimization of the transformer training process, are identified.
