2019
DOI: 10.1109/lca.2019.2923618

Modeling Emerging Memory-Divergent GPU Applications

Abstract: Analytical performance models yield valuable architectural insight without incurring the excessive runtime overheads of simulation. In this work, we study contemporary GPU applications and find that the key performance-related behavior of such applications is distinct from traditional GPU applications. The key issue is that these GPU applications are memory-intensive and have poor spatial locality, which implies that the loads of different threads commonly access different cache blocks. Such memory-divergent a…
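
As an illustration of the memory divergence the abstract describes, the following minimal sketch counts how many distinct cache blocks a single warp-wide load touches. The warp size of 32 threads, the 128-byte block size, the 4-byte element size, and the helper name `distinct_cache_blocks` are all assumptions chosen for the example; none of them is taken from the paper.

```python
# Assumed, illustrative parameters (not from the paper).
WARP_SIZE = 32      # threads issuing the load together
BLOCK_BYTES = 128   # cache block size in bytes


def distinct_cache_blocks(addresses, block_bytes=BLOCK_BYTES):
    """Return the number of distinct cache blocks covered by the given byte addresses."""
    return len({addr // block_bytes for addr in addresses})


# Good spatial locality: consecutive 4-byte words, all inside one block.
coalesced = [4 * lane for lane in range(WARP_SIZE)]

# Memory-divergent: each thread's address falls in a different block.
divergent = [BLOCK_BYTES * lane for lane in range(WARP_SIZE)]

print(distinct_cache_blocks(coalesced))  # 1
print(distinct_cache_blocks(divergent))  # 32
```

Under these assumptions, the divergent load generates 32 separate block requests where the coalesced load generates one, which is the kind of behavior the abstract attributes to memory-intensive applications with poor spatial locality.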

Cited by 6 publications (4 citation statements)
References 14 publications
“…Aimed at providing a deep insight into GPU performance, prior studies [25,50,51] have proposed GPU analytical models based on interval analysis, a well-known approach for accurately modeling CPU performance [14,28]. The key idea of modeling the performance with interval analysis is that a warp scheduler can sustain its maximum issue rate when no stall events occur.…”
Section: GPU Analytical Models
mentioning
confidence: 99%
“…To quantify the importance of capturing the key core-side stall events, we examine how much impact the enhancements in modern GPU core microarchitectures have on the performance. We also analyze the impact of the enhancements on the modeling accuracy of MDM [50,51], the state-of-the-art GPU analytical model. In this experiment, we configure Accel-Sim cycle-level simulator to simulate the simplified GPU core assumed by MDM (i.e., no sub-cores, 32 lanes per functional unit and 32 L1 D$ banks, and non-sectored L1 D$s).…”
Section: Limitations
mentioning
confidence: 99%
“…While seeding is inherently a memory-bound algorithm, CPU implementations can only issue limited number of parallel memory requests and hence cannot saturate memory bandwidth (only uses 11.5% of peak bandwidth). Current GPUs are not well-suited because of significant memory divergence during tree traversal (Wang et al, 2019). To make better use of available memory bandwidth, we design a custom seeding accelerator and prototype it on an FPGA.…”
Section: FPGA Prototype
mentioning
confidence: 99%