2019
DOI: 10.1002/cpe.5547

Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system

Abstract: The Roofline performance model provides an intuitive and insightful approach to identifying performance bottlenecks and guiding performance optimization. In preparation for the next‐generation supercomputer Perlmutter at NERSC, this paper presents a methodology to construct a hierarchical Roofline on NVIDIA GPUs and extends it to support reduced precision and Tensor Cores. The hierarchical Roofline incorporates L1, L2, device memory, and system memory bandwidths into one single figure, and it offers mo…
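The hierarchical Roofline described above bounds attainable performance at each level of the memory hierarchy by min(peak compute, arithmetic intensity × level bandwidth). A minimal sketch of that calculation in Python follows; the peak and bandwidth figures are illustrative placeholders, not measured Perlmutter or V100 values.

# Hierarchical Roofline sketch: the ceiling at each memory level is
# min(peak compute, arithmetic intensity * level bandwidth).
# All peak/bandwidth numbers below are hypothetical, for illustration only.

PEAK_GFLOPS = 7000.0  # hypothetical double-precision compute roof (GFLOP/s)

BANDWIDTHS_GBS = {    # hypothetical bandwidths per memory level (GB/s)
    "L1": 14000.0,
    "L2": 2500.0,
    "HBM": 900.0,
    "SysMem": 80.0,
}

def ceiling(ai, bandwidth, peak=PEAK_GFLOPS):
    """Roofline ceiling: the lower of the compute roof and the bandwidth roof."""
    return min(peak, ai * bandwidth)

def hierarchical_roofline(flops, bytes_per_level):
    """Arithmetic intensity and attainable GFLOP/s at each memory level."""
    out = {}
    for level, nbytes in bytes_per_level.items():
        ai = flops / nbytes                       # FLOPs per byte moved at this level
        out[level] = (ai, ceiling(ai, BANDWIDTHS_GBS[level]))
    return out

# Example: a kernel performing 1e12 FLOPs with different traffic at each level
for level, (ai, gflops) in hierarchical_roofline(
        1e12, {"L1": 4e11, "L2": 1e11, "HBM": 5e10, "SysMem": 1e10}).items():
    print(f"{level:7s} AI={ai:6.1f} FLOP/byte  ceiling={gflops:7.0f} GFLOP/s")

Plotting these per-level ceilings together against a kernel's measured performance is what collapses the whole hierarchy into the single figure the abstract mentions.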

Cited by 60 publications (59 citation statements)
References 9 publications
“…Finally, to gain a more detailed understanding of arithmetic intensity and its effects on the performance of the kernel, Figure 7 shows a "roofline" plot of the performance of the main reconstruction kernel for varying orders of accuracy with and without WENO limiting, using 200 × 200 × 100 = 4 million cells at 3rd-order temporal accuracy. The Roofline Model [33] is a technique for evaluating how well a particular operation is performing relative to the two main system limiters, bandwidth and computation. As the computational intensity of an operation increases, the operation moves up the bandwidth line until it becomes limited by the FLOPs line.…”
Section: 12 (mentioning)
confidence: 99%
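The single-level Roofline this statement describes can be written as P(AI) = min(P_peak, AI × BW): a kernel moves up the bandwidth line as its arithmetic intensity grows, until AI passes the ridge point P_peak / BW and the FLOPs line takes over. A small illustrative check in Python, with roof values chosen as placeholders rather than figures from the cited work:

# Single-level Roofline: attainable = min(peak, AI * bandwidth).
# The ridge point peak/bandwidth separates memory-bound from compute-bound kernels.
# Roof values are hypothetical, for illustration only.

PEAK_GFLOPS = 7000.0   # hypothetical compute roof (GFLOP/s)
BW_GBS = 900.0         # hypothetical memory bandwidth roof (GB/s)

def classify(ai):
    ridge = PEAK_GFLOPS / BW_GBS                 # AI where the two roofs meet
    attainable = min(PEAK_GFLOPS, ai * BW_GBS)
    bound = "memory-bound" if ai < ridge else "compute-bound"
    return f"AI={ai:5.1f} FLOP/byte -> ceiling {attainable:6.0f} GFLOP/s ({bound})"

# e.g. raising the order of accuracy of a reconstruction kernel raises its AI
for ai in (0.5, 2.0, 10.0, 50.0):
    print(classify(ai))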
“…Over the years, the Classical Roofline model [36] has been formulated for multicore [19,23] and GPU [18,40] architectures. Moreover, assisted methodologies and automatic tools [5,22,28,29] have been introduced to ease Roofline model generation for scientific and HPC application optimization.…”
Section: Related Work (mentioning)
confidence: 99%
“…Condensing the optimization space into a single performance figure, this model provides intuitive guidance for optimizing complex applications. In this way, the Roofline model has become an established methodology for optimizing HPC applications targeting multicore [19,23] and GPU [18,40] architectures. With Field-Programmable Gate Array (FPGA) devices becoming an appealing solution for accelerating HPC applications, a dual Roofline model for reconfigurable devices is of growing interest.…”
Section: Introduction (mentioning)
confidence: 99%
“…Today, nearly half of all the FLOPs in the Top 500 supercomputers come from GPUs rather than CPUs, and that proportion continues to grow. In preparation for Perlmutter—the National Energy Research Scientific Computing Center's (NERSC's) upcoming NVIDIA GPU‐powered supercomputer—Yang, Kurth, and Williams present a methodology for constructing a hierarchical Roofline model for NVIDIA GPUs. The model supports reduced precision and Tensor Cores, and the authors demonstrate its effectiveness in providing an understanding of performance bottlenecks in three proxy applications—GPP from BerkeleyGW, HPGMG from AMReX, and conv2d from TensorFlow.…”
Section: Themes of This Special Issue (mentioning)
confidence: 99%