Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis 2013
DOI: 10.1145/2503210.2503277
Using automated performance modeling to find scalability bugs in complex codes

Abstract: Many parallel applications suffer from latent performance limitations that may prevent them from scaling to larger machine sizes. Often, such scalability bugs manifest themselves only when an attempt to scale the code is actually being made, a point where remediation can be difficult. However, creating analytical performance models that would allow such issues to be pinpointed earlier is so laborious that application developers attempt it at most for a few selected kernels, running the risk of missing harmful b…

Cited by 111 publications (73 citation statements); references 43 publications.
“…We are working with the community to gradually unify existing techniques and tools including pragma-based source-to-source transformations [41,80]; plugin-based GCC and LLVM to expose and tune all internal optimization decisions [30,31]; polyhedral source-to-source transformation tools [12]; differential analysis to detect performance anomalies and CPU/memory bounds [28,36]; just-in-time compilation for Android Dalvik or Oracle JDK; algorithm-level tuning [3]; techniques to balance communication and computation in numerical codes, particularly for heterogeneous architectures [7,75]; the Scalasca framework to automate analysis and modeling of scalability of HPC applications [13,40]; LIKWID for lightweight collection of hardware counters [76]; the HPCC and HPCG benchmarks to collaboratively rank HPC systems [42,56]; benchmarks from GCC and LLVM; the TAU performance tuning framework [68]; and all recent Periscope application tuning plugins [10,60].…”
Section: Discussion
confidence: 99%
“…Performance modeling, and automated performance modeling in particular, was shown to be useful and practical for analyzing the performance of parallel applications [9,23,26,31,32]. In this work, we combine multi-parameter performance modeling with benchmarking of real task-based applications to automatically generate the empirical efficiency functions of both the application and the contention-free replay of the application's TDG.…”
Section: Modeling Approach
confidence: 99%
“…The work of [6] proposes to blindly fit metrics (essentially time) measured for the main routines of a program when running at low core counts on an existing platform. A large number of fitting functions is tried, and the one reporting the best fit is selected as the model of that routine.…”
Section: Related Work
confidence: 99%
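The fitting procedure that the excerpt above attributes to [6] can be illustrated with a minimal sketch: measure a routine's runtime at a few small core counts, fit a family of candidate scaling functions, and keep the one with the lowest residual. This is not the paper's actual implementation (the Extra-P tool); the data and the set of candidate exponents below are purely illustrative.

```python
import numpy as np

# Hypothetical timing measurements of one routine at small core counts.
# The runtimes are synthetic, generated from 0.5 + 0.05 * sqrt(p) * log2(p).
cores = np.array([16.0, 32.0, 64.0, 128.0, 256.0])
runtime = np.array([1.3, 1.91421, 2.9, 4.4598, 6.9])

# Candidate terms p^i * log2(p)^j, loosely following the paper's
# "performance model normal form". (0, 0) is excluded because it would
# duplicate the constant term of the model f(p) = c0 + c1 * p^i * log2(p)^j.
candidates = [(i, j) for i in (0.0, 0.5, 1.0, 2.0) for j in (0, 1)
              if (i, j) != (0.0, 0)]

best = None
for i, j in candidates:
    term = cores**i * np.log2(cores)**j
    A = np.column_stack([np.ones_like(cores), term])  # columns: c0, c1
    coef, residuals, *_ = np.linalg.lstsq(A, runtime, rcond=None)
    rss = float(residuals[0]) if residuals.size else float("inf")
    if best is None or rss < best[0]:
        best = (rss, i, j, coef)

rss, i, j, coef = best
print(f"best fit: {coef[0]:.3f} + {coef[1]:.4f} * p^{i} * log2(p)^{j}")
```

With the synthetic data above, the selection recovers the generating term `sqrt(p) * log2(p)`; on real measurements, the critique quoted from [5]/[6] applies: the winning function is chosen purely by fit quality, not by insight into the underlying cause.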
“…An effort to investigate the performance of MPI applications at large core counts uses parallel discrete event simulations to run the application in a controlled environment [3]. Most such approaches depend on specific models or abstractions of the parallel code (and the system) [4], [5], or on massive fitting of time-based metrics to predict the performance of specific functions [6]. However, they provide little insight into the real underlying causes of the inefficiencies, and knowledge about the influence of different architectural characteristics can be useful for improving the code.…”
Section: Introduction
confidence: 99%