Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis 2013
DOI: 10.1145/2503210.2503277
Using automated performance modeling to find scalability bugs in complex codes

Abstract: Many parallel applications suffer from latent performance limitations that may prevent them from scaling to larger machine sizes. Often, such scalability bugs manifest themselves only when an attempt to scale the code is actually being made, a point where remediation can be difficult. However, creating analytical performance models that would allow such issues to be pinpointed earlier is so laborious that application developers attempt it at most for a few selected kernels, running the risk of missing harmful b…

Cited by 111 publications (73 citation statements); references 43 publications.
“…We are working with the community to gradually unify existing techniques and tools including pragma-based source-to-source transformations [41,80]; plugin-based GCC and LLVM to expose and tune all internal optimization decisions [30,31]; polyhedral source-to-source transformation tools [12]; differential analysis to detect performance anomalies and CPU/memory bounds [28,36]; just-in-time compilation for Android Dalvik or Oracle JDK; algorithm-level tuning [3]; techniques to balance communication and computation in numerical codes, particularly for heterogeneous architectures [7,75]; the Scalasca framework to automate analysis and modeling of scalability of HPC applications [13,40]; LIKWID for lightweight collection of hardware counters [76]; the HPCC and HPCG benchmarks to collaboratively rank HPC systems [42,56]; benchmarks from GCC and LLVM; the TAU performance tuning framework [68]; and all recent Periscope application tuning plugins [10,60].…”
Section: Discussion
confidence: 99%
“…Performance modeling, and automated performance modeling in particular, was shown to be useful and practical for analyzing the performance of parallel applications [9,23,26,31,32]. In this work, we combine multi-parameter performance modeling with benchmarking of real task-based applications to automatically generate the empirical efficiency functions of both the application and the contention-free replay of the application's TDG.…”
Section: Modeling Approach
confidence: 99%
“…The work of [6] proposes to blindly fit metrics (essentially time) measured for the main routines of a program when running at low core counts on an existing platform. A large number of fitting functions is tried, and the one reporting the best fit is selected as the model of that routine.…”
Section: Related Work
confidence: 99%
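The fitting procedure that the excerpt above attributes to [6] can be illustrated with a minimal sketch: measure a routine's runtime at a few small core counts, fit a family of candidate scaling functions, and keep the one with the lowest residual. This is not the paper's actual implementation (the Extra-P tool); the data and the set of candidate exponents below are purely illustrative.

```python
import numpy as np

# Hypothetical timing measurements of one routine at small core counts.
# The runtimes are synthetic, generated from 0.5 + 0.05 * sqrt(p) * log2(p).
cores = np.array([16.0, 32.0, 64.0, 128.0, 256.0])
runtime = np.array([1.3, 1.91421, 2.9, 4.4598, 6.9])

# Candidate terms p^i * log2(p)^j, loosely following the paper's
# "performance model normal form". (0, 0) is excluded because it would
# duplicate the constant term of the model f(p) = c0 + c1 * p^i * log2(p)^j.
candidates = [(i, j) for i in (0.0, 0.5, 1.0, 2.0) for j in (0, 1)
              if (i, j) != (0.0, 0)]

best = None
for i, j in candidates:
    term = cores**i * np.log2(cores)**j
    A = np.column_stack([np.ones_like(cores), term])  # columns: c0, c1
    coef, residuals, *_ = np.linalg.lstsq(A, runtime, rcond=None)
    rss = float(residuals[0]) if residuals.size else float("inf")
    if best is None or rss < best[0]:
        best = (rss, i, j, coef)

rss, i, j, coef = best
print(f"best fit: {coef[0]:.3f} + {coef[1]:.4f} * p^{i} * log2(p)^{j}")
```

With the synthetic data above, the selection recovers the generating term `sqrt(p) * log2(p)`; on real measurements, the critique quoted from [5]/[6] applies: the winning function is chosen purely by fit quality, not by insight into the underlying cause.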
“…An effort to investigate the performance of MPI applications at large core counts uses parallel discrete event simulations to run the application in a controlled environment [3]. Most such approaches depend on specific models or abstractions of the parallel code (and the system) [4], [5], or on massive fitting of time-based metrics to predict the performance of specific functions [6]. However, they provide little insight into the real underlying causes of the inefficiencies, and knowledge about the influence of different architectural characteristics can be useful for improving the code.…”
Section: Introduction
confidence: 99%