Abstract: We can profile the performance behavior of parallel programs at the level of individual call paths through sampling or direct instrumentation. While we can easily control measurement dilation by adjusting the sampling frequency, the statistical nature of sampling and the difficulty of accessing the parameters of sampled events make it unsuitable for obtaining certain communication metrics, such as the size of message payloads. Direct instrumentation, on the other hand, is preferable for capturing message-passing events but can excessively dilate measurements, particularly for C++ programs, which often have many short but frequently called class member functions. We therefore combine both techniques in a unified framework that exploits the strengths of each approach while avoiding their weaknesses: we use direct instrumentation to intercept MPI routines while recording the execution of the remaining code through low-overhead sampling. One of the main technical hurdles we overcame was determining call-path information inexpensively and portably whenever an MPI routine is invoked. We show that the overhead of our implementation is low enough to support substantial performance improvements of a C++ fluid-dynamics code.
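The division of labor described above can be illustrated with a small Python sketch. It is not the authors' implementation: a hypothetical `send` function stands in for an intercepted MPI routine (a C/MPI tool would typically intercept through the PMPI profiling interface and determine call paths by stack unwinding), `HybridProfiler` and its methods are illustrative names, and the frame walk relies on CPython's `sys._getframe`. Timer-driven sampling attributes statistical counts to call paths, while the intercepting wrapper records every call and its exact payload size under the caller's call path.

```python
import functools
import signal
import sys
from collections import defaultdict


class HybridProfiler:
    """Call-path profile combining timer samples with exact intercepted events."""

    def __init__(self):
        self.samples = defaultdict(int)     # call path -> number of timer samples
        self.calls = defaultdict(int)       # call path -> exact number of intercepted calls
        self.bytes_sent = defaultdict(int)  # call path -> exact payload bytes

    @staticmethod
    def call_path(frame):
        """Return the chain of function names from the outermost frame to `frame`."""
        path = []
        while frame is not None:
            path.append(frame.f_code.co_name)
            frame = frame.f_back
        return tuple(reversed(path))

    # Sampling: statistical, with overhead controlled by the interval.
    def _on_sample(self, signum, frame):
        self.samples[self.call_path(frame)] += 1

    def start_sampling(self, interval_s=0.01):
        # Unix only: SIGPROF is delivered every interval_s seconds of CPU time.
        signal.signal(signal.SIGPROF, self._on_sample)
        signal.setitimer(signal.ITIMER_PROF, interval_s, interval_s)

    def stop_sampling(self):
        signal.setitimer(signal.ITIMER_PROF, 0, 0)

    # Direct instrumentation: every call is seen, together with its arguments.
    def intercept(self, func):
        @functools.wraps(func)
        def wrapper(buf, *args, **kwargs):
            # Frame 1 is the caller of the intercepted routine.
            path = self.call_path(sys._getframe(1))
            self.calls[path] += 1
            self.bytes_sent[path] += len(buf)
            return func(buf, *args, **kwargs)
        return wrapper


prof = HybridProfiler()


@prof.intercept
def send(buf, dest=0):
    """Hypothetical stand-in for an MPI send routine; does nothing here."""


def exchange_halo():
    send(b"x" * 1024)


def solver_step():
    exchange_halo()


# Three steps -> three intercepted sends of 1024 bytes on one call path.
for _ in range(3):
    solver_step()
```

In this sketch the sampling interval is the knob that bounds dilation for ordinary code, whereas the interception cost is paid only on the (comparatively rare) communication calls; calling `prof.start_sampling()` before the loop and `prof.stop_sampling()` after it would add statistical samples alongside the exact event counts.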
With the growing complexity of supercomputing applications and systems, it is important to continually extend existing performance measurement and analysis tools so that they provide new insights into application performance characteristics and thereby help scientists and engineers use computing resources more efficiently. We present several new techniques developed, implemented and integrated into the Scalasca toolset specifically to enhance the performance analysis of long-running applications. The first is a hybrid measurement system that seamlessly integrates sampled and event-based measurements; it collects highly detailed measurements at low overhead and is therefore particularly convenient for initial performance analyses. We then apply iteration profiling to scientific codes and present an algorithm, based on iteration profile clustering, for reducing the memory and storage requirements of the collected data. Finally, we evaluate the complete integration of all these techniques in a unified measurement system.
I. INTRODUCTION
Supercomputers play a key role in countless areas of science and engineering, enabling new insights and technological advances that were previously inconceivable. The strategic importance of supercomputing resources and the ever-growing complexity of using them efficiently make parallel performance analysis tools invaluable for the scientific and engineering community. The Scalasca toolset [1] is a highly scalable, open-source profiling and tracing tool that supports measurements of MPI, OpenMP and hybrid MPI/OpenMP applications and has been demonstrated to scale effectively to 294,912 processes [2]. In the course of this thesis project, several improvements to the Scalasca toolset were developed, implemented and evaluated to extend its applicability to an even wider range of use cases and to provide advanced features that give more insight into the complex performance phenomena encountered in long-running, large-scale applications.
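The memory saving from clustering iteration profiles can be sketched as follows. The source does not describe the actual clustering algorithm used in Scalasca, so this is an illustrative substitute, not the thesis's method: each iteration's profile is assumed to be a fixed-length vector of metric values (for example, time per call path), and a simple greedy scheme assigns each iteration to the nearest existing cluster within a hypothetical distance `threshold`, updating that cluster's running mean, or opens a new cluster otherwise. Instead of N full profiles, only the k cluster centroids and one small cluster index per iteration need to be kept.

```python
import math


def distance(a, b):
    """Euclidean distance between two iteration profiles (metric vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_iterations(profiles, threshold):
    """Greedy threshold clustering of per-iteration profiles.

    Returns (centroids, assignment): centroids[i] is the mean profile of
    cluster i, and assignment[j] is the cluster index of iteration j.
    """
    centroids, counts, assignment = [], [], []
    for p in profiles:
        # Find the closest centroid that lies within the threshold.
        best, best_d = None, threshold
        for i, c in enumerate(centroids):
            d = distance(p, c)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            # No sufficiently similar cluster: this iteration starts a new one.
            centroids.append(list(p))
            counts.append(1)
            assignment.append(len(centroids) - 1)
        else:
            # Fold the iteration into the running mean of the chosen cluster.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], p)]
            assignment.append(best)
    return centroids, assignment
```

For example, the five two-metric profiles `[[1.0, 10.0], [1.1, 10.0], [5.0, 2.0], [1.0, 9.9], [5.1, 2.0]]` with `threshold=0.5` yield the assignment `[0, 0, 1, 0, 1]` and two centroids, so storage drops from five profiles to two plus five indices. A greedy single pass keeps the cost linear in the number of iterations and clusters, at the price of results that depend on iteration order and on the chosen threshold.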
Table I shows the set of representative scientific codes studied: the SPEC MPI 2007 suite of large applications, complemented by the local DROPS and PEPC applications. (PEPC was run with 1,024 processes on the Jugene Blue Gene/P, and the others with 256 processes on the Juropa Nehalem cluster.) These applications are written in a variety of languages and vary in complexity, particularly in their use of MPI, and run at a range of scales on different HPC systems at Jülich Supercomputing Centre. Some perform thousands of iterations (or time steps), others only hundreds, and in a couple of cases (such as the 122.tachyon ray-tracing graphics application) no clear iteration loop was identifiable.