TODD GAMBLIN: Scalable Performance Measurement and Analysis.
(Under the direction of Daniel A. Reed.)

Concurrency levels in large-scale, distributed-memory supercomputers are rising exponentially. Modern machines may contain 100,000 or more microprocessor cores, and the largest of these, IBM's Blue Gene/L, contains over 200,000 cores. Future systems are expected to support millions of concurrent tasks. In this dissertation, we focus on efficient techniques for measuring and analyzing the performance of applications running on very large parallel machines.

Tuning the performance of large-scale applications can be a subtle and time-consuming task because application developers must measure and interpret data from many independent processes. While the volume of the raw data scales linearly with the number of tasks in the running system, the number of tasks is growing exponentially, and data for even small systems quickly becomes unmanageable. Transporting performance data from so many processes over a network can perturb application performance and make measurements inaccurate, and storing such data would require a prohibitive amount of space. Moreover, even if it were stored, analyzing the data would be extremely time-consuming.

In this dissertation, we present novel methods for reducing performance data volume. The first draws on multi-scale wavelet techniques from signal processing to compress systemwide, time-varying load-balance data. The second uses statistical sampling to select a small subset of running processes to generate low-volume traces. A third approach combines sampling and wavelet compression to stratify performance data adaptively at run time and to further reduce the cost of sampled tracing. We have integrated these approaches into Libra, a toolset for scalable load-balance analysis. We present Libra and show how it can be used to analyze data from large scientific applications scalably.

Without the values that they, along with my grandparents, instilled in me, I would not be the person I am today.

Thanks to Dan Reed, my advisor, for sticking with me to the end, even at times when I was unsure whether I would finish. Despite his busy schedule, he was available for advice when I needed it. Even if our typical meetings were short, the advice Dan provided was always excellent, and his well-timed words of encouragement kept me going even when I was on the brink of ditching this whole Ph.D. gig.

Thanks to Rob Fowler for his constant advice while I was at RENCI. His extensive input on my papers and on this dissertation has been invaluable. Thanks also to Niki Fowler for her assistance in proofreading my final draft, and to Allan Porterfield for the many useful technical discussions we had at RENCI.
I am grateful to Bronis de Supinski and Martin Schulz at Lawrence Livermore National Laboratory for their research insights and constant availability, and for giving me the opportunity to continue working with them after graduation as a postdoctoral scholar. I learn something new every day I work at the lab.