Summary

The performance of large-scale parallel applications is usually far from what is expected. Dynamic tuning is a powerful technique that helps to improve the performance of parallel applications. To bring this technique to large-scale computers, this work presents a model that enables decentralized dynamic tuning of large-scale parallel applications. In this model, applications are decomposed into disjoint subsets of tasks that can be tuned individually but also abstracted to obtain a global view of the parallel application. The proposed model has been designed as a hierarchical tuning network of distributed analysis modules and implemented in the form of ELASTIC, an environment for large-scale dynamic tuning. Using ELASTIC, an experimental evaluation has been conducted on a synthetic large-scale parallel application and a real agent-based parallel application. The results show that the proposed model, embodied in ELASTIC, is able to scale to meet the demands of dynamic tuning over thousands of processes, while effectively improving the performance of large-scale applications.
KEYWORDS

dynamic tuning, performance analysis, performance tools, scalability, tuning network
INTRODUCTION

Supercomputers are able to execute complex scientific parallel applications in a reasonable amount of time. Unfortunately, the performance expected of these large-scale parallel applications is often not easily achieved. Several performance analysis tools, such as Scalasca [1] or TAU [2], are able to assist developers in identifying the performance problems of these applications in large-scale contexts. However, most of these analysis tools are less useful when applications have execution behaviors that change depending on the input data set or according to data evolution.

In this context, performance analysis tools based on automatic and dynamic tuning are necessary. In this approach, the three phases of the performance improvement process (monitoring, analysis, and tuning) are performed automatically and continuously while the parallel application is running. However, dynamic tuning of parallel applications is a challenge in a large-scale context. Currently, the tools that offer dynamic tuning [3-6] follow a centralized scheme where a single module is responsible for the global tool control and for the analysis and tuning process over the entire parallel application. When working with large-scale parallel applications, a scalability barrier arises from this centralized operation due to the large number of communication connections and the increasing complexity of conducting a holistic performance analysis and tuning.

To address the challenge of tuning large-scale parallel applications at runtime, we have defined and designed a model that enables decentralized large-scale dynamic tuning. This model is based on decomposing the parallel application into disjoint subsets of tasks that will be analyzed and tuned independently. In addition, an abstraction mechanism is applied to each of these subsets in order to build a sm...