Abstract:In this paper, we take a concrete step towards materializing our long-term goal of providing a fully automatic end-to-end tuning infrastructure for arbitrary program components and full applications. We describe a general-purpose offline auto-tuning framework and apply it to an application benchmark, SMG2000, a semi-coarsening multigrid on structured grids. We show that the proposed system first extracts computationally-intensive loop nests into separate executable functions, a code transformation called outli… Show more
“…For example, Process 7 will send to and receive from processes 2, 3,4,6,8,10,11,12,18,19,20,21,22,23,24,26,27, and 28.…”
Section: B Pattern Detectionmentioning
confidence: 99%
“…At a high-level, SMG2000 performs three distinct phases to solve the problem as reported in [24]. These phases are Initialization, Setup and Solve.…”
Understanding the behaviour of High Performance Computing (HPC) systems is a challenging task due to the large number of processes they involve as well as the complex interactions among these processes. In this paper, we present a novel approach that aims to simplify the analysis of large execution traces generated from HPC applications. We achieve this through a technique that allows semi-automatic extraction of execution phases from large traces. These phases, which characterize the main computations of the traced scenario, can be used by software engineers to browse the content of a trace at different levels of abstraction. Our approach is based on the application of information theory principles to the analysis of sequences of communication patterns found in HPC traces. The results of the proposed approach when applied to traces of a large HPC industrial system demonstrate its effectiveness in identifying the main program phases and their corresponding sub-phases.
“…For example, Process 7 will send to and receive from processes 2, 3,4,6,8,10,11,12,18,19,20,21,22,23,24,26,27, and 28.…”
Section: B Pattern Detectionmentioning
confidence: 99%
“…At a high-level, SMG2000 performs three distinct phases to solve the problem as reported in [24]. These phases are Initialization, Setup and Solve.…”
Understanding the behaviour of High Performance Computing (HPC) systems is a challenging task due to the large number of processes they involve as well as the complex interactions among these processes. In this paper, we present a novel approach that aims to simplify the analysis of large execution traces generated from HPC applications. We achieve this through a technique that allows semi-automatic extraction of execution phases from large traces. These phases, which characterize the main computations of the traced scenario, can be used by software engineers to browse the content of a trace at different levels of abstraction. Our approach is based on the application of information theory principles to the analysis of sequences of communication patterns found in HPC traces. The results of the proposed approach when applied to traces of a large HPC industrial system demonstrate its effectiveness in identifying the main program phases and their corresponding sub-phases.
“…Compiler-based approaches are mostly independent from specific applications. In [18], whole applications are tuned by the compiler by extracting the most expensive loops from the source code and applying transformations such as tiling, unrolling, or permutation. The approach presented in [19] uses compiled binaries of applications to derive models of their execution times.…”
The Fast Multipole Method (FMM) is an efficient, widely used method for the solution of N-body problems. One of the main data structures is a hierarchical tree data structure describing the separation into near-field and farfield particle interactions. This article presents a method for automatic tuning of the FMM by selecting the optimal FMM tree depth based on an integrated performance prediction of the FMM computations. The prediction method exploits benchmarking of significant parts of the FMM implementation to adapt the tuning to the specific hardware system being used. Furthermore, a separate analysis phase at runtime is used to predict the computational load caused by the specific particle system to be computed. The tuning method was integrated into an FMM implementation. Performance results show that a reliable determination of the tree depth is achieved, thus leading to minimal execution times of the FMM algorithm.
“…In contrast to common compiler optimizations, autotuning approaches often take into account details about the specific application being optimized and the environment where it will execute. Such approaches have been applied to specific types of software (e.g., computer algebra libraries [51] and high performance computing [39,49] as well as for general purpose languages and platforms (e.g., [45]). …”
Reducing the energy usage of software is becoming more important in many environments, in particular, batterypowered mobile devices, embedded systems and data centers. Recent empirical studies indicate that software engineers can support the goal of reducing energy usage by making design and implementation decisions in ways that take into consideration how such decisions impact the energy usage of an application. However, the large number of possible choices and the lack of feedback and information available to software engineers necessitates some form of automated decision-making support. This paper describes the first known automated support for systematically optimizing the energy usage of applications by making code-level changes. It is e↵ective at reducing energy usage while freeing developers from needing to deal with the low-level, tedious tasks of applying changes and monitoring the resulting impacts to the energy usage of their application. We present a general framework, SEEDS, as well as an instantiation of the framework that automatically optimizes Java applications by selecting the most energye cient library implementations for Java's Collections API. Our empirical evaluation of the framework and instantiation show that it is possible to improve the energy usage of an application in a fully automated manner for a reasonable cost.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.