Auto-tuning has become increasingly popular for optimizing non-functional parameters of parallel programs. The typically large search space requires sophisticated techniques to find well-performing parameter values in a reasonable amount of time. Different parts of a program often perform best with different parameter values. We therefore subdivide programs into several regions and try to optimize the parameter values for each of those regions separately, as opposed to setting the parameter values globally for the entire program. As this enlarges the search space even further, we have to extend existing auto-tuning techniques in order to obtain good results. In this paper we introduce a novel enhancement to the RS-GDE3 algorithm, which is used to explore the search space when auto-tuning programs with multiple regions with respect to several objectives. We have implemented our auto-tuner using the Insieme compiler and runtime system. In comparison to a non-optimized parallel version of the tested programs, our novel approach achieves up to 7.6-, 10.5-, and 61.6-fold improvements for the three tuned objectives wall time, energy consumption, and resource usage, respectively.
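The following is a minimal, self-contained sketch (in Python, not the actual Insieme/RS-GDE3 implementation) of the idea behind per-region, multi-objective tuning: every candidate assigns parameter values to each region separately, and candidates are compared by Pareto dominance over the three objectives. All region names, parameter ranges, and the measurement stub are assumptions made for illustration.

    # Hypothetical sketch of per-region, multi-objective parameter search.
    import itertools
    import random

    REGIONS = ["loop_A", "loop_B"]        # assumed program regions
    THREADS = [1, 2, 4, 8]                # assumed tunable parameter
    TILE    = [16, 32, 64]                # assumed tunable parameter

    def measure(assignment):
        """Stand-in for running the program and measuring the three
        objectives (wall time, energy, resource usage); random here."""
        return tuple(random.uniform(1.0, 10.0) for _ in range(3))

    def dominates(a, b):
        """a Pareto-dominates b: no worse in all objectives, better in one."""
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    # Each candidate assigns parameter values per region rather than
    # globally, which is what enlarges the search space.
    candidates = [dict(zip(REGIONS, combo))
                  for combo in itertools.product(
                      itertools.product(THREADS, TILE), repeat=len(REGIONS))]
    sampled = random.sample(candidates, k=min(20, len(candidates)))
    results = [(c, measure(c)) for c in sampled]

    # Keep only the non-dominated (Pareto-optimal) configurations.
    pareto = [(c, m) for c, m in results
              if not any(dominates(m2, m) for _, m2 in results if m2 != m)]
    for cfg, metrics in pareto:
        print(cfg, metrics)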
Vienna Fortran is a machine-independent language extension of Fortran, which is based upon the Single-Program-Multiple-Data (SPMD) paradigm and allows the user to write programs for distributed-memory systems using global addresses. The language features focus mainly on the issue of distributing data across virtual processor structures. In this paper, we discuss those features of Vienna Fortran that allow the data distributions of arrays to change dynamically, depending on runtime conditions. We discuss the relevant language features, outline their implementation, and describe how they may be used in applications.
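As a purely conceptual sketch, the Python snippet below mimics what a runtime-dependent change of data distribution means: an array is mapped to virtual processors either block-wise or cyclically, and the mapping is chosen (and could later be changed) based on a condition observed at run time. Vienna Fortran expresses this through language annotations rather than library calls; the function names and the block/cyclic choice here are illustrative assumptions.

    # Conceptual illustration of block vs. cyclic data distribution that is
    # selected at run time; not Vienna Fortran syntax or semantics.

    def block_distribute(array, num_procs):
        """Assign contiguous index blocks to each (virtual) processor."""
        chunk = (len(array) + num_procs - 1) // num_procs
        return {p: array[p * chunk:(p + 1) * chunk] for p in range(num_procs)}

    def cyclic_distribute(array, num_procs):
        """Assign indices round-robin to the (virtual) processors."""
        return {p: array[p::num_procs] for p in range(num_procs)}

    data, procs = list(range(16)), 4

    # The distribution is chosen, and may later be changed, based on a
    # runtime condition, e.g. an observed load imbalance (assumed here).
    load_imbalanced = True
    dist = (cyclic_distribute(data, procs) if load_imbalanced
            else block_distribute(data, procs))
    print(dist)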
In this paper we give an overview of SCALEA, which is a new performance analysis tool for OpenMP, MPI, HPF, and mixed parallel/distributed programs. SCALEA instruments, executes, and measures programs and computes a variety of performance overheads based on a novel overhead classification. Source code and hardware profiling are combined in a single system, which significantly extends the scope of possible overheads that can be measured and examined, ranging from hardware counters, such as the number of cache misses or floating-point operations, to more complex performance metrics, such as control of parallelism or loss of parallelism. Moreover, SCALEA uses a new representation of code regions, called the dynamic code region call graph, which enables detailed overhead analysis for arbitrary code regions. An instrumentation description file is used to relate performance information to code regions of the input program and to reduce instrumentation overhead. Several experiments with realistic codes that cover MPI, OpenMP, HPF, and mixed OpenMP/MPI codes demonstrate the usefulness of SCALEA.
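As a small worked example of the kind of overhead metric such an analysis builds on, the sketch below computes the total temporal overhead of a code region as the accumulated parallel time not explained by the sequential work, together with speedup and efficiency. The numeric values are assumed, and the snippet is not part of SCALEA itself.

    # Standard temporal-overhead bookkeeping for one code region,
    # with assumed measurement values (seconds).

    def total_overhead(t_seq, t_par, num_procs):
        """Total overhead T_o = p * T_p - T_s for one code region."""
        return num_procs * t_par - t_seq

    t_seq, t_par, p = 100.0, 30.0, 4
    t_o = total_overhead(t_seq, t_par, p)   # 4 * 30 - 100 = 20 s
    speedup = t_seq / t_par                 # 3.33
    efficiency = speedup / p                # 0.83
    print(f"overhead={t_o:.1f}s speedup={speedup:.2f} efficiency={efficiency:.2f}")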
Several large real-world applications have been developed for distributed and parallel architectures. We examine two different program development approaches. First, the use of a high-level programming paradigm, which reduces the time to create a parallel program dramatically but sometimes at the cost of reduced performance; a source-to-source compiler has been employed to automatically compile programs written in a high-level programming paradigm into message-passing codes. Second, manual program development using a low-level programming paradigm, such as message passing, enables the programmer to fully exploit a given architecture at the cost of a time-consuming and error-prone effort. Performance tools play a central role in supporting the performance-oriented development of applications for distributed and parallel architectures. SCALA, a portable instrumentation, measurement, and post-execution performance analysis system for distributed and parallel programs, has been used to analyze and to guide the application development by selectively instrumenting and measuring the code versions, by comparing performance information of several program executions, by computing a variety of important performance metrics, by detecting performance bottlenecks, and by relating performance information back to the input program. We show several experiments in which SCALA is applied to real-world applications. These experiments are conducted on a NEC Cenju-4 distributed-memory machine and a cluster of heterogeneous workstations and networks.
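The snippet below hints at what such a post-execution comparison of two code versions can look like: the speedups of a compiler-generated and a hand-written message-passing version are compared across processor counts, and the relative gap is reported. The version names and wall-time numbers are invented for illustration and are not measurements from the paper.

    # Assumed wall times (seconds) per processor count for two code versions.
    hpf_version = {1: 120.0, 2: 66.0, 4: 38.0, 8: 25.0}  # compiler-generated
    mpi_version = {1: 118.0, 2: 61.0, 4: 32.0, 8: 18.0}  # hand-written MPI

    for p in sorted(hpf_version):
        s_hpf = hpf_version[1] / hpf_version[p]
        s_mpi = mpi_version[1] / mpi_version[p]
        gap = (hpf_version[p] - mpi_version[p]) / mpi_version[p] * 100
        print(f"p={p}: speedup HPF={s_hpf:.2f}, MPI={s_mpi:.2f}, "
              f"HPF slower by {gap:.0f}%")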