The Parallel Debugging Tool (PDT) of the Annai programming environment is developed within the Joint CSCS-ETH/NEC Collaboration in Parallel Processing [1]. Like the other components of the integrated environment, PDT aims to support application developers in debugging portable, large-scale data-parallel programs based on HPF and message-passing programs based on the MPI standard. PDT supports MPI event tracing for race detection and deterministic replay, both for manually parallelized MPI programs and for code generated with the advanced techniques of a data-parallel compiler. This paper describes the tracing and replay mechanisms included in PDT and evaluates their efficiency by presenting the execution-time overheads for several benchmark programs running on the NEC Cenju-2/3 distributed-memory parallel computers.
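The abstract does not detail how PDT records and replays events. A common approach to deterministic replay of message-passing programs is to log, for every wildcard receive (e.g. one posted with `MPI_ANY_SOURCE`), which sender was actually matched, and then to force the same matches when the program is re-executed. The Python sketch below simulates that general technique for a single receiving process, without any MPI library. The class `ReplayReceiver`, the function `run`, and the message setup are hypothetical illustrations, not PDT's interface or implementation.

```python
import random
from typing import Dict, List, Optional, Tuple

Message = Tuple[int, str]  # (source rank, payload)


class ReplayReceiver:
    """Hypothetical sketch: wildcard receives in 'record' or 'replay' mode."""

    def __init__(self, mode: str, trace: Optional[List[int]] = None):
        if mode not in ("record", "replay"):
            raise ValueError("mode must be 'record' or 'replay'")
        self.mode = mode
        # In replay mode the trace holds the sources matched during recording.
        self.trace: List[int] = [] if trace is None else list(trace)
        self.position = 0  # next trace entry to consume during replay

    def recv_any(self, pending: Dict[int, List[str]], rng: random.Random) -> Message:
        """Receive one message from any source that has a pending message."""
        if self.mode == "record":
            # Arrival order across senders is nondeterministic; model it randomly.
            candidates = sorted(src for src, queue in pending.items() if queue)
            source = rng.choice(candidates)
            self.trace.append(source)  # log which sender was matched
        else:
            # Force the match that was recorded, ignoring arrival order.
            source = self.trace[self.position]
            self.position += 1
        # Messages from one sender stay in FIFO order (MPI's non-overtaking rule).
        return source, pending[source].pop(0)


def run(mode: str, trace: Optional[List[int]] = None, seed: int = 0):
    """Hypothetical driver: three senders, five wildcard receives."""
    pending = {1: ["a1", "a2"], 2: ["b1"], 3: ["c1", "c2"]}
    receiver = ReplayReceiver(mode, trace)
    rng = random.Random(seed)
    received = [receiver.recv_any(pending, rng) for _ in range(5)]
    return received, receiver.trace
```

Recording with one random seed and replaying with another yields the same message order, because replay consults only the trace: `recorded, trace = run("record", seed=42)` followed by `run("replay", trace=trace, seed=7)` reproduces `recorded`. Only the choice of sender is logged, not the payloads, which keeps such a trace small.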
This paper presents some preliminary results toward the automatic parallelization of uniprocessor FORTRAN code on distributed-memory parallel processors (DMPPs). The paper introduces Oxygen, a compiler for a DMPP under development at the Laboratory. The design of Oxygen and its parallelization strategy are discussed, and an analysis of its most significant components is presented, together with performance benchmarks. Oxygen carries out data consistency analysis at run-time; our results show that the overhead introduced is acceptable. Run-time data consistency analysis may also be the only viable approach to parallelize certain “hard” algorithms, as we will show in this study.
The machine model considered in this paper is that of a distributed-memory parallel processor (DMPP) with a two-dimensional torus topology. Within this framework, we study the relationship between the speedup delivered by compiler-parallelized code and the machine's interprocessor communication speed. It is shown that compiler-parallelized code often performs more interprocessor communication than manually parallelized code, and that its performance is therefore more sensitive to the machine's interprocessor communication speed. Consequently, a parallelizing compiler developed for a platform that was not explicitly designed to sustain this increased communication will, in the general case, produce code that delivers disappointing speedups. Finally, the study identifies the point of diminishing returns for the interprocessor communication speed, beyond which the DMPP designer should focus on improving other architectural parameters, such as the local memory-processor bandwidth.
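The abstract does not state the authors' performance model. As a purely illustrative sketch, assume a toy model in which computation of total sequential time `W` divides evenly over `p` processors and each processor must additionally transfer `C` bytes at interprocessor bandwidth `b`, so that the speedup is `W / (W/p + C/b)`. This toy model is our assumption: it ignores latency, torus link contention, and local memory bandwidth. It shows why code that communicates more (larger `C`) depends more strongly on `b`, and it lets us define one hypothetical "diminishing returns" threshold as the bandwidth at which the speedup reaches a chosen fraction of `p`. The function names are hypothetical.

```python
def speedup(work_s: float, comm_bytes: float, bandwidth_bps: float, procs: int) -> float:
    """Toy model (assumption): even compute split plus comm_bytes / bandwidth."""
    parallel_time = work_s / procs + comm_bytes / bandwidth_bps
    return work_s / parallel_time


def bandwidth_for_efficiency(work_s: float, comm_bytes: float,
                             procs: int, target: float = 0.9) -> float:
    """Bandwidth at which the toy-model speedup equals target * procs.

    From target * p = W / (W/p + C/b) it follows that
    C/b = W/(target * p) - W/p, so b = C / (W/(target * p) - W/p).
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    slack = work_s / (target * procs) - work_s / procs
    return comm_bytes / slack
```

In this toy model the threshold grows linearly with `C`: code that moves four times as much data needs four times the bandwidth to reach the same parallel efficiency. Beyond that threshold, further bandwidth increases bring the speedup only marginally closer to `p`, which is the kind of point past which other architectural parameters become the better investment.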