Development and performance analysis of real‐world applications for distributed and parallel architectures

Concurrency and Computation

Sowa-Pieklo

Czerwinski

et al. 2002

Self Cite

SUMMARY: Debuggers play an important role in developing parallel applications. They are used to control the state of many processes, to present distributed information in a concise and clear way, to observe execution behavior, and to detect and locate programming errors. More sophisticated debugging systems also try to improve understanding of global execution behavior and of the intricate details of a program. In this paper we describe the design and implementation of SPiDER, an interactive source-level debugging system for both regular and irregular High-Performance Fortran (HPF) programs. SPiDER combines a base debugging system for message-passing programs with a high-level debugger that interfaces with an HPF compiler. In addition to conventional debugging functionality, SPiDER allows a single process of a parallel program to be inspected or the entire program to be examined from a global point of view. A sophisticated visualization system has been developed and included in SPiDER to visualize data distributions, data-to-processor mapping relationships, and array values. SPiDER enables a programmer to dynamically change data distributions as well as array values. For arrays whose distribution can change during program execution, an animated replay displays the distribution sequence together with the associated source code location. Array values can be stored at individual execution points and compared against each other to examine execution behavior (e.g. the convergence behavior of a numerical algorithm). Finally, SPiDER also offers limited support for evaluating the performance of parallel programs through a graphical load diagram. SPiDER has been fully implemented and is currently being used for the development of various real-world applications. Several experiments are presented that demonstrate the usefulness of SPiDER.
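The data-to-processor mapping mentioned in the summary can be illustrated with a small sketch. It is not taken from SPiDER; it only assumes the standard HPF BLOCK and CYCLIC distributions of a one-dimensional array over a one-dimensional processor arrangement. The function names `block_owner`, `cyclic_owner`, and `mapping` are hypothetical and introduced here for illustration.

```python
from math import ceil


def block_owner(i, n, p):
    """Return the 0-based processor owning 1-based index i under BLOCK.

    Assumes the HPF default block size ceil(n / p) for n elements on p processors.
    """
    if not 1 <= i <= n:
        raise IndexError("index out of range")
    block_size = ceil(n / p)
    return (i - 1) // block_size


def cyclic_owner(i, n, p):
    """Return the 0-based processor owning 1-based index i under CYCLIC."""
    if not 1 <= i <= n:
        raise IndexError("index out of range")
    return (i - 1) % p


def mapping(n, p, owner):
    """List the owning processor of every element 1..n (hypothetical helper)."""
    return [owner(i, n, p) for i in range(1, n + 1)]


if __name__ == "__main__":
    print(mapping(10, 4, block_owner))   # owners under BLOCK
    print(mapping(10, 4, cyclic_owner))  # owners under CYCLIC
```

A debugger that visualizes distributions would display this kind of owner table graphically; the sketch only shows the underlying index arithmetic.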

Section: Collecting Data For Redistribution History

mentioning

confidence: 99%

Section: Pricing Of Financial Derivatives

mentioning

confidence: 99%

SPiDER—An advanced symbolic debugger for Fortran 90/HPF programs

Concurrency and Computation

Sowa-Pieklo

Czerwinski

et al. 2002

Self Cite

“…P³T+ models communication overhead, work distribution, computation times, and cache misses which is important for both distributed and parallel programs. P³T+ invokes a single profile run of the original sequential input program (ignoring all explicit parallel language constructs such as HPF directives) by using SCALA [21] in order to determine execution frequencies and branching probabilities. In order to achieve high estimation accuracy, we aggressively exploit compiler analysis and optimization information.…”

Section: Introduction

mentioning

confidence: 99%

P³T+: A Performance Estimator for Distributed and Parallel Programs

Požgaj

2000

Scientific Programming

Self Cite

Developing distributed and parallel programs on today's multiprocessor architectures is still a challenging task. Particularly distressing is the lack of effective performance tools that support the programmer in evaluating changes in code, problem and machine sizes, and target architectures. In this paper we introduce P³T+, a performance estimator for mostly regular HPF (High Performance Fortran) programs that also partially covers message-passing programs (MPI). P³T+ is unique in modeling programs, compiler code transformations, and parallel and distributed architectures. It computes at compile time a variety of performance parameters including work distribution, number of transfers, amount of data transferred, transfer times, computation times, and number of cache misses. Several novel technologies are employed to compute these parameters: loop iteration spaces, array access patterns, and data distributions are modeled by employing highly effective symbolic analysis. Communication is estimated by simulating the behavior of a communication library used by the underlying compiler. Computation times are predicted through pre-measured kernels on every target architecture of interest. We carefully model the most critical architecture-specific factors such as cache line sizes, number of cache lines available, startup times, message transfer time per byte, etc. P³T+ has been implemented and is closely integrated with the Vienna High Performance Compiler (VFC) to support programmers in developing parallel and distributed applications. Experimental results for realistic kernel codes taken from real-world applications are presented to demonstrate both the accuracy and the usefulness of P³T+.
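The abstract names startup time and per-byte transfer time as modeled machine factors. The sketch below shows how such factors are commonly combined in a simple linear cost model. This is an assumption made for illustration, not the estimation method of P³T+, which the abstract says simulates a communication library. The function name `estimate_transfer_time`, its parameters, and the numbers in the usage example are hypothetical.

```python
def estimate_transfer_time(num_transfers, bytes_transferred,
                           startup_time, time_per_byte):
    """Estimate total transfer time in seconds with a linear cost model.

    Assumed model (not taken from the paper):
        total = num_transfers * startup_time + bytes_transferred * time_per_byte
    """
    if num_transfers < 0 or bytes_transferred < 0:
        raise ValueError("counts must be non-negative")
    return num_transfers * startup_time + bytes_transferred * time_per_byte


if __name__ == "__main__":
    # Hypothetical machine parameters: 10 microseconds startup, 10 ns per byte.
    total = estimate_transfer_time(
        num_transfers=10,
        bytes_transferred=1000,
        startup_time=1e-5,
        time_per_byte=1e-8,
    )
    print(f"estimated transfer time: {total:.6e} s")
```

Under this model, many small messages are dominated by the startup term, while a few large messages are dominated by the per-byte term.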

“…VFC applies various code transformations and optimizations onto the program with/without user control. The programmer can invoke a performance analysis system (SCALA [7]) to instrument, compile, and execute a distributed or parallel program on the target architecture. Based on the instrumented program execution, performance data is gathered and stored in the program database.…”

Section: P³T+: a Performance Estimator For Distributed And Parallel

mentioning

confidence: 99%

“…7 show the predicted and measured values for the P³T+ parameters: number of transfers, amount of data transferred, and transfer times. The experiments have been conducted for various number of processors and problem sizes on a NEC Cenju-4 machine.…”

mentioning

confidence: 98%

Evaluation of P³T+: a performance estimator for distributed and parallel applications

Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000

Pozgaj

Luitz

et al.

Self Cite

In this paper, we report on experiences with P³T+, a performance estimator for distributed and parallel programs that is used to examine at compile time the performance outcome of changes in code, problem and machine sizes, and target architectures. P³T+ computes a variety of performance parameters including work distribution, number of transfers, amount of data transferred, transfer times, computation times, and number of cache misses. It is unique in that it models programs, code transformations, and parallel and distributed architectures and derives a performance prediction based on all three of these elements. P³T+ is the successor of P³T, which computed a similar set of performance parameters, but for parallel programs only. P³T+ has been re-designed and re-implemented from scratch and goes beyond P³T by extending the class of programs that can be handled and by employing several novel estimation methods (symbolic analysis, simulation, pre-measured kernel codes, etc.). The core part of this paper reports on the evaluation of P³T+ to demonstrate both the accuracy and the usefulness of this tool for realistic kernel codes taken from real-world applications (pricing of financial derivatives and quantum mechanical calculations of solids).