SUMMARY

Several large real-world applications have been developed for distributed and parallel architectures. We examine two different program development approaches. The first uses a high-level programming paradigm, which dramatically reduces the time needed to create a parallel program, but sometimes at the cost of reduced performance; a source-to-source compiler has been employed to automatically compile programs written in a high-level programming paradigm into message-passing code. The second is manual program development using a low-level programming paradigm, such as message passing, which enables the programmer to fully exploit a given architecture at the cost of a time-consuming and error-prone effort. Performance tools play a central role in supporting the performance-oriented development of applications for distributed and parallel architectures. SCALA, a portable instrumentation, measurement, and post-execution performance analysis system for distributed and parallel programs, has been used to analyze and guide the application development by selectively instrumenting and measuring the code versions, comparing performance information across several program executions, computing a variety of important performance metrics, detecting performance bottlenecks, and relating performance information back to the input program. We present several experiments in which SCALA is applied to real-world applications. These experiments were conducted on a NEC Cenju-4 distributed-memory machine and on a cluster of heterogeneous workstations and networks.