Extended Abstract

With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application in order to employ a new programming paradigm is usually a time consuming and error prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues.

The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems.

The second paradigm under consideration is MLP, which was developed by Taft [11]. The approach is similar to MPI/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
Libraries supporting the MLP paradigm usually provide routines for process creation, shared memory allocation, and