The trend toward many-core multiprocessor systems and clusters will make systems with tens and hundreds of processors widely available. Current manual debugging techniques do not scale to such large systems; advanced automated debugging tools are needed for the standard programming models built on commodity computing, such as threads and MPI. We surveyed MPI users to identify the kinds of MPI errors they encounter and classify those errors into several types. We describe how automated tools can detect such errors and present the Intel® Message Checker (IMC) technology being developed at the Intel Advanced Computing Center. IMC's technology automatically detects several kinds of MPI errors, including various types of mismatches, race conditions, deadlocks and potential deadlocks, and resource misuse. Finally, we review the usability and uniqueness of IMC and discuss our future plans.
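To make one of these error classes concrete, the following C sketch, not taken from the paper, shows the classic "potential deadlock" that message checkers of this kind aim to flag: both ranks post a blocking MPI_Send before the matching MPI_Recv, so the run completes only if the MPI library happens to buffer the messages internally, which the standard does not guarantee.

```c
/* Illustrative "potential deadlock": both ranks block in MPI_Send,
 * each waiting for the other to post a receive. Run with exactly 2 ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, peer;
    double buf[1 << 16] = {0};   /* large message defeats eager buffering */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;

    /* Both ranks send first: deadlocks unless the library buffers. */
    MPI_Send(buf, 1 << 16, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
    MPI_Recv(buf, 1 << 16, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    printf("rank %d done\n", rank);
    MPI_Finalize();
    return 0;
}
```

Reversing the send/receive order on one rank, or replacing the pair with MPI_Sendrecv, removes the hazard.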
OpenMP is a relatively new programming paradigm that can easily deliver good parallel performance for small numbers (<16) of processors. Good performance with more processors is harder to achieve. MPI is a relatively mature programming paradigm, and there have been many reports of highly scalable MPI codes for large numbers (hundreds, even thousands) of processors. In this paper, we explore the causes of poor scalability with OpenMP from two directions. First, we incrementally transform the loops in a combustion application until we achieve reasonably good parallel scalability, chronicling the effect of each step. Then we approach scalability from the other direction, transforming a highly scalable program simulating the core flow of a solid-fuel rocket engine (originally written with MPI calls) directly to OpenMP, and report the barriers to scalability that we encountered. The incremental transformations include well-known techniques such as loop interchange and loop fusion, plus new ones that exploit features unique to OpenMP, such as barrier removal and the use of ordered serial loops. The barriers to scalability include the use of the ALLOCATE statement within a parallel region, as well as the lack of a reduction clause for a PARALLEL region in OpenMP. We conclude with a list of key issues that must be addressed to make OpenMP a more easily scalable paradigm; some are OpenMP implementation issues, and some are language issues.
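As a concrete illustration of one transformation named above, barrier removal, here is a small OpenMP sketch in C (the paper's combustion code is Fortran; the arrays a, b, and c are hypothetical). The nowait clause drops the implicit barrier at the end of the first worksharing loop, which is safe here because the two loops touch disjoint arrays.

```c
/* Hypothetical sketch of the "barrier removal" transformation. */
#include <omp.h>

void fused_updates(double *a, double *b, const double *c, int n) {
    #pragma omp parallel
    {
        #pragma omp for nowait   /* no barrier: threads flow into loop 2 */
        for (int i = 0; i < n; i++)
            a[i] = 2.0 * c[i];

        #pragma omp for          /* loop 2 reads only c, never a */
        for (int i = 0; i < n; i++)
            b[i] = c[i] + 1.0;
    }
}
```

Each redundant barrier removed this way eliminates one round of synchronization overhead, which grows with the processor count.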
Parallel application developers today face the problem of how to integrate the dominant parallel processing models into one source code. Most high-performance systems use the Distributed Memory Parallel (DMP) and Shared Memory Parallel (SMP; also known as Symmetric MultiProcessor) models, and many applications can benefit from support for multiple parallelism modes. Here we show how to integrate both modes into high-performance parallel applications. These applications have three primary goals.
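To make the two-level integration concrete, here is a minimal hybrid sketch, assuming a simple global-sum computation; the program is illustrative and not taken from the paper. MPI distributes the iteration space across address spaces (the DMP level), while OpenMP threads share the work within each rank (the SMP level).

```c
/* Illustrative hybrid DMP+SMP sketch: MPI across ranks, OpenMP within. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank, nranks;

    /* Request an MPI library that tolerates threaded callers. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const long N = 1000000;
    double local = 0.0;

    /* SMP level: threads on this rank sum its cyclic share of 0..N-1. */
    #pragma omp parallel for reduction(+:local)
    for (long i = rank; i < N; i += nranks)
        local += (double)i;

    /* DMP level: combine the per-rank partial sums on rank 0. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum 0..%ld = %.0f\n", N - 1, total);

    MPI_Finalize();
    return 0;
}
```

MPI_THREAD_FUNNELED is sufficient here because every MPI call is made outside the parallel region by the thread that initialized MPI.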