The Message Passing Interface (MPI) can be used as a portable, high-performance programming model for wide-area computing systems. The wide-area environment introduces challenging problems for the MPI implementor, due to the heterogeneity of both the underlying physical infrastructure and the software environment at different sites. In this article, we describe an MPI implementation that incorporates solutions to these problems. This implementation has been constructed by extending the Argonne MPICH implementation of MPI to use communication services provided by the Nexus communication library and authentication, resource allocation, process creation/management, and information services provided by the I-Soft system (initially) and the Globus metacomputing toolkit (work in progress). Nexus provides multimethod communication mechanisms that allow multiple communication methods to be used in a single computation with a uniform interface; I-Soft and Globus provide standard authentication, resource management, and process management mechanisms. We describe how these various mechanisms are supported in the Nexus implementation of MPI and present performance results for this implementation on multicomputers and networked systems. We also discuss how more advanced services provided by the Globus metacomputing toolkit are being used to construct a second-generation wide-area MPI.
Metacomputing systems use high-speed networks to connect supercomputers, mass storage systems, scientific instruments, and display devices with the objective of enabling parallel applications to access geographically distributed computing resources. However, experience shows that high performance often can be achieved only if applications can integrate diverse communication substrates, transport mechanisms, and protocols, chosen according to where communication is directed, what is communicated, or when communication is performed. In this article, we describe a software architecture that addresses this requirement. This architecture allows multiple communication methods to be supported transparently in a single application, with either automatic or user-specified selection criteria guiding the methods used for each communication. We describe an implementation of this architecture, based on the Nexus communication library, and use this implementation to evaluate performance issues. The implementation supported a wide variety of applications in the I-WAY metacomputing experiment at Supercomputing 95; we use one of these applications to provide a quantitative demonstration of the advantages of multimethod communication in a heterogeneous networked environment.
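The idea of choosing a communication method per destination, with either automatic or user-specified criteria, can be illustrated with a minimal sketch. This is not the Nexus API: the names `Endpoint`, `Method`, `METHODS`, and `select_method`, the three method labels, and the "same host, same site, otherwise TCP" rule are all hypothetical, chosen only to show the shape of such a selection step.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of where a process runs."""
    host: str
    site: str


@dataclass(frozen=True)
class Method:
    """Hypothetical communication method with an applicability test."""
    name: str
    applicable: Callable[[Endpoint, Endpoint], bool]


# Assumed ordering: most specialized (presumably fastest) method first,
# with a general-purpose fallback that is always applicable.
METHODS = [
    Method("shared_memory", lambda src, dst: src.host == dst.host),
    Method("local_network", lambda src, dst: src.site == dst.site),
    Method("tcp", lambda src, dst: True),
]


def select_method(src: Endpoint, dst: Endpoint,
                  preferred: Optional[str] = None) -> str:
    """Return the name of the method to use from src to dst.

    A user-specified preference is honored when that method is applicable;
    otherwise the first applicable method in METHODS is chosen automatically.
    """
    if preferred is not None:
        for method in METHODS:
            if method.name == preferred and method.applicable(src, dst):
                return method.name
    for method in METHODS:
        if method.applicable(src, dst):
            return method.name
    raise RuntimeError("no applicable communication method")


if __name__ == "__main__":
    a = Endpoint(host="node1", site="siteA")
    b = Endpoint(host="node2", site="siteA")
    c = Endpoint(host="node9", site="siteB")
    print(select_method(a, b))
    print(select_method(a, c))
    print(select_method(a, b, preferred="tcp"))
```

In this sketch the application code calls one uniform function, and the selection logic stays hidden behind it, which is the sense in which multiple methods are supported transparently. A real system would also weigh what is communicated and when, which the sketch omits.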
I. Introduction

Future networked computing systems will be increasingly heterogeneous in terms of both the types of networked devices and the capabilities of the networks used to connect these devices. At the same time, the applications that run on these networks are becoming more sophisticated in terms of the computations they perform and the types of data that they communicate [6].