Abstract. Process migration is one technique for implementing environments that perform automatic load balancing. However, on networks of workstations the load indices and heuristics that are used must respect the load that other users' processes impose on the system. In this paper we suggest an approach that uses an existing process migration component to construct an automatic load balancing system for MPI applications. Both the load indices and the heuristics take into account the load imposed on the system by other users' activity. For a computational fluid dynamics application, performance improvements between 10% and 54% could be achieved.
Consistent Checkpointing with CoCheck

The CoCheck environment allows both the creation of checkpoints and the migration of processes of parallel applications on networks of workstations. Initially, CoCheck extended PVM [3] so that PVM applications could be started under the control of a resource management system [9]. In that setting, CoCheck was used to create checkpoints in order to provide global scheduling of parallel applications. Although process migration was already supported, its performance needed further improvement. Consequently, the research focused on improving the performance of checkpointing and, in particular, of process migration. This was achieved by transferring the checkpoints directly over TCP network connections [8]. As the next step, CoCheck was extended to support the proposed MPI [5] message passing standard. To this end, the protocol was integrated with tuMPI, an implementation of the MPI standard definition [10]. As shown in [10], the migration time of a process depends on the size of the migrated process. The time to migrate a single process of size x (in kBytes) is given by Equation (1).

$t(x) = 1.77\,\mathrm{s} + \frac{x}{763\,\mathrm{kBytes/s}}$ (1)
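As an illustration, Equation (1) can be read as a fixed startup cost of 1.77 s plus the process size divided by a transfer rate of 763 kBytes/s. The short sketch below evaluates the formula under that reading. The function name `migration_time`, the constant names, and the sample sizes are assumptions made for this example and do not come from CoCheck or tuMPI.

```python
# Minimal sketch of Equation (1): t(x) = 1.77 s + x / (763 kBytes/s).
# All names here are hypothetical; only the two constants come from the text.

STARTUP_SECONDS = 1.77            # fixed cost per migration, in seconds
THROUGHPUT_KBYTES_PER_S = 763.0   # checkpoint transfer rate, in kBytes/s


def migration_time(size_kbytes: float) -> float:
    """Return the estimated time in seconds to migrate one process.

    size_kbytes is the size of the migrated process in kBytes.
    """
    if size_kbytes < 0:
        raise ValueError("process size must be non-negative")
    return STARTUP_SECONDS + size_kbytes / THROUGHPUT_KBYTES_PER_S


if __name__ == "__main__":
    # Sample sizes chosen only for illustration.
    for size in (0, 763, 7630):
        print(f"{size:>6} kBytes -> {migration_time(size):.2f} s")
```

An estimate of this kind could serve a load balancing heuristic as the cost side of a migration decision: moving a process pays off only if the expected gain on the target workstation exceeds the estimated migration time.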
On-line tools for parallel and distributed programs require a facility to observe and possibly manipulate the programs' run-time behavior, a so-called monitoring system. Currently, most tools use proprietary monitoring techniques that are incompatible with each other and usually apply only to specific target platforms. The On-line Monitoring Interface Specification (OMIS) is the first specification of a universal interface between different tools and a monitoring system, thus enabling interoperable, portable and uniform tool environments. The paper gives an introduction to the basic concepts of OMIS and presents the design and implementation of an OMIS compliant monitoring system (OCM).