Lior Amar scite author profile

Shiloh

2005

MOSIX is a cluster management system that uses process migration to allow a Linux cluster to perform like a parallel computer. Recently it has been extended with new features that could make a grid of Linux clusters run as a cooperative system of federated clusters. It supports automatic workload distribution among connected clusters that belong to different owners, while still preserving the autonomy of each owner to disconnect its cluster from the grid at any time, without sacrificing migrated processes from other clusters. Other new features of MOSIX include grid-wide automatic resource discovery; a precedence scheme for local processes and among guest processes (from other clusters); flood control; a secure run-time environment (sandbox) which prevents guest processes from accessing local resources in a hosting system; and support of cluster partitions. The resulting grid management system is suitable for creating an intra-organizational high-performance computational grid, e.g., in an enterprise or on a campus. The paper presents enhanced and new features of MOSIX and their performance.


An On-line Algorithm for Fair-Share Node Allocations in a Cluster

Levy

et al. 2007

Randomized gossip algorithms for maintaining a distributed bulletin board with guaranteed age properties

Concurrency and Computation

Drezner

et al. 2009

SUMMARY: Scalable computer systems, including clusters and multi-cluster grids, require routine exchange of information about the state of system-wide resources among their nodes. Gossip-based algorithms are popular for providing such information services due to their simplicity, fault tolerance and low communication overhead. This paper presents a randomized gossip algorithm for maintaining a distributed bulletin board among the nodes of a scalable computer system. In this algorithm each node routinely disseminates its most recently acquired information while maintaining a snapshot of the other nodes' states. The paper provides analytical approximations for the expected average age, the age distribution and the expected maximal age for the acquired information at each node. We confirm our results by measurements of the performance of the algorithm on a multi-cluster campus grid with 256 nodes and by simulations of configurations with up to 2048 nodes. The paper then presents practical enhancements of the algorithm, which make it more suitable for a real system. Such enhancements include using fixed-size messages, reducing the number of messages sent to inactive nodes and supporting urgent information. The enhanced algorithm guarantees the age properties of the information at each node in configurations with an arbitrary number of inactive nodes. It is being used in our campus grid for resource discovery, for dynamic assignment of processes to the best available nodes, for load-balancing and for on-line monitoring.
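
The dissemination step described in this abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the algorithm from the paper: the names `gossip_round` and `simulate`, the `window` parameter (standing in for a fixed message size), the uniform choice of one random peer per round, and the "keep the younger copy" merge rule are all hypothetical choices made for illustration.

```python
import random

UNKNOWN = float("inf")  # age used for entries a node has never heard about


def gossip_round(ages, window, rng):
    """Advance one time unit (assumed model, requires at least 2 nodes).

    ages[i][k] is the age of node i's copy of node k's state.
    """
    n = len(ages)
    # Every stored entry gets one unit older; a node's own entry is always fresh.
    for i in range(n):
        ages[i] = [a + 1 for a in ages[i]]
        ages[i][i] = 0
    # Build all messages first so every send in a round uses the same state.
    messages = []
    for sender in range(n):
        receiver = rng.choice([j for j in range(n) if j != sender])
        # Send only the `window` youngest entries (a fixed-size message).
        youngest = sorted(range(n), key=lambda k: ages[sender][k])[:window]
        messages.append((receiver, [(k, ages[sender][k]) for k in youngest]))
    # Each receiver keeps whichever copy of an entry is younger.
    for receiver, entries in messages:
        for k, age in entries:
            if age < ages[receiver][k]:
                ages[receiver][k] = age
    return ages


def simulate(n=16, window=8, rounds=50, seed=0):
    """Run the sketch and return (average age, maximal age) over known entries."""
    rng = random.Random(seed)
    ages = [[UNKNOWN] * n for _ in range(n)]
    for i in range(n):
        ages[i][i] = 0
    for _ in range(rounds):
        gossip_round(ages, window, rng)
    known = [a for row in ages for a in row if a != UNKNOWN]
    return sum(known) / len(known), max(known)
```

Calling `simulate()` with different values of `window` gives a rough feel for how message size affects the average and maximal age of the information held at each node; the figures it produces come from this toy model and say nothing about the measurements reported in the paper.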


Combining Virtual Machine migration with process migration for HPC on multi-clusters and Grids

Maoz

2008

The MOSIX Direct File System Access Method for Supporting Scalable Cluster File Systems

2004