The literature on communication-induced checkpointing presents a family of protocols that use logical clocks to control whether forced checkpoints must be taken. For many years, HMNR, also called Fully Informed (FI), was the most complex and efficient protocol of this family. The Lazy-FI protocol applies a lazy strategy that defers the increase of logical clocks, resulting in a protocol with better perfomance for distributed systems where processes can take basic checkpoints at different, asymmetric, rates. Recently, the Fully Informed aNd Efficient (FINE) protocol was proposed using the same control structures as FI, but with a stronger and, presumably better, checkpoint-inducing condition. FINE and its lazy version, called Lazy-FINE, would now be the most efficient checkpointing protocols based on logical clocks. This paper reviews this family of protocols, proves a theorem on a condition that must be enforced by all stronger versions of FI, and proves that both FINE and Lazy-FINE do not guarantee the absence of useless checkpoints. As a consequence, FI and Lazy-FI can be rolled back to the position of most efficient protocols of this family of index-based checkpointing protocols.
A checkpointing protocol that enforces rollbackdependency trackability (RDT) during the progress of a distributed computation must induce processes to take forced checkpoints to avoid the formation of non-trackable rollback dependencies. A protocol based on the minimal characterization of RDT tests only the smallest set of non-trackable dependencies. The literature indicated that this approach would require the processes to maintain and propagate O(n 2 ) control information, where n is the number of processes in the computation. In this paper, we present a protocol that implements this approach using only O(n) control information.
Grid is an emerging infrastructure used to share resources among virtual organizations in a seamless manner and to provide breakthrough computing power at low cost. Nowadays there are dozens of academic and commercial products that allow execution of isolated tasks on grids, but few products support the enactment of long-running processes in a distributed fashion. In order to address such subject, this paper presents a programming model and an infrastructure that hierarchically schedules process activities using available nodes in a wide grid environment. Their advantages are automatic and structured distribution of activities and easy process monitoring and steering.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.