Agreementon the membership of a group of processes in a distributed system is a basic problem that arises in a wide range of applications. Such groups occur when a set of processes co-operate to perform some task, share memory, monitor one another, subdivide a computation, and so forth. In this paper we discuss the Group Membership Problem as it relates to failure detection in asynchronous, distributed systems. We present a rigorous, formal specification for group membership under this interpretation. We then present a solution for this problem that improves upon previous work.
Failure detectors (or, more accurately Failure Suspectors -FS) appear to be a fundamental service upon whach to buald fault-tolerant, distributed applacataons. This paper shows that a FS with very weak semantics (i.e. that delivers failure and recovery information in no specific order) suffices to implement virtually-synchronous communication (VSC) in an asynchronous system subject to process crash failures and network partitions. The VSC paradigm is particularly useful in asynchronous systems and greatly simplifies building fault-tolerant applications that mask failures b y replicating processes. We suggest a three-component architecture to implement virtuallysynchronous communication : 1) at the lowest level, the FS component; on top of at, 2a) a component that defines new views, and 2b) a component that reliably multicasts messages within a view.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.