Abstract. We give a formal specification and an implementation for a partitionable group communication service in asynchronous distributed systems. Our specification is motivated by the requirements for building "partition-aware" applications that can continue operating without blocking in multiple concurrent partitions and reconfigure themselves dynamically when partitions merge. The specified service guarantees liveness and excludes trivial solutions; it constitutes a useful basis for building realistic partition-aware applications; and it is implementable in practical asynchronous distributed systems where certain stability conditions hold.
Distributed systems that span large geographic distances or manage large numbers of objects are already commonplace. In such systems, programming applications with even modest reliability requirements to run correctly and efficiently is a difficult task due to asynchrony and the possibility of complex failure scenarios. In this paper, we describe the architecture of the RELACS communication subsystem that constitutes the microkernel of a layered approach to reliable computing in large-scale distributed systems. RELACS is designed to be highly portable and implements a very small number of abstractions and primitives that should be sufficient for building a variety of interesting higher-level paradigms.
Transient failures, unknown scheduling strategies and variable loads on the computing and communication resources give rise to an asynchronous and partitionable characterization for practical distributed systems with large geographic extent. We consider the group membership problem in partitionable asynchronous systems and give a formal specification that guarantees liveness and prevents capricious view splitting. Our work is based on the notion of reachability as an appropriate characterization of failures in partitionable systems in that it subsumes both process crashes and communication failures. The group membership problem is formulated in the context of view synchrony that includes a reliable multicast service for communication within the group. Our specification is modular and includes properties governing group membership separately from those governing reliable multicasts. It can be taken either partially for defining a group membership service alone, or taken as a whole for defining view synchrony.
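
As a rough illustration of how reachability can subsume both process crashes and communication failures, the following minimal sketch groups the surviving processes into candidate views, one per partition. It is not the specification or protocol from the paper: the function name `partition_views`, its parameters, and the simplifying assumption that reachability is symmetric and transitive over working links are all hypothetical choices made for this example.

```python
from typing import FrozenSet, Iterable, List, Tuple


def partition_views(
    processes: Iterable[str],
    crashed: Iterable[str],
    broken_links: Iterable[Tuple[str, str]],
) -> List[FrozenSet[str]]:
    """Group non-crashed processes into sets that can reach one another.

    Assumption (not from the paper): a link is bidirectional, and two
    processes are mutually reachable if a path of working links between
    non-crashed processes connects them.
    """
    alive = set(processes) - set(crashed)
    down = {frozenset(link) for link in broken_links}
    views: List[FrozenSet[str]] = []
    unassigned = set(alive)
    while unassigned:
        start = min(unassigned)  # deterministic choice of the next seed
        component = {start}
        frontier = [start]
        while frontier:
            p = frontier.pop()
            for q in alive:
                # q joins the component if the direct link p-q is working
                if q not in component and frozenset((p, q)) not in down:
                    component.add(q)
                    frontier.append(q)
        unassigned -= component
        views.append(frozenset(component))
    return views


# Process d has crashed; c is cut off from a and b by link failures.
views = partition_views("abcd", crashed="d", broken_links=[("a", "c"), ("b", "c")])
print([sorted(v) for v in views])  # [['a', 'b'], ['c']]
```

In this toy model a crash and a set of link failures are handled by the same mechanism: both simply remove a process from the set reachable by the others. Once the broken links are repaired, the same call returns a single view containing all surviving processes, which corresponds to the partitions merging.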