Computing and spreading global information in large-scale distributed systems pose significant challenges when scalability, parallelism, resilience and consistency are demanded. Epidemic protocols are a robust and scalable computing and communication paradigm that can be effectively used for information dissemination and data aggregation in a fully decentralised context where each network node requires the local computation of a global synopsis function. Theoretical analyses of epidemic protocols for synchronous and static network models provide guarantees on the convergence to a global target and on the consistency among the network nodes. However, practical applications in real-world networks may require the explicit detection of both local convergence and global agreement (consensus). This work introduces the Epidemic Consensus Protocol (ECP) for the determination of consensus on the convergence of a decentralised data aggregation task. ECP adopts a heuristic method to locally detect convergence of the aggregation task and stochastic phase transitions to detect global agreement and reach consensus. The performance of ECP has been investigated by means of simulations and compared to a tree-based Three-Phase Commit protocol (3PC). Although, as expected, ECP exhibits total communication costs greater than those of the optimal tree-based protocol, it is shown to have better performance and scalability properties: ECP can achieve faster convergence to consensus for large system sizes and inherits the intrinsic decentralisation, fault-tolerance and robustness properties of epidemic protocols.
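The abstract above mentions a heuristic for locally detecting convergence but does not specify it. As a minimal sketch of what such a check could look like, the function below treats a node as locally converged once its estimate has changed by no more than a tolerance over several consecutive rounds. The function name, the `epsilon` tolerance and the `window` length are all hypothetical choices made for illustration; this is not the heuristic used by ECP.

```python
def locally_converged(history, epsilon=1e-6, window=5):
    """Hypothetical local convergence check (an assumption, not ECP's rule).

    history: the node's local estimates, one per round, oldest first.
    Returns True when each of the last `window` round-to-round changes
    is at most `epsilon` in absolute value.
    """
    # We need window + 1 estimates to observe `window` consecutive changes.
    if len(history) < window + 1:
        return False
    recent = history[-(window + 1):]
    return all(abs(b - a) <= epsilon for a, b in zip(recent, recent[1:]))
```

A check of this kind is purely local: a node that passes it still has no evidence that the other nodes have converged, which is why the abstract treats global agreement as a separate problem.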
Consensus is one of the fundamental problems in multi-agent systems and distributed computing, in which agents or processing nodes are required to reach global agreement on some data value, decision, action, or synchronisation. In the absence of centralised coordination, achieving global consensus is challenging, especially in dynamic and large-scale distributed systems with faulty processes. This paper presents a fully decentralised phase transition protocol to achieve global consensus on the convergence of an underlying information dissemination process. The proposed approach is based on epidemic protocols, which are a randomised communication and computation paradigm and provide excellent scalability and fault-tolerance properties. The experimental analysis is based on simulations of a large-scale information dissemination process, and the results show that global agreement can be achieved without deterministic and global communication patterns, such as those based on centralised coordination.
In large-scale distributed systems, data aggregation is a fundamental task that provides a global synopsis over a distributed set of data values. Epidemic protocols are based on a randomised communication paradigm inspired by biological systems and have been proposed to provide decentralised, scalable and fault-tolerant solutions to the data aggregation problem. However, in epidemic aggregation, node failure and churn have a detrimental effect on the accuracy of the local estimates of the global aggregation target. In this paper, a novel approach, the Robust Epidemic Aggregation Protocol (REAP), is proposed to provide robustness in the presence of churn by detecting three distinct phases in the aggregation process. An analysis of the impact of each phase on the estimation accuracy is provided. In particular, a novel mechanism is introduced to improve the phase that is most critical for the protocol accuracy. REAP is validated by means of simulations and is shown to achieve convergence with a good level of accuracy for a reasonable range of node churn rates.
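To make the idea of epidemic aggregation concrete, here is a minimal sketch of generic push-pull gossip averaging, a common textbook form of epidemic aggregation. It is not REAP itself: it assumes a static, fully connected network with no failures or churn, and the function name and parameters are hypothetical. In each round every node picks a random peer and both replace their estimates with the pairwise mean, so the sum of all estimates is preserved while their spread shrinks.

```python
import random


def push_pull_average(values, rounds, seed=0):
    """Simulate push-pull gossip averaging (illustrative assumption, not REAP).

    values: initial local values, one per node (at least two nodes).
    rounds: number of gossip rounds; every node initiates once per round.
    Returns the list of local estimates after the final round.
    """
    rng = random.Random(seed)
    estimates = [float(v) for v in values]
    n = len(estimates)
    if n < 2:
        raise ValueError("at least two nodes are required")
    for _ in range(rounds):
        for i in range(n):
            # Pick a peer uniformly at random among the other n - 1 nodes.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            # Push-pull exchange: both nodes adopt the pairwise mean,
            # which leaves the sum of all estimates unchanged.
            mean = (estimates[i] + estimates[j]) / 2.0
            estimates[i] = mean
            estimates[j] = mean
    return estimates
```

Because each exchange conserves the total, every local estimate approaches the global average in this idealised setting. When a node leaves in the middle of the process, part of that total leaves with it, which is one way to see why churn degrades the accuracy of the local estimates.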
Software services based on large-scale distributed systems demand continuous and decentralised solutions for achieving system consistency and providing operational monitoring. Epidemic data aggregation algorithms provide decentralised, scalable and fault-tolerant solutions that can be used for system-wide tasks such as global state determination, monitoring and consensus. Existing continuous epidemic algorithms either restart periodically at fixed epochs or apply changes in the system state instantly, which produces less accurate approximations. This work introduces an innovative mechanism without fixed epochs that monitors the system state and restarts upon detecting convergence or divergence of the system. The mechanism achieves correct aggregation with an approximation error as small as desired. The proposed solution is validated and analysed by means of simulations under static and dynamic network conditions.