Distributed aggregation queries such as average and sum can be implemented in different paradigms, including gossip and hierarchical approaches. In the literature, these two paradigms are routinely associated with stereotypes such as "trees are fragile and complicated" and "gossip is slow and expensive". A closer look reveals, however, that these statements are not backed up by systematic studies. A fair and informative comparison is clearly needed, but it is a hard task because the performance of protocols from the two paradigms depends on different subtleties of the environment and of the protocol implementations. We tackle this problem by carefully designing the comparison study. We use state-of-the-art algorithms and propose the problem of monitoring the network size in the presence of churn as the ideal problem for comparing very different paradigms for global aggregation. Our simulation study helps us identify the most important factors that differentiate gossip from spanning tree aggregation: the time needed to compute a truly global output, the properties of the underlying topology, and the sensitivity to dynamism. We demonstrate the effect of these factors in different practical topologies and scenarios. Our results help in choosing the right protocol in light of the topology and dynamism patterns.
Introduction
Fully distributed aggregation is an important problem where we wish to execute queries such as sum, average, minimum, or maximum over unreliable networks (sensor networks, physical networks of routers, overlay networks, etc.), in which no central servers are directly accessible.
At least two paradigms are known that solve this problem. The first is the gossip approach, for which algorithms have been proposed that achieve a high degree of robustness. Gossip protocols do not rely on fixed topologies: nodes exchange information with random neighbors to implement a diffusion-like computation pattern, and as a result the system converges to a state where all the nodes know the query result. From this literature, we focus here only on the adaptive approaches. In [1], the authors propose the restarting technique to convert any one-shot algorithm into an adaptive one. Apart from restarting, other approaches have been proposed that focus on error correction through some form of bookkeeping at the nodes [2][3][4][5].
The second paradigm is hierarchical aggregation, which is a popular method in sensor networks [6]. It was also proposed for general process groups [7]. Tree-based aggregation has remained unpopular in some areas, such as peer-to-peer networks, due to widely held assumptions about its lack of robustness. There are a few notable exceptions, however: the Astrolabe framework [8], which is in fact only a virtual tree with completely unstructured gossip communication patterns behind it; the GAP protocol and its variants [5, 9-11], which actually build a spanning tree over a distributed network; and PRISM [12], a hierarchical approach that is built on top of a distributed hashtable, with a focus on dete...