Proceedings of the Fifteenth European Conference on Computer Systems 2020
DOI: 10.1145/3342195.3387543

State-machine replication for planet-scale systems

Abstract: Online applications now routinely replicate their data at multiple sites around the world. In this paper we present Atlas, the first state-machine replication protocol tailored for such planet-scale systems. Atlas does not rely on a distinguished leader, so clients enjoy the same quality of service independently of their geographical locations. Furthermore, client-perceived latency improves as we add sites closer to clients. To achieve this, Atlas minimizes the size of its quorums using an observation that conc…

Cited by 31 publications (52 citation statements)
References: 22 publications

“…The protocol saturates at around 4K clients per site, when the outgoing network bandwidth at the leader reaches 95% usage. The fact that the leader can be a bottleneck in leader-based protocol has been reported by several prior works [13,23,24,31,43].…”
Section: Full Replication Deployment (mentioning)
confidence: 98%
“…Unfortunately, all existing leaderless SMR protocols suffer from drawbacks in the way they order commands. Some protocols [1,5,13,31] maintain explicit dependencies between commands: a replica may execute a command only after all its dependencies get executed. These dependencies may form arbitrary long chains.…”
Section: Introduction (mentioning)
confidence: 99%
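
The execution rule quoted above (a command runs only after all of its dependencies have run) can be illustrated with a small sketch. This is not the algorithm of Atlas or of any cited protocol: the function name `execution_order`, the command ids, and the assumption that the dependency graph is acyclic are all hypothetical simplifications for illustration.

```python
from collections import deque


def execution_order(deps):
    """Return an order in which each command follows all its dependencies.

    `deps` maps a command id to the set of command ids it depends on.
    Assumption (illustrative only): every dependency is itself a key of
    `deps`, and the graph has no cycles.
    """
    # Copy the dependency sets so the caller's data is not mutated.
    remaining = {cmd: set(d) for cmd, d in deps.items()}

    # Reverse index: for each command, the commands waiting on it.
    dependents = {cmd: [] for cmd in deps}
    for cmd, d in remaining.items():
        for dep in d:
            dependents[dep].append(cmd)

    # Commands with no pending dependencies can execute immediately.
    ready = deque(sorted(c for c, d in remaining.items() if not d))
    order = []
    while ready:
        cmd = ready.popleft()
        order.append(cmd)
        # Executing `cmd` may unblock the commands that depend on it.
        for nxt in sorted(dependents[cmd]):
            remaining[nxt].discard(cmd)
            if not remaining[nxt]:
                ready.append(nxt)

    if len(order) != len(deps):
        raise ValueError("cyclic dependencies are not handled by this sketch")
    return order


# Command "c" sits at the end of the chain a -> b -> c, so it executes last.
chain = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"a"}}
print(execution_order(chain))
```

In this toy input, `c` cannot run until both `a` and `b` have run, which is the kind of chain the citation statement describes; the longer the chain, the longer the last command waits.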
“…Given the pervasiveness of replicated state machines in distributed systems and applications, improving the replication throughput has been an important problem in the past decade. Some solutions [9,23] call for boosting throughput and reducing latency at the same time, while others [7,35] argue for trading off some latency in exchange for better throughput. Both camps, however, take a similar high-level approach for improving their performance.…”
Section: Current Approaches To Scaling State Machine Replication (mentioning)
confidence: 99%
“…Usually, the bottleneck is at the leader, as it is used to communicate with the clients and coordinate the replication [1]. Systems like EPaxos [23,31] and Atlas [9] avoid having a one-node bottleneck by not having a single leader that centers all communication around it. Instead, these systems expand on Fast Paxos [17] ideas and try to use fast quorums to commit/replicate operations in one round-trip network latency from any node in the system.…”
Section: Current Approaches To Scaling State Machine Replication (mentioning)
confidence: 99%