Upright cluster services

Clement, Allen; Kapritsos, Manos; Lee, Sang‐Min; Wang, Yang; Alvisi, Lorenzo; Dahlin, Mike; Riché, Taylor L.

doi:10.1145/1629575.1629602

Cited by 156 publications

(165 citation statements)

References 30 publications

Mentioning: 163

“…Byzantine fault tolerance (BFT) (Castro and Liskov, 2002; Clement et al, 2009a) is a promising technology that could help an application achieve high availability and trustworthiness. A Byzantine fault (Lamport et al, 1982) refers to an arbitrary fault, which could be a crash or malicious fault.…”

Section: Request-2 (mentioning)

confidence: 99%

Byzantine fault tolerance for session-oriented multi-tiered applications

Chai, Zhao

2013

IJWS

This article presents a lightweight Byzantine fault tolerance (BFT) framework for session-oriented multi-tiered applications. Based on a comprehensive study of the threat model and the state model of such applications, we conclude that a lightweight BFT algorithm is sufficient in place of a traditional BFT algorithm. The lightweight BFT algorithm uses source ordering, rather than total ordering, of incoming requests to achieve Byzantine fault tolerant state-machine replication for this type of application. The performance of the lightweight BFT framework is evaluated using a shopping cart application prototype built on the web services platform. The same shopping cart application is used as a running example to illustrate the problem we address and our proposed solution. Performance evaluation results obtained from the prototype implementation show that our lightweight BFT algorithm incurs insignificant overhead.

“…There are a large body of work on modern BFT algorithms, such as (Castro and Liskov, 2002; Clement et al, 2009a, 2009b; Singh et al, 2009). These algorithms are designed to protect generic stateful servers against Byzantine faults in a client-server environment.…”

Section: Related Work (mentioning)

confidence: 99%

Byzantine fault tolerance for session-oriented multi-tiered applications

Chai, Zhao

2013

IJWS

“…Starting from PBFT [12], various proposals [27,28,14,29,15] aim to reduce latency and increase throughput. Aardvark [30] and Zyzzyvark [31] focus on sustainable performance rather than peak performance. Other proposals focus on reducing the number of full replicas [13,32,33].…”

Section: Comparison With Prior Work (mentioning)

confidence: 99%

Byzantine Chain Replication

Renesse

Schiper

2012

Lecture Notes in Computer Science

We present a new class of Byzantine-tolerant State Machine Replication protocols for asynchronous environments that we term Byzantine Chain Replication. We demonstrate two implementations that present different trade-offs between performance and security, and compare these with related work. Leveraging an external reconfiguration service, these protocols are not based on Byzantine consensus, do not require majority-based quorums during normal operation, and the set of replicas is easy to reconfigure. One of the implementations is instantiated with t + 1 replicas to tolerate t failures and is useful in situations where perimeter security makes malicious attacks unlikely. Applied to in-memory BerkeleyDB replication, it supports 20,000 transactions per second, while a fully Byzantine implementation supports 12,000 transactions per second, about 70% of the throughput of a non-replicated database.

“…An important project related to Hadoop's omission failures is presented in [37]. In this work, authors have tried to build separate fault tolerance thresholds in the UpRight library for omission and commission failures, because omission failures are likely to be more common than commission failures.…”

Section: Resource Aware Speculative Scheduling (RAS) (mentioning)

confidence: 99%

“…The work discussing the omission failures in [37], is actually a wider review that includes the byzantine failures in general. The main properties upon which the UpRight library is based are:…”

Section: Arbitrary (Byzantine) Failure (mentioning)

confidence: 99%

Optimizing the reliability and resource efficiency of MapReduce-based systems

Memishi

Due to the rapid growth of data volumes, a new parallel computing paradigm for processing big data efficiently has arisen. Many of these systems, called data-intensive computing systems, follow the Google MapReduce programming model. The main advantage of these systems is based on the idea of sending the computation to where the data resides, in order to provide scalability and efficiency. In failure-free scenarios, these frameworks usually achieve good results. However, such scenarios are not realistic. Consequently, these frameworks offer some fault tolerance and dependability techniques as built-in features. On the other hand, dependability improvements are known to imply additional resource costs. This is reasonable, and providers offering these infrastructures are aware of it. Nevertheless, not all approaches provide the same tradeoff between fault tolerance capabilities (or, more generally, reliability capabilities) and cost. In this thesis, we have addressed the coexistence of reliability and resource efficiency in MapReduce-based systems, looking for methodologies that introduce minimal cost and guarantee an appropriate level of reliability. To achieve this, we have proposed: (i) a formalization of a failure detector abstraction; (ii) an alternative solution to the single points of failure of these frameworks; and (iii) a novel feedback-based resource allocation system at the container level. Finally, our generic contributions have been instantiated for the Hadoop YARN architecture, the state-of-the-art framework in the data-intensive computing systems community. The thesis demonstrates how all our approaches outperform Hadoop YARN in terms of reliability and resource efficiency.
