Fault injection in production

Allspaw, John

doi:10.1145/2347736.2347751

Cited by 14 publications

(6 citation statements)

References 0 publications

“…They call this practice "chaos engineering". When fault injection is done in production on a special day under full control (as opposed to automatically at any arbitrary point in time), it is called a GameDay exercise [1].…”

Section: Failure Injection In Production

mentioning

confidence: 99%

Principles of Antifragile Software

Monperrus

2017

Companion to the First International Conference on the Art, Science and Engineering of Programming

There are many software engineering concepts and techniques related to software errors. But is this enough? Have we already completely explored the software engineering noosphere with respect to errors and reliability? In this paper, I discuss a novel concept, called "software antifragility", that is unconventional and has the capacity to improve the way we engineer errors and dependability in a disruptive manner. This paper first discusses the foundations of software antifragility, from classical fault tolerance to the most recent advances on automatic software repair and fault injection in production. This paper then explores the relation between the antifragility of the development process and the antifragility of the resulting software product.

“…There has been a paradigm shift --from trying to avoid failures at all costs to embracing faults as opportunities for making the system more resilient. The rationale behind fault injection testing of deployed software can be summarized as follows [4]:…”

Section: Relationship To Cloud-based Solutions

mentioning

confidence: 99%

“…Their focus mainly lies on hardware fault models, namely network partitions and latency, as well as node crashes. There are also "fault model agnostic", configurable solutions such as [7] [8] [4]. These solutions provide a framework for easily injecting faults, but the fault classes themselves have to be implemented to some extent by the user.…”

mentioning

confidence: 99%

Experimental Assessment of Cloud Software Dependability Using Fault Injection

Herscheid

Richter

Polze

2015

IFIP Advances in Information and Communication Technology

Part 5: Distributed Computing. In modern cloud software systems, the complexity arising from feature interaction, geographical distribution, security and configurability requirements increases the likelihood of faults. Additional influencing factors are the impact of different execution environments as well as human operation or configuration errors. Assuming that any non-trivial cloud software system contains faults, robustness testing is needed to ensure that such faults are discovered as early as possible, and that the overall service is resilient and fault tolerant. To this end, fault injection is a means for disrupting the software in ways that uncover bugs and test the fault tolerance mechanisms. In this paper, we discuss how to experimentally assess software dependability in two steps. First, a model of the software is constructed from different runtime observations and configuration information. Second, this model is used to orchestrate fault injection experiments with the running software system in order to quantify dependability attributes such as service availability. We propose the architecture of a fault injection service within the OpenStack project.
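As a rough illustration of the idea in the statements and abstract above (a framework that runs experiments while the fault classes are supplied by the user, and that reports a dependability attribute such as availability), the following minimal sketch may help. It is not taken from any of the cited papers: every name in it (`crash_fault`, `flaky_fault`, `measure_availability`, `run_experiment`, `healthy_service`) is hypothetical, and the "service" is just an in-process function rather than a real cloud deployment.

```python
import random
from typing import Callable, Dict

# Hypothetical types for illustration only; not an API from the cited work.
Service = Callable[[], str]
Fault = Callable[[Service], Service]


def crash_fault(service: Service) -> Service:
    """User-supplied fault class: every call fails as if the node had crashed."""
    def faulty() -> str:
        raise ConnectionError("injected node crash")
    return faulty


def flaky_fault(rate: float, rng: random.Random) -> Fault:
    """User-supplied fault class: each call fails with probability `rate`."""
    def wrap(service: Service) -> Service:
        def faulty() -> str:
            if rng.random() < rate:
                raise TimeoutError("injected timeout")
            return service()
        return faulty
    return wrap


def measure_availability(service: Service, requests: int) -> float:
    """Return the fraction of requests answered without an exception."""
    ok = 0
    for _ in range(requests):
        try:
            service()
            ok += 1
        except Exception:
            pass  # a failed request counts against availability
    return ok / requests


def run_experiment(service: Service, faults: Dict[str, Fault],
                   requests: int = 1000) -> Dict[str, float]:
    """Measure availability without faults, then once per injected fault."""
    results = {"baseline": measure_availability(service, requests)}
    for name, fault in faults.items():
        results[name] = measure_availability(fault(service), requests)
    return results


def healthy_service() -> str:
    """Stand-in for the system under test (an assumption of this sketch)."""
    return "ok"


if __name__ == "__main__":
    rng = random.Random(42)
    faults = {"crash": crash_fault, "flaky": flaky_fault(0.2, rng)}
    for name, availability in run_experiment(healthy_service, faults).items():
        print(f"{name}: {availability:.3f}")
```

The framework part (`run_experiment`) knows nothing about specific faults; each fault is an ordinary function that wraps the service, which is one simple way to keep the fault model configurable by the user.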

“…Fault injection is a significant solution to emulate these problems in a controlled way, to make distributed systems more fault-tolerant. For example, several large companies, such as Netflix, Uber, Amazon, have been using fault injection for their chaos engineering and game day exercises to assess the reliability of their services [13,14]. Unfortunately, fault injection has a high entry barrier, and it is still beyond the reach of the minor service providers, due to the cost and complexity of planning and orchestrate fault injection experiments.…”

Section: Introduction

mentioning

confidence: 99%

ThorFI: A Novel Approach for Network Fault Injection as a Service

Cotroneo

Simone

Natella

2022

Preprint

In this work, we present a novel fault injection solution (ThorFI) for virtual networks in cloud computing infrastructures. ThorFI is designed to provide non-intrusive fault injection capabilities for a cloud tenant, and to isolate injections from interfering with other tenants on the infrastructure. We present the solution in the context of the OpenStack cloud management platform, and release this implementation as open-source software. Finally, we present two relevant case studies of ThorFI, one on an NFV IMS and one on a high-availability cloud application. The case studies show that ThorFI can enhance functional tests with fault injection, as in 4%-34% of the test cases the IMS is unable to handle faults; and that despite redundancy in virtual networks, faults in one virtual network segment can propagate to other segments, and can affect the throughput and response time of the cloud application as a whole, by about 3 times in the worst case.

Fault injection in production

Abstract: Making the case for resilience testing.
