2016
DOI: 10.1109/ms.2016.60

Chaos Engineering

Abstract: Modern software-based services are implemented as distributed systems with complex behavior and failure modes. Many large tech organizations are using experimentation to verify the reliability of such systems. We use the term "Chaos Engineering" to refer to this approach, and discuss the underlying principles and how to use it to run experiments.

Citations: cited by 180 publications (120 citation statements)
References: 1 publication
“…Basiri et al's seminal paper [6] about chaos engineering tells us that the main goals of chaos engineering are: 1) to verify error handling capabilities and resilience in production settings; 2) to learn about error handling behavior in production. The mantra of chaos engineering is to experiment with the system in production.…”
Section: A Brief Overview (mentioning)
confidence: 99%
“…Chaos engineering is a new field that consists in injecting faults in production systems to assess the resilience of a software system [6]. The core idea of chaos engineering is active probing: the chaos engineering system actively injects a controlled perturbation into the production system and observes the impact of the perturbation as well as the reaction of the system under study [2], [10], [17].…”
Section: Introduction (mentioning)
confidence: 99%
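The active-probing idea in the statement above (inject a controlled perturbation, then observe its impact) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the service, the `BackendDown` fault, the fallback list, and the `fault_rate` and `tolerance` parameters are all hypothetical names introduced here, and the code runs against an in-process stand-in rather than a production system.

```python
import random


class BackendDown(Exception):
    """Hypothetical fault raised when the injected perturbation is active."""


def fetch_personalized(user_id, inject_fault):
    # Stand-in for a remote dependency; the fault is injected on demand.
    if inject_fault:
        raise BackendDown(f"backend unavailable for user {user_id}")
    return [f"pick-for-{user_id}"]


def handle_request(user_id, inject_fault):
    # Error handling under test: fall back to a static list on failure.
    try:
        return fetch_personalized(user_id, inject_fault)
    except BackendDown:
        return ["popular-title"]


def success_rate(n_requests, fault_rate, seed):
    # Observed metric: fraction of requests that return a non-empty response.
    rng = random.Random(seed)
    served = 0
    for user_id in range(n_requests):
        inject = rng.random() < fault_rate
        try:
            if handle_request(user_id, inject):
                served += 1
        except Exception:
            pass  # an unhandled failure counts as a request not served
    return served / n_requests


def run_experiment(n_requests=1000, fault_rate=0.3, tolerance=0.01, seed=0):
    # Compare an unperturbed control group with a perturbed experiment group.
    control = success_rate(n_requests, 0.0, seed)
    experiment = success_rate(n_requests, fault_rate, seed)
    return {
        "control": control,
        "experiment": experiment,
        "hypothesis_holds": abs(control - experiment) <= tolerance,
    }


if __name__ == "__main__":
    print(run_experiment())
```

If the fallback in `handle_request` were removed, the experiment group's success rate would drop below the control group's and `hypothesis_holds` would become false, which is the kind of weakness such an experiment is meant to surface.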
“…In reaction, practitioners increasingly rely on resiliency techniques based on testing and fault injection [6, 14, 19, 23, 27, 35]. These "black box" approaches (which perturb and observe the complete system, rather than its components) are (arguably) better suited for testing an end-to-end property such as fault tolerance.…”
Section: The Vanguard (mentioning)
confidence: 99%
“…[24] Chaos Engineering, the practice of actively perturbing production systems to increase overall site resiliency, was pioneered by Netflix [6], but since then Linkedin [52], Microsoft [38], Uber [47], and PagerDuty [5] have developed Chaos-based infrastructures. Jepsen performs black box testing and fault injection on unmodified distributed data management systems, in search of correctness violations (e.g., counterexamples that show an execution was not linearizable).…”
Section: The Vanguard (mentioning)
confidence: 99%
“…At Netflix, we practice Chaos Engineering [3]. Namely, we believe there is a level of complexity in modern distributed systems that is chaotic, and that a chief architect cannot hold all of the system's moving parts in their head.…”
Section: Introduction (mentioning)
confidence: 99%