2012
DOI: 10.1145/2346916.2353017

Fault Injection in Production

Abstract: When we build Web infrastructures at Etsy, we aim to make them resilient. This means designing them carefully so that they can sustain their (increasingly critical) operations in the face of failure. Thankfully, there have been a couple of decades and reams of paper spent on researching how fault tolerance and graceful degradation can be brought to computer systems. That helps the cause. To make sure that the resilience built into Etsy systems is sound and that the systems behave as expected, we have to see the …

Cited by 11 publications (7 citation statements)
References 12 publications
“…Embedded learning practices include attending mindfully to the incident in its local setting, developing emergent strategies to address unpredictable situations, and gaining firsthand experience which, over time, expands the organization's repertoire of practices for handling future incidents. An interesting learning practice in this mode is suggested by Allspaw (2012), who advocated that organizations intentionally inject faults into their systems to engage their members in handling incidents, which can enhance their practical capabilities. Box 3 further illustrates this learning mode.…”
Section: Acquiring Experience Through Practical Engagement With Incid... (mentioning)
confidence: 99%
“…They call this practice "chaos engineering". When fault injection is done in production on a special day under full control (as opposed to automatically at any arbitrary point in time), it is called a GameDay exercise [1].…”
Section: Failure Injection In Production (mentioning)
confidence: 99%
“…Handling life cycle dynamics will require an architecture equipped with the capacity to adjust performance over a wide dynamic range (Doyle & Csete, 2011). This is, in part, the target of extensible critical digital services (Allspaw, 2012) and closely related to resilience engineering (Woods, 2015). Closing the gap between the demonstration and the real thing requires the development of new ways to design systems to be manageable and extensible over life cycles.…”
Section: Life Cycle (mentioning)
confidence: 99%