[1989] the Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers
DOI: 10.1109/ftcs.1989.105592

Understanding large system failures - a fault injection experiment

Cited by 112 publications (31 citation statements)
References 6 publications

“…Failure acceleration occurs when the fault → error → failure process is accelerated, by decreasing the fault and error latencies, and increasing the probability that a fault causes a failure [Chillarege and Bowen, 1989]. This makes experiments faster to perform and allows for estimations of the transition probabilities (fault → error and error → failure), which is typically not possible from field data (which focus mostly on failures).…”
Section: Fault Injection (mentioning)
confidence: 99%
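
The transition probabilities mentioned in the statement above (fault → error and error → failure) can be estimated from the outcome counts of an injection campaign. The following is only a minimal sketch, not the method of the cited paper: the `InjectionCounts` type, the `transition_probabilities` function, and the example counts are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class InjectionCounts:
    """Hypothetical outcome counts from a fault injection campaign."""
    injected: int  # number of faults injected
    errors: int    # injections that produced an observed error
    failures: int  # observed errors that went on to cause a failure


def transition_probabilities(c: InjectionCounts) -> tuple[float, float]:
    """Return point estimates (P(fault -> error), P(error -> failure))."""
    if c.injected <= 0:
        raise ValueError("at least one fault must be injected")
    if not (0 <= c.failures <= c.errors <= c.injected):
        raise ValueError("counts must satisfy failures <= errors <= injected")
    p_fault_to_error = c.errors / c.injected
    # With no observed errors the second ratio is undefined; report 0.0 here.
    p_error_to_failure = c.failures / c.errors if c.errors else 0.0
    return p_fault_to_error, p_error_to_failure


# Illustrative numbers only (not taken from any experiment).
counts = InjectionCounts(injected=200, errors=50, failures=10)
p_fe, p_ef = transition_probabilities(counts)
```

These are simple frequency ratios; a real study would also report confidence intervals and account for faults whose errors remain latent at the end of the observation window.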
“…Logs are sent and stored on a different machine (the Host Computer) and a minimal set of configuration information is stored in flash memory. This process is conservative and common for fault injection experiments, e.g., [Chillarege and Bowen, 1989; Gu et al., 2003]. However, the restart before each injection incurs a substantial run-time overhead when errors are masked or overwritten.…”
Section: Injection Setup (mentioning)
confidence: 99%
“…In [Mitr88], this technique was extended to a shared-memory multiprocessing system and used to calculate the risk of encountering multiple latent errors. A failure acceleration method for determining fault detection characteristics is discussed in [Chil89]. Because this study used periodic sampling, the discovery times of only permanent faults could be measured.…”
Section: Latency Studies (mentioning)
confidence: 99%