IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012), 2012
DOI: 10.1109/dsn.2012.6263953

Lightweight cooperative logging for fault replication in concurrent programs

Abstract: This paper presents CoopREP, a system that supports fault replication in concurrent programs based on cooperative recording and partial log combination. CoopREP employs partial recording to reduce the amount of information that a given program instance must store to support deterministic replay. This substantially reduces the overhead imposed by code instrumentation, but raises the problem of finding the combination of logs capable of replaying the f…
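The abstract's two ingredients, partial recording at each instance and combination of partial logs, can be illustrated with a small sketch. It is a minimal model under stated assumptions, not CoopREP's implementation: a log is assumed to map each shared variable to the order (a list of thread ids) in which threads accessed it, each instance is assumed to record only a random subset of variables, and the greedy covering strategy in the hypothetical `combine` function is a stand-in for whatever heuristic the paper uses to choose log combinations. Merging logs from different instances does not guarantee that the result replays the failure; that is precisely the problem the abstract raises.

```python
import random
from typing import Dict, List, Optional, Set

# Hypothetical log model: shared variable -> order of thread ids that accessed it.
AccessLog = Dict[str, List[int]]


def partial_record(full_trace: AccessLog, fraction: float, seed: int) -> AccessLog:
    """Keep the access order of only a random subset of shared variables.

    This mimics one program instance that instruments only part of the code.
    """
    rng = random.Random(seed)
    variables = sorted(full_trace)
    k = max(1, round(fraction * len(variables)))
    chosen = rng.sample(variables, k)
    return {var: list(full_trace[var]) for var in chosen}


def combine(partial_logs: List[AccessLog], variables: Set[str]) -> Optional[AccessLog]:
    """Greedily merge partial logs until every shared variable is covered.

    Returns None if the logs, taken together, do not cover all variables.
    """
    combined: AccessLog = {}
    remaining = set(variables)
    # Visit logs covering more variables first, to mix fewer instances.
    for log in sorted(partial_logs, key=len, reverse=True):
        if not remaining:
            break
        for var in log.keys() & remaining:
            combined[var] = log[var]
        remaining.difference_update(log)
    return combined if not remaining else None
```

With three shared variables and `fraction=0.5`, each instance keeps the access order of two of them, so logs from at least two instances are needed before `combine` can cover all three.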

Cited by 13 publications (12 citation statements). References 25 publications.
“…Failing schedules may manifest rarely and reproducing them is often difficult. Prior work has addressed reproducibility with a number of different strategies, including record and replay (R&R) (both order based (Huang et al [2010], Yang et al [2011], and Jiang et al [2014]) and search based (Machado et al [2012], Zhou et al [2012], and Huang et al [2013])) and deterministic execution (Olszewski et al [2009], Berger et al [2009], and Devietti et al [2009]). These techniques allow the developer to observe a failing execution multiple times, but simply reproducing a failure may provide no insight into its cause.…”
Section: Introduction (mentioning, confidence: 99%)
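Order-based record and replay, mentioned in the statement above, can be sketched as follows. This is a simplified illustration, assuming that all shared accesses go through a single critical section: in record mode the hypothetical `OrderRecorder` logs which thread performed each access, and in replay mode the hypothetical `OrderReplayer` makes each thread wait on a condition variable until the log says it is its turn. Real systems log far finer-grained events; the sketch only shows the principle.

```python
import threading
from typing import Callable, List, Union


class OrderRecorder:
    """Record mode: log the order in which threads perform shared accesses."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.order: List[str] = []

    def access(self, thread_name: str, action: Callable[[], None]) -> None:
        with self._lock:
            # Logging and the access happen under one lock, so the log
            # matches the order in which the accesses actually ran.
            self.order.append(thread_name)
            action()


class OrderReplayer:
    """Replay mode: let threads perform shared accesses only in the logged order."""

    def __init__(self, order: List[str]) -> None:
        self._order = order
        self._next = 0
        self._cond = threading.Condition()

    def access(self, thread_name: str, action: Callable[[], None]) -> None:
        with self._cond:
            # Block until the next logged entry names this thread.
            self._cond.wait_for(
                lambda: self._next < len(self._order)
                and self._order[self._next] == thread_name
            )
            action()
            self._next += 1
            self._cond.notify_all()


def run(controller: Union[OrderRecorder, OrderReplayer], accesses: int = 3) -> List[str]:
    """Hypothetical program: two threads each append their name `accesses` times."""
    shared: List[str] = []

    def worker(name: str) -> None:
        for _ in range(accesses):
            controller.access(name, lambda: shared.append(name))

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("T1", "T2")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```

Replaying `recorder.order` reproduces the recorded interleaving, and a hand-written order such as `["T2", "T1", "T1", "T2", "T2", "T1"]` forces that interleaving instead. A log that does not match the program's accesses would leave threads waiting forever, which a real system would need to detect.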
“…LBR/LCR [2], in turn, relies on low-overhead hardware extensions to maintain a short-term log of hardware events that are useful for production run failure diagnosis. CoopREP [32] records partial logs from multiple user instances running a multithreaded program and combines that information to deterministically replay a concurrency error. Aviso [29] uses statistical analysis of production-run event traces, but with the orthogonal goal of avoiding failures, rather than exposing them.…”
Section: Related Work (mentioning, confidence: 99%)
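The idea of a short-term log, keeping only the most recent events so that its cost stays bounded, can be mimicked in software with a fixed-capacity ring buffer. This is a hypothetical software analogue only: LBR/LCR keep such a log in hardware, and the `ShortTermLog` class and the event kinds and addresses below are illustrative assumptions.

```python
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical event: (kind, address), e.g. ("branch", 0x4005D0).
Event = Tuple[str, int]


class ShortTermLog:
    """Keep only the most recent `capacity` events; older ones are discarded."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen drops its oldest entry when a new one is appended.
        self._events: Deque[Event] = deque(maxlen=capacity)

    def record(self, kind: str, address: int) -> None:
        self._events.append((kind, address))

    def snapshot(self) -> List[Event]:
        """Return the retained events, oldest first, e.g. when a failure occurs."""
        return list(self._events)
```

Recording ten events into a log of capacity 4 leaves only the last four, so memory use stays constant no matter how long the program runs.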
“…Note that, in order to ensure deterministic error replay, one should log all sources of non-determinism of the program, and not solely user input. On the other hand, dealing with other sources of non-determinism is out of the scope of the REAP system for the following two main reasons: i) different types of non-deterministic sources could be tackled using dedicated solutions aimed at supporting deterministic replay [23,24]; ii) from the privacy perspective, which represents the focus of our work, user inputs are arguably the most critical sources of non-determinism. Our prototype of REAP supports multi-threaded programs (using the Java Pathfinder extension jpf-concurrent [25]) but, at this time, does not handle the reproduction of concurrency bugs.…”
Section: Prototype Implementation (mentioning, confidence: 99%)
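Logging user input so that a failing run can be re-executed, as the statement describes for this one source of non-determinism, can be sketched with a recorder that wraps the real input source and a replayer that serves the logged values back. The `InputRecorder`, `InputReplayer`, and `program` names are hypothetical, and the sketch omits the privacy-related processing that is the focus of REAP.

```python
from typing import Callable, Iterator, List, Union


class InputRecorder:
    """Wraps the real input source and logs every value the program reads."""

    def __init__(self, source: Callable[[], str]) -> None:
        self._source = source
        self.log: List[str] = []

    def read(self) -> str:
        value = self._source()
        self.log.append(value)
        return value


class InputReplayer:
    """Feeds previously logged values back to the program in the same order."""

    def __init__(self, log: List[str]) -> None:
        self._values: Iterator[str] = iter(log)

    def read(self) -> str:
        return next(self._values)


def program(inp: Union[InputRecorder, InputReplayer]) -> int:
    """Hypothetical program under test: fails when the second input is "0"."""
    a = int(inp.read())
    b = int(inp.read())
    return a // b
```

Recording a failing run with inputs `"7"` and `"0"` and then passing the logged values to `InputReplayer` raises the same `ZeroDivisionError`. This works only because the program's sole source of non-determinism is its input; a multithreaded program would also need its thread interleavings logged.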