Lightweight cooperative logging for fault replication in concurrent programs

Machado, Nuno; Romano, Paolo; Rodrigues, L.

doi:10.1109/dsn.2012.6263953

Cited by 13 publications

(12 citation statements)

References 25 publications

“…Failing schedules may manifest rarely and reproducing them is often difficult. Prior work has addressed reproducibility with a number of different strategies, including record and replay (R&R) (both order based (Huang et al [2010], Yang et al [2011], and Jiang et al [2014]) and search based (Machado et al [2012], Zhou et al [2012], and Huang et al [2013])) and deterministic execution (Olszewski et al [2009], Berger et al [2009], and Devietti et al [2009]). These techniques allow the developer to observe a failing execution multiple times, but simply reproducing a failure may provide no insight into its cause.…”

Section: Introduction (mentioning)

confidence: 99%

Concurrency Debugging with Differential Schedule Projections

Machado, Quinta, Lucia, et al. 2016

ACM Trans. Softw. Eng. Methodol.

Self Cite

We present Symbiosis: a concurrency debugging technique based on novel differential schedule projections (DSPs). A DSP shows the small set of memory operations and dataflows responsible for a failure, as well as a reordering of those elements that avoids the failure. To build a DSP, Symbiosis first generates a full, failing, multithreaded schedule via thread path profiling and symbolic constraint solving. Symbiosis selectively reorders events in the failing schedule to produce a nonfailing, alternate schedule. A DSP reports the ordering and dataflow differences between the failing and nonfailing schedules. Our evaluation on buggy real-world software and benchmarks shows that, in practical time, Symbiosis generates DSPs that both isolate the small fraction of event orders and dataflows responsible for the failure and report which event reorderings prevent failing. In our experiments, DSPs contain 90% fewer events and 96% fewer dataflows than the full failure-inducing schedules. We also conducted a user study that shows that, by allowing developers to focus on only a few events, DSPs reduce the amount of time required to understand the bug's root cause and find a valid fix. CCS Concepts: • Software and its engineering → Software testing and debugging;
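
The abstract's idea of reporting "dataflow differences between the failing and nonfailing schedules" can be illustrated with a toy sketch. This is not Symbiosis's implementation: the event tuples, the `dataflows` and `dataflow_diff` helpers, and the example schedules are all hypothetical, and the sketch assumes each event tuple appears at most once per schedule.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical event representation: (thread, op, variable), op is "R" or "W".
Event = Tuple[str, str, str]
# A dataflow pairs a read with the write it observes (None = initial value).
Dataflow = Tuple[Optional[Event], Event]


def dataflows(schedule: List[Event]) -> Set[Dataflow]:
    """Link every read to the most recent earlier write to the same variable."""
    last_write: Dict[str, Event] = {}
    flows: Set[Dataflow] = set()
    for event in schedule:
        _, op, var = event
        if op == "W":
            last_write[var] = event
        else:
            flows.add((last_write.get(var), event))
    return flows


def dataflow_diff(
    failing: List[Event], nonfailing: List[Event]
) -> Tuple[Set[Dataflow], Set[Dataflow]]:
    """Return the dataflows unique to each of the two schedules."""
    f, n = dataflows(failing), dataflows(nonfailing)
    return f - n, n - f


# Assumed toy schedules: the same three events in two different orders.
failing = [("T1", "W", "x"), ("T2", "W", "x"), ("T1", "R", "x")]
nonfailing = [("T1", "W", "x"), ("T1", "R", "x"), ("T2", "W", "x")]

only_failing, only_nonfailing = dataflow_diff(failing, nonfailing)
print("only in failing schedule:", only_failing)
print("only in nonfailing schedule:", only_nonfailing)
```

In this toy case the read by T1 observes T2's write only in the failing order, so the diff contains a single dataflow per schedule rather than the whole trace.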

“…LBR/LCR [2], in turn, uses on low-overhead hardware extensions to maintain a short-term log of hardware events that are useful for production run failure diagnosis. CoopREP [32] records partial logs from multiple user instances running a multithreaded program and combines that information to deterministically replay a concurrency error. Aviso [29] uses statistical analysis of production-run event traces, but with the orthogonal goal of avoiding failures, rather than exposing them.…”

Section: Related Work (mentioning)

confidence: 99%

Production-guided concurrency debugging

Machado, Lucia, Rodrigues 2016

Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Self Cite

Concurrency bugs that stem from schedule-dependent branches are hard to understand and debug, because their root causes imply not only different event orderings, but also changes in the control-flow between failing and non-failing executions. We present Cortex: a system that helps expose and understand concurrency bugs that result from schedule-dependent branches, without relying on information from failing executions. Cortex preemptively exposes failing executions by perturbing the order of events and control-flow behavior in non-failing schedules from production runs of a program. By leveraging this information from production runs, Cortex synthesizes executions to guide the search for failing schedules. Production-guided search helps cope with the large execution search space by targeting failing executions that are similar to observed non-failing executions. Evaluation on popular benchmarks shows that Cortex is able to expose failing schedules with only a few perturbations to non-failing executions, and takes a practical amount of time.
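
A rough sketch of the event-order half of this idea, perturbing a non-failing schedule and checking whether a nearby ordering fails, might look as follows. It is an illustration under stated assumptions, not Cortex's algorithm: the `(thread, label)` events, the `adjacent_swaps` generator, and the `fails` predicate are hypothetical, and the control-flow perturbation described in the abstract is not modeled.

```python
from typing import Iterator, List, Tuple

# Hypothetical event representation: (thread, label).
Event = Tuple[str, str]


def adjacent_swaps(schedule: List[Event]) -> Iterator[List[Event]]:
    """Yield schedules that differ by one swap of adjacent events.

    Only events from different threads are swapped, so each thread's
    own program order is preserved.
    """
    for i in range(len(schedule) - 1):
        if schedule[i][0] != schedule[i + 1][0]:
            perturbed = list(schedule)
            perturbed[i], perturbed[i + 1] = perturbed[i + 1], perturbed[i]
            yield perturbed


def fails(schedule: List[Event]) -> bool:
    """Assumed toy bug: T2 nulls a pointer between T1's check and T1's use."""
    check = schedule.index(("T1", "check"))
    null = schedule.index(("T2", "null"))
    use = schedule.index(("T1", "use"))
    return check < null < use


# Assumed non-failing schedule, as might be observed in a production run.
nonfailing = [("T1", "check"), ("T1", "use"), ("T2", "null")]

exposed = [s for s in adjacent_swaps(nonfailing) if fails(s)]
print(exposed)
```

Here a single swap is enough to expose the failing order, which mirrors the abstract's claim that failing schedules can lie only a few perturbations away from observed non-failing ones.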

“…Note that, in order to ensure deterministic error replay, one should log all sources of non-determinism of the program, and not solely user input. On the other hand, dealing with other sources of non-determinism is out of the scope of the REAP system for the following two main reasons: i) different types of non-deterministic sources could be tackled using dedicated solutions aimed at supporting deterministic replay [23,24]; ii) from the privacy perspective, which represents the focus of our work, user inputs are arguably the most critical sources of non-determinism. Our prototype of REAP supports multi-threaded programs (using the Java Pathfinder extension jpf-concurrent [25]) but, at this time, does not handle the reproduction of concurrency bugs.…”

Section: Prototype Implementation (mentioning)

confidence: 99%

REAP: Reporting Errors Using Alternative Paths

Matos, Garcia, Romano 2014

Programming Languages and Systems

Self Cite

Software testing is often unable to detect all program flaws. These bugs are most commonly reported to programmers in error reports containing core dumps and/or execution traces that frequently reveal users' private information without providing all necessary information for effective debugging. Hence, these mechanisms are sparsely used due to users' data privacy concerns. This paper presents REAP, a new fault replication method, which allows for enhancing privacy protection while still providing software developers with the "steps-to-reproduce" errors. REAP uses symbolic execution and randomized search heuristics to identify alternative execution paths leading to an observed error. We evaluated REAP using a testbed including real bugs of popular, large-scale applications. The results show the high effectiveness of REAP in anonymizing user input: on average, REAP reveals only 16.78% of the bits in the original input, achieving an average residue (the number of common characters in the original and anonymized input) of 15.07%. Our evaluation also highlights that REAP significantly outperforms state-of-the-art techniques in terms of achieved privacy and/or scalability.
