Background: Over the years, Automated Program Repair (APR) has attracted much attention from both academia and industry, since it can reduce the cost of fixing bugs. However, how to assess patch correctness remains an open challenge. The two widely adopted approaches to this challenge, manual checking and validation with automatically generated tests, are both biased: the former suffers from subjectivity and the latter from low precision. Aim: To address this concern, we conduct an empirical study to understand the correct patches generated by existing state-of-the-art APR techniques, aiming to provide guidelines for the future assessment of patches. Method: To this end, we first present a Literature Review (LR) of the correct patches reported by recent techniques on the Defects4J benchmark and collect 177 correct patches after a sanity-check process. We investigate how these machine-generated correct patches achieve semantic equivalence to, despite syntactic differences from, developer-provided ones; how these patches are distributed across projects and APR techniques; and how the characteristics of a bug affect the patches generated for it. Results: Our main findings include: 1) we do not need to fix bugs exactly as developers do, since we observe that 25.4% (45/177) of the correct patches generated by APR techniques are syntactically different from developer-provided ones; 2) the distribution of machine-generated correct patches diverges across Defects4J projects and APR techniques; and 3) APR techniques tend to generate patches that differ from developers' patches for bugs with large patch sizes. Conclusion: Our study not only verifies conclusions from previous studies but also highlights implications for future work on assessing patch correctness.
Keywords—Automated Program Repair; Defects4J; patch correctness assessment.

RQ1: How do machine-generated correct patches differ from developer-provided ones?
RQ2: How are the different types of patches distributed?
RQ3: Do APR tools tend to generate correct patches that differ from developer-provided ones for bugs with certain characteristics?

A patch is generated by applying certain code modifications at the buggy location identified by fault localization techniques (denoted as the edit point in this study). Based on this, the differences between patches can be distinguished in terms of two aspects: edit points and code modifications. To answer RQ1, we compare the collected patches with developer-provided ones and classify them into four types based on these two aspects (see the sketch below). We further investigate how the patches that are syntactically different from developer-provided ones achieve semantic equivalence. In RQ2, we investigate the distribution of patches from two aspects (i.e., different Defects4J projects and APR techniques) and observe that fault localization is critical for generating correct patches for bugs in three projects of Defects4J. In RQ3, we investigate whether APR tools tend to generate correct patches that differ from developer-provided ones for bugs with certain characteristics, such as patch size.
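To make the RQ1 classification concrete, the following is a minimal sketch in Python, assuming the four types arise as the cross-product of whether the edit points match and whether the code modifications match. The Patch representation, its fields, and the classify function are hypothetical illustrations, not artifacts of the study.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class Patch:
    # Hypothetical simplified representation: the set of edit points
    # (file, line) a patch touches, plus a normalized form of its
    # code modifications (e.g., a canonicalized diff).
    edit_points: FrozenSet[Tuple[str, int]]
    modification: str

def classify(apr: Patch, dev: Patch) -> str:
    """Classify an APR-generated patch against the developer-provided one,
    assuming the four types are the cross-product of edit-point match
    and code-modification match."""
    same_points = apr.edit_points == dev.edit_points
    same_mods = apr.modification == dev.modification
    if same_points and same_mods:
        return "same edit points, same modifications (identical)"
    if same_points:
        return "same edit points, different modifications"
    if same_mods:
        return "different edit points, same modifications"
    return "different edit points, different modifications"

# Example: an APR patch editing the same line with a different expression.
dev = Patch(frozenset({("Math.java", 42)}), "return Math.abs(x);")
apr = Patch(frozenset({("Math.java", 42)}), "return x < 0 ? -x : x;")
print(classify(apr, dev))  # -> "same edit points, different modifications"
```

Patches falling into any type other than the first are syntactically different from the developer-provided ones; it is for these patches that we examine how semantic equivalence is achieved.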