Background: Over the years, Automated Program Repair (APR) has attracted much attention from both academia and industry, since it can reduce the cost of fixing bugs. However, how to assess patch correctness remains an open challenge. The two widely adopted approaches to this challenge, manual checking and validation with automatically generated tests, are both biased: the former suffers from subjectivity and the latter from low precision. Aim: To address this concern, we conduct an empirical study to understand the correct patches generated by existing state-of-the-art APR techniques, aiming to provide guidelines for the future assessment of patches. Method: To this end, we first present a Literature Review (LR) of the correct patches reported by recent techniques on the Defects4J benchmark and collect 177 correct patches after a sanity-check process. We investigate how these machine-generated correct patches achieve semantic equivalence to, despite syntactic differences from, developer-provided ones; how these patches are distributed across projects and APR techniques; and how the characteristics of a bug affect the patches generated for it. Results: Our main findings include: 1) we do not need to fix bugs exactly as developers do, since we observe that 25.4% (45/177) of the correct patches generated by APR techniques are syntactically different from developer-provided ones; 2) the distribution of machine-generated correct patches diverges across Defects4J projects and APR techniques; and 3) APR techniques tend to generate patches that differ from developers' patches for bugs with large patch sizes. Conclusion: Our study not only verifies conclusions from previous studies but also highlights implications for future work on assessing patch correctness.
Keywords—Automated Program Repair; Defects4J; patch correctness assessment.

RQ1: How do machine-generated correct patches differ from developer-provided ones?
RQ2: How are the different types of patches distributed?
RQ3: Do APR tools tend to generate correct patches that differ from developer-provided ones for bugs with certain characteristics?

A patch is generated by applying certain code modifications at the buggy location identified by fault localization techniques (denoted as the edit point in this study). Based on this, the differences between patches can be distinguished in terms of two aspects: edit points and code modifications. To answer RQ1, we compare the collected patches with developer-provided ones and classify them into four types based on these two aspects (see the sketch below). We further investigate how the patches that are syntactically different from developer-provided ones achieve semantic equivalence. In RQ2, we investigate the distribution of patches from two aspects (i.e., different Defects4J projects and APR techniques) and observe that fault localization is critical for generating correct patches for bugs in three projects of Defects4J. In RQ3, we investigate whether APR tools tend to generate correct patches that differ from developer-provided ones for bugs with certain characteristics, such as patch size.
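To make the RQ1 classification concrete, the following is a minimal sketch in Python, assuming the four types arise as the cross-product of whether the edit points match and whether the code modifications match. The Patch representation, its fields, and the classify function are hypothetical illustrations, not artifacts of the study.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class Patch:
    # Hypothetical simplified representation: the set of edit points
    # (file, line) a patch touches, plus a normalized form of its
    # code modifications (e.g., a canonicalized diff).
    edit_points: FrozenSet[Tuple[str, int]]
    modification: str

def classify(apr: Patch, dev: Patch) -> str:
    """Classify an APR-generated patch against the developer-provided one,
    assuming the four types are the cross-product of edit-point match
    and code-modification match."""
    same_points = apr.edit_points == dev.edit_points
    same_mods = apr.modification == dev.modification
    if same_points and same_mods:
        return "same edit points, same modifications (identical)"
    if same_points:
        return "same edit points, different modifications"
    if same_mods:
        return "different edit points, same modifications"
    return "different edit points, different modifications"

# Example: an APR patch editing the same line with a different expression.
dev = Patch(frozenset({("Math.java", 42)}), "return Math.abs(x);")
apr = Patch(frozenset({("Math.java", 42)}), "return x < 0 ? -x : x;")
print(classify(apr, dev))  # -> "same edit points, different modifications"
```

Patches falling into any type other than the first are syntactically different from the developer-provided ones; it is for these patches that we examine how semantic equivalence is achieved.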