Haoye Tian scite author profile

How do we know a generated patch is correct? This is a key challenge that automated program repair (APR) systems struggle to address, given the incompleteness of available test suites. Our intuition is that we can triage correct patches by checking whether each generated patch implements code changes (i.e., behaviour) that are relevant to the bug it addresses. Such a bug is commonly specified by a failing test case. Towards predicting patch correctness in APR, we propose a novel yet simple hypothesis on how the link between the patch behaviour and failing test specifications can be drawn: similar failing test cases should require similar patches. We then propose BATS, an unsupervised learning-based approach to predict patch correctness by checking patch Behaviour Against failing Test Specification. BATS exploits deep representation learning models for code and patches: for a given failing test case, the yielded embedding is used to compute similarity metrics in the search for historical similar test cases to identify the associated applied patches, which are then used as a proxy for assessing the correctness of the APR-generated patches. Experimentally, we first validate our hypothesis by assessing whether ground-truth developer patches cluster together in the same way that their associated failing test cases are clustered. Then, after collecting a large dataset of 1,278 plausible patches (written by developers or generated by 32 APR tools), we use BATS to predict correct patches: BATS achieves AUC between 0.557 and 0.718 and recall between 0.562 and 0.854 in identifying correct patches. Our approach outperforms state-of-the-art techniques for identifying correct patches without the need for large labeled patch datasets, which machine learning-based approaches require. While BATS is constrained by the availability of similar test cases, we show that it can still be complementary to existing approaches: when combined with a recent approach that relies on supervised learning, BATS improves the overall recall in detecting correct patches. We finally show that BATS is complementary to the state-of-the-art PATCH-SIM dynamic approach for identifying correct patches generated by APR tools.
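The idea in this abstract (similar failing tests should require similar patches) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two cutoff values, the use of cosine similarity with a mean over neighbours, and the toy embedding vectors are all assumptions made for illustration. The abstract does not specify which similarity metric or thresholds BATS uses.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def predict_correct(test_vec, patch_vec, history, test_cutoff=0.8, patch_cutoff=0.5):
    """Hypothetical helper, not part of BATS itself.

    history: list of (failing_test_embedding, applied_patch_embedding) pairs
             taken from past fixes.
    Returns True or False, or None when no historical test is similar enough
    (the cutoffs are illustrative assumptions).
    """
    # Step 1: find historical failing tests similar to the current one.
    neighbours = [p for t, p in history if cosine(test_vec, t) >= test_cutoff]
    if not neighbours:
        return None  # no similar test case available, so no prediction
    # Step 2: compare the candidate patch with the patches that fixed those tests.
    mean_sim = sum(cosine(patch_vec, p) for p in neighbours) / len(neighbours)
    return mean_sim >= patch_cutoff

# Toy two-dimensional embeddings; real embeddings would come from a code representation model.
history = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
verdict = predict_correct([1.0, 0.1], [0.9, 0.1], history)
```

Returning `None` when no neighbour passes the test cutoff mirrors the limitation stated in the abstract: a prediction depends on similar historical test cases being available.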


A Music Recommendation System Based on logistic regression and eXtreme Gradient Boosting

Tian, Cai, Wen et al. 2019


Is this Change the Answer to that Problem?

Tian, Tang, Habib et al. 2022


Where were the repair ingredients for Defects4j bugs?

et al. 2021


A significant body of automated program repair research has built approaches under the redundancy assumption. Patches are then heuristically generated by leveraging repair ingredients (change actions and donor code) that are found in code bases (either the buggy program itself or big code). For example, common change actions (i.e., fix patterns) are frequently mined offline and serve as an important ingredient for many patch generation engines. Although the repetitiveness of code changes has been studied in general, the literature provides little insight into the relationship between the performance of the repair system and the source code base where the change actions were mined. Similarly, donor code is another important repair ingredient, used to concretize patches guided by abstract patterns. Yet little attention has been paid to where such ingredients can actually be found. Through a large-scale empirical study on the execution results of 24 repair systems evaluated on real-world bugs from Defects4J, we provide a comprehensive view of the distribution of repair ingredients that are relevant for these bugs. In particular, we show that (1) half of the bugs cannot be fixed simply because the relevant repair ingredient is not available in the search space of donor code; (2) bugs that are correctly fixed by literature tools are mostly addressed with shallow change actions; (3) programs with little history of changes can benefit from mining change actions in other programs; (4) parts of donor code to repair a given bug can be found separately at different search locations; (5) bug-triggering test cases are a rich source for donor code search.





Haoye Tian

Evaluating representation learning of code changes for predicting patch correctness in program repair

Predicting Patch Correctness Based on the Similarity of Failing Test Cases
