Michele Tufano scite author profile

This paper presents a novel end-to-end approach to program repair based on sequence-to-sequence learning. We devise, implement, and evaluate a technique, called SEQUENCER, for fixing bugs based on sequence-to-sequence learning on source code. This approach uses the copy mechanism to overcome the unlimited vocabulary problem that occurs with big code. Our system is data-driven; we train it on 35,578 samples, carefully curated from commits to open-source repositories. We evaluate SEQUENCER on 4,711 independent real bug fixes, as well on the Defects4J benchmark used in program repair research. SEQUENCER is able to perfectly predict the fixed line for 950/4,711 testing samples, and find correct patches for 14 bugs in Defects4J benchmark. SEQUENCER captures a wide range of repair operators without any domain-specific top-down design.Index Terms-program repair; machine learning. ! Zimin Chen is currently a PhD student at KTH Royal Institute of Technology. He also received the BS and MS degree in computer science from KTH. His research interest lies in the intersection between machine learning and software engineering, especially between automatic program repair and machine learning.Steve Kommrusch is currently a PhD candidate focused on machine learning at Colorado State University. He received his BS in computer engineering from University of Illinois in 1987 and his MS in EECS from MIT in 1989. From 1989 through 2017, he worked in industry at Hewlett-Packard, National Semiconductor, and Advanced Micro Devices. Steve holds over 30 patents in the fields of computer graphics algorithms, silicon simulation and debug techniques, and silicon performance and power management. His research interests include Program Equivalence, Program Repair, and Constructivist AI using machine learning. Electrical and Computer Engineering department. He is working on patternspecific languages and compilers for scientific computing, and has designed numerous approaches using optimizing compilation to effectively map applications to CPUs, GPUs, FPGAs and System-on-Chips. His work spans a variety of domains including compiler optimization design especially in the polyhedral compilation framework, high-level synthesis for FPGAs and SoCs, and distributed computing. Previously

show abstract

An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation

Tufano

Watson

Bavota

et al. 2019

ACM Trans. Softw. Eng. Methodol.

268

320

View full text Add to dashboard Cite

Millions of open-source projects with numerous bug fixes are available in code repositories. This proliferation of software development histories can be leveraged to learn how to fix common programming bugs. To explore such a potential, we perform an empirical study to assess the feasibility of using Neural Machine Translation techniques for learning bug-fixing patches for real defects. First, we mine millions of bug-fixes from the change histories of projects hosted on GitHub, in order to extract meaningful examples of such bug-fixes. Next, we abstract the buggy and corresponding fixed code, and use them to train an Encoder-Decoder model able to translate buggy code into its fixed version. In our empirical investigation we found that such a model is able to fix thousands of unique buggy methods in the wild. Overall, this model is capable of predicting fixed patches generated by developers in 9-50% of the cases, depending on the number of candidate patches we allow it to generate. Also, the model is able to emulate a variety of different Abstract Syntax Tree operations and generate candidate patches in a split second.

show abstract

Deep learning code fragments for code clone detection

et al. 2016

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Michele Tufano

SEQUENCER: Sequence-to-Sequence Learning for End-to-End Program Repair

An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation

Deep learning code fragments for code clone detection

Contact Info

Product

Resources

About