2021
DOI: 10.1109/tse.2019.2940179

SEQUENCER: Sequence-to-Sequence Learning for End-to-End Program Repair

Abstract: This paper presents a novel end-to-end approach to program repair based on sequence-to-sequence learning. We devise, implement, and evaluate a technique, called SEQUENCER, for fixing bugs based on sequence-to-sequence learning on source code. This approach uses the copy mechanism to overcome the unlimited vocabulary problem that occurs with big code. Our system is data-driven; we train it on 35,578 samples, carefully curated from commits to open-source repositories. We evaluate SEQUENCER on 4,711 independent r…
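The abstract credits the copy mechanism with overcoming the unlimited vocabulary problem of source code, where identifiers such as local variable names are often absent from any fixed vocabulary. The sketch below illustrates the general idea only, assuming a pointer-generator-style mixture: the decoder's final distribution blends generating a word from a fixed vocabulary (weight `p_gen`) with copying a token from the buggy input, in proportion to attention. It is not SEQUENCER's actual implementation; the function name `copy_distribution`, the scalar `p_gen`, and the toy tokens are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def copy_distribution(
    vocab: List[str],
    vocab_logits: Sequence[float],
    source_tokens: List[str],
    attention_logits: Sequence[float],
    p_gen: float,
) -> Dict[str, float]:
    """Hypothetical pointer-generator-style mix of generating and copying.

    p(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention on source positions holding w)
    A token that appears only in the source (out of vocabulary) still
    receives probability mass through the copy term.
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must be in [0, 1]")
    if len(vocab) != len(vocab_logits):
        raise ValueError("one logit per vocabulary word is required")
    if len(source_tokens) != len(attention_logits):
        raise ValueError("one attention score per source token is required")

    p_vocab = softmax(vocab_logits)
    attention = softmax(attention_logits)

    dist: Dict[str, float] = {}
    # Generation term: probability of emitting each fixed-vocabulary word.
    for word, p in zip(vocab, p_vocab):
        dist[word] = dist.get(word, 0.0) + p_gen * p
    # Copy term: attention mass summed over every source position holding the token.
    for token, a in zip(source_tokens, attention):
        dist[token] = dist.get(token, 0.0) + (1.0 - p_gen) * a
    return dist


if __name__ == "__main__":
    vocab = ["if", "(", ")", "return"]
    buggy_line = ["if", "(", "userCount", ")"]  # "userCount" is out of vocabulary
    dist = copy_distribution(vocab, [0.0, 0.0, 0.0, 1.0],
                             buggy_line, [0.0, 0.0, 3.0, 0.0], p_gen=0.5)
    for token, prob in sorted(dist.items(), key=lambda kv: -kv[1]):
        print(f"{token!r}: {prob:.3f}")
```

With `p_gen = 1` the model can only generate, so an identifier like `userCount` gets zero probability; any `p_gen < 1` lets the decoder reproduce it verbatim from the input, which is why copying sidesteps an unbounded vocabulary.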

Cited by 233 publications (377 citation statements)
References 52 publications (90 reference statements)

“…Our data generation tools along with documentation and detailed instructions for how to use them are available in a public GitHub repository 2 and the dataset is publicly available in Zenodo. 3…”
Section: Methods · mentioning (confidence: 99%)

“…The combined datasets are the CodRep dataset [4] and the Bugs2Fix dataset [26] resulting in 40,289 one-line bugs. These datasets are combined into a single dataset of one line bugs in [3]. Our datasets are of similar size consisting of 25,539 and 153,652 single-statement bugs.…”
Section: Related Work · mentioning (confidence: 99%)

“…Many works have taken advantage of the "naturalness" of software [44] to assist software engineering tasks, including code completion [76], improving code readability [2], program repair [20,78], identifying buggy code [75] and API migration [38], among many others [4]. These approaches analyze large amounts of source code, ranging from hundreds to thousands of software projects, building machine learning models of source code properties, inspired by techniques from natural language processing (NLP).…”
Section: Introduction · mentioning (confidence: 99%)

“…8 We categorize a defect type for the sampled code blocks based on how the defect was fixed in the bug-fixing commits. We use a taxonomy of Chen et al [10] which is summarized in Table 4. To ensure a consistent understanding of the taxonomy, the first four authors of this paper independently categorize defect types for the 30 hit and 30 missed defective blocks.…”
Section: (RQ4) What Kind of Defects Can Be Identified by Our Line-DP? · mentioning (confidence: 99%)