2014
DOI: 10.1145/2666356.2594321

Code completion with statistical language models

Abstract: We address the problem of synthesizing code completions for programs using APIs. Given a program with holes, we synthesize completions for holes with the most likely sequences of method calls. Our main idea is to reduce the problem of code completion to a natural-language processing problem of predicting probabilities of sentences. We design a simple and scalable static analysis that extracts sequences of method calls from a large codebase, and indexes these into a statistical language model. We then employ the language model to find the highest ranked sentences, and use them to synthesize a code completion.
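To make the reduction concrete, here is a minimal sketch of the idea: method-call sequences mined from a codebase are treated as sentences, an n-gram language model is trained over them, and candidate completions for a hole are ranked by their sentence probability. The trigram model, add-one smoothing, and the toy corpus below are illustrative assumptions, not the paper's actual static analysis or language model.

```python
# Toy trigram language model over method-call "sentences".
from collections import defaultdict

class TrigramModel:
    def __init__(self):
        self.trigrams = defaultdict(int)  # counts of (w1, w2, w3)
        self.bigrams = defaultdict(int)   # counts of (w1, w2)
        self.vocab = set()

    def train(self, sequences):
        # Each sequence is a list of API calls extracted from one method body.
        for seq in sequences:
            padded = ["<s>", "<s>"] + seq + ["</s>"]
            for a, b, c in zip(padded, padded[1:], padded[2:]):
                self.trigrams[(a, b, c)] += 1
                self.bigrams[(a, b)] += 1
                self.vocab.add(c)

    def prob(self, a, b, c):
        # Add-one smoothing keeps unseen call trigrams at nonzero probability.
        return (self.trigrams[(a, b, c)] + 1) / (self.bigrams[(a, b)] + len(self.vocab))

    def score(self, seq):
        # Probability of a whole call sequence; higher means more idiomatic.
        padded = ["<s>", "<s>"] + seq + ["</s>"]
        p = 1.0
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            p *= self.prob(a, b, c)
        return p

# Hypothetical corpus: call sequences mined from a codebase by static analysis.
corpus = [
    ["FileReader.<init>", "BufferedReader.<init>",
     "BufferedReader.readLine", "BufferedReader.close"],
    ["FileReader.<init>", "BufferedReader.<init>", "BufferedReader.readLine",
     "BufferedReader.readLine", "BufferedReader.close"],
    ["FileWriter.<init>", "BufferedWriter.<init>",
     "BufferedWriter.write", "BufferedWriter.close"],
]
model = TrigramModel()
model.train(corpus)

# Rank two candidate completions for a hole after BufferedReader.<init>.
cand1 = ["FileReader.<init>", "BufferedReader.<init>",
         "BufferedReader.readLine", "BufferedReader.close"]
cand2 = ["FileReader.<init>", "BufferedReader.<init>",
         "BufferedWriter.write", "BufferedReader.close"]
print(model.score(cand1) > model.score(cand2))  # True: idiomatic sequence wins
```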

Cited by 289 publications (301 citation statements)
References 30 publications
“…White et al. (2015) trained RNNs on source code and showed their practicality in code completion. Similarly, Raychev et al. (2014) used RNNs in code completion to synthesize method call chains in Java code.…”
Section: Prior Work
confidence: 99%
“…The mental model of the programmer may be something like a language model for speech, but applied to code. Language models are typically applied to natural human utterances, but they have also been successfully applied to software (Hindle et al., 2012; Raychev et al., 2014; White et al., 2015), and can be used to discover unexpected segments of tokens in source code (Campbell et al., 2014).…”
Section: Introduction
confidence: 99%
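The "unexpected segments of tokens" use mentioned above (Campbell et al., 2014) follows from the same machinery: score each token by its surprisal under a language model and flag tokens the model finds improbable. A short sketch, reusing the toy TrigramModel from the earlier snippet; the threshold is a made-up value for that toy corpus, not one from the cited work.

```python
import math

def surprisal(model, seq):
    # Per-token surprisal in bits: -log2 P(token | two preceding tokens).
    padded = ["<s>", "<s>"] + seq + ["</s>"]
    return [(c, -math.log2(model.prob(a, b, c)))
            for a, b, c in zip(padded, padded[1:], padded[2:])]

tokens = ["FileReader.<init>", "BufferedReader.<init>", "BufferedWriter.write"]
for tok, bits in surprisal(model, tokens):
    if bits > 3.3:  # hypothetical threshold tuned to the toy corpus above
        print(f"unexpected: {tok} ({bits:.1f} bits)")
```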
“…For this task, we employed long short-term memory (LSTM) recurrent neural networks, as they were successfully used by prior work in predicting tokens from source code (Raychev et al., 2014; White et al., 2015). Unlike the prior work, we trained two models: a forwards model, which given a prefix context returns the distribution of the next token; and a backwards model, which given a suffix context returns the distribution of the previous token.…”
Section: Training the LSTMs
confidence: 99%
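The forwards/backwards pairing described above can be sketched with two ordinary LSTM token models, one fed the prefix as-is and one fed the suffix reversed. The snippet below is an illustrative PyTorch sketch under assumed dimensions and token ids; it is not the cited authors' architecture or training setup, and the models are untrained here.

```python
import torch
import torch.nn as nn

class TokenLSTM(nn.Module):
    """LSTM language model that predicts the token following its input context."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) ids; return logits for the next token.
        h, _ = self.lstm(self.embed(tokens))
        return self.out(h[:, -1, :])

vocab_size = 1000             # assumed toy vocabulary
fwd = TokenLSTM(vocab_size)   # trained on prefixes: predicts the next token
bwd = TokenLSTM(vocab_size)   # trained on reversed suffixes: predicts the previous token

prefix = torch.tensor([[5, 42, 7]])   # token ids before the hole (made up)
suffix = torch.tensor([[9, 13, 2]])   # token ids after the hole (made up)

next_dist = fwd(prefix).softmax(dim=-1)                  # P(next token | prefix)
prev_dist = bwd(suffix.flip(dims=[1])).softmax(dim=-1)   # P(previous token | suffix)

# Multiplying the two distributions scores candidates for the hole from both sides.
combined = next_dist * prev_dist
print(combined.argmax(dim=-1))  # id of the highest-scoring candidate token
```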
“…NLP techniques such as n-gram and recurrent neural network models were used to synthesize sequences of calls to some APIs, together with their arguments [10]. The combination of NLP and statistical reasoning may be used for other tasks, such as the automatic creation of program test cases, default implementations of methods and functions, automatic classification of program behaviors using latent semantic analysis [6], and sophisticated code completion from program structures and source comments.…”
confidence: 99%