Proceedings of the 38th International Conference on Software Engineering 2016
DOI: 10.1145/2884781.2884848

On the "naturalness" of buggy code

Abstract: Real software, the kind working programmers produce by the kLOC to solve real-world problems, tends to be "natural", like speech or natural language; it tends to be highly repetitive and predictable. Researchers have captured this naturalness of software through statistical models and used them to good effect in suggestion engines, porting tools, coding standards checkers, and idiom miners. This suggests that code that appears improbable, or surprising, to a good statistical language model is "unnatural" in so…
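To make "improbable, or surprising, to a good statistical language model" concrete, here is a minimal sketch of one common way to score it: the mean surprisal (negative log2 probability) per token of a line under an n-gram model. It assumes whitespace-tokenized lines and a simple add-one-smoothed bigram model; this is an illustration, not the model used in the paper, and the function names (`train_bigram`, `line_entropy`, `rank_by_surprise`) and the `"<s>"` start marker are hypothetical.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigrams and their left contexts over tokenized lines.

    Illustrative assumption: each line is prefixed with a start marker "<s>".
    """
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for tokens in corpus:
        seq = ["<s>"] + tokens
        vocab.update(tokens)
        contexts.update(seq[:-1])          # every token that has a successor
        bigrams.update(zip(seq, seq[1:]))  # (previous, current) pairs
    return bigrams, contexts, vocab


def line_entropy(tokens, model):
    """Mean surprisal of a line in bits per token (add-one smoothed bigrams)."""
    if not tokens:
        return 0.0
    bigrams, contexts, vocab = model
    v = len(vocab) + 1                     # +1 reserves mass for unseen tokens
    seq = ["<s>"] + tokens
    bits = 0.0
    for prev, cur in zip(seq, seq[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        bits -= math.log2(p)
    return bits / len(tokens)


def rank_by_surprise(lines, model):
    """Sort lines from most to least surprising (highest entropy first)."""
    return sorted(lines, key=lambda t: line_entropy(t, model), reverse=True)


# Usage: a tiny, highly repetitive "training corpus" of loop headers.
corpus = [
    "for i in range ( n ) :".split(),
    "for j in range ( m ) :".split(),
    "for i in range ( len ( xs ) ) :".split(),
]
model = train_bigram(corpus)
natural = "for i in range ( n ) :".split()
scrambled = "for range i ( in n : )".split()
print(rank_by_surprise([natural, scrambled], model)[0] == scrambled)  # True
```

The scrambled line uses the same tokens as the familiar one but in unseen orders, so almost every bigram falls back to the smoothed minimum probability and its per-token entropy is higher. A practical model would likely use longer contexts and better smoothing than add-one; the bigram choice here only keeps the sketch short.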

Cited by 206 publications (144 citation statements). References: 68 publications.
“…Even though it has been noted that buggy code stands out compared to non-buggy code [Ray et al 2016], little work exists on automatically detecting bugs via machine learning. Murali et al train a recurrent neural network that probabilistically models sequences of API calls and then use it for finding incorrect API usages [Murali et al 2017].…”
Section: Machine Learning and Language Models For Analyzing Bugs
Citation type: mentioning (confidence: 99%)
“…We study simplicity since it is very useful to replace N methods with M ≪ N methods, especially when the results from the many are no better than the few. A bewildering array of new methods for software quality prediction are reported each year (some of which rely on intimidatingly complex mathematical methods) such as deep belief net learning [50], spectral-based clustering [55], and n-gram language models [46]. Ghotra et al list dozens of different data mining algorithms that might be used for defect predictors [16].…”
Section: Background and Related Work, 2.1 Why Study Simplification?
Citation type: mentioning (confidence: 99%)
“…This is done by aggregating those outputs from the descendants, i.e. calling t-lstm() recursively on the children nodes (lines 11–26). This function first obtains the embedding w_t of the input AST node t (using ast2vec as discussed in Section 4.2).…”
Section: Defect Prediction Model
Citation type: mentioning (confidence: 99%)
“…[8]) and the line level (e.g. [24]). Since our approach is able to learn features at the code token level, it may work at those finer level of granularity.…”
Section: Related Work, 7.1 Defect Prediction
Citation type: mentioning (confidence: 99%)