Detecting argument selection defects

Rice, Andrew; Aftandilian, Edward; Jaspan, Ciera; Johnston, Emily; Pradel, Michael; Arroyo-Paredes, Yulissa

doi:10.1145/3133928

Cited by 43 publications (49 citation statements)

References 36 publications (28 reference statements)

“…In contrast, seq and sequoia are semantically dissimilar because they refer to different concepts, even though they share a common prefix of characters. As illustrated by these examples, semantic similarity does not always correspond to lexical similarity, as considered by prior work [Pradel and Gross 2011; Rice et al 2017], and may even exist across type boundaries. To enable a machine learning-based bug detector to reason about identifiers, we require a representation of identifiers that preserves semantic similarities.…”

Section: Embeddings For Identifiers and Literals (mentioning)

confidence: 87%

“…It is important to note that bug detectors built with DeepBugs do not require any heuristics or manually designed filters of warnings, as commonly used in existing name-based bug detectors [Pradel and Gross 2011; Rice et al 2017]. For example, the state-of-the-art bug detector to detect accidentally swapped function arguments relies on a hard-coded list of function names for which swapping the arguments is expected, such as flip, transpose, or reverse [Rice et al 2017]. Instead of hard-coding such heuristics, which is time-consuming and likely incomplete, learned name-based bug detectors infer these kinds of exceptions from the training data.…”

Section: Training and Querying A Bug Detector (mentioning)

confidence: 99%

“…The bug detectors address a diverse set of programming mistakes: accidentally swapped function arguments, incorrect binary operators, and incorrect operands in binary expressions. While the first bug pattern has been the target of previous work for statically typed languages [Pradel and Gross 2011; Rice et al 2017], we are not aware of a name-based bug detector for the other two bug patterns. Implementing new bug detectors is straightforward, and we envision future work to create more instances of our framework, e.g., based on bug patterns mined from version histories [Brown et al 2017; Hanam et al 2016].…”

Section: Name-based Bug Detectors (mentioning)

confidence: 99%

“…Swapping the argument leads to an incorrect error message when the test fails, which makes debugging unnecessarily hard. Google developers consider this kind of mistake a bug [Rice et al 2017]. assertEquals ( tree .…”

Section: Examples Of Bugs (mentioning)

confidence: 99%

“…To make decisions about programs, e.g., to report a piece of code as likely incorrect, existing name-based analyses rely on manually designed algorithms that use hard-coded patterns and carefully tuned heuristics. For example, a name-based analysis that has been recently deployed at Google [Rice et al 2017] comes with various heuristics to increase the number of detected bugs and to decrease the number of false positives. Designing and fine-tuning such heuristics imposes a significant human effort that is difficult to reuse across different analyses and different classes of bugs.…”

Section: Introduction (mentioning)

confidence: 99%

DeepBugs: a learning approach to name-based bug detection

Pradel and Sen, 2018, Proc. ACM Program. Lang. (self-citation)

Natural language elements in source code, e.g., the names of variables and functions, convey useful information. However, most existing bug detection tools ignore this information and therefore miss some classes of bugs. The few existing name-based bug detection approaches reason about names on a syntactic level and rely on manually designed and tuned algorithms to detect bugs. This paper presents DeepBugs, a learning approach to name-based bug detection, which reasons about names based on a semantic representation and which automatically learns bug detectors instead of manually writing them. We formulate bug detection as a binary classification problem and train a classifier that distinguishes correct from incorrect code. To address the challenge that effectively learning a bug detector requires examples of both correct and incorrect code, we create likely incorrect code examples from an existing corpus of code through simple code transformations. A novel insight learned from our work is that learning from artificially seeded bugs yields bug detectors that are effective at finding bugs in real-world code. We implement our idea into a framework for learning-based and name-based bug detection. Three bug detectors built on top of the framework detect accidentally swapped function arguments, incorrect binary operators, and incorrect operands in binary operations. Applying the approach to a corpus of 150,000 JavaScript files yields bug detectors that have a high accuracy (between 89% and 95%), are very efficient (less than 20 milliseconds per analyzed file), and reveal 102 programming mistakes (with 68% true positive rate) in real-world code.
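The abstract's idea of creating likely incorrect examples through simple code transformations can be illustrated for the swapped-argument case. The following is a minimal sketch, not the DeepBugs implementation: the `Call` record, the function `seed_swapped_argument_examples`, and the restriction to two-argument calls are assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Call:
    """Hypothetical, simplified view of a call site: callee name plus argument names."""
    callee: str
    args: Tuple[str, ...]


def seed_swapped_argument_examples(calls: List[Call]) -> List[Tuple[Call, int]]:
    """Build labeled training pairs: label 0 = code as written, label 1 = seeded bug.

    Only two-argument calls with distinct arguments are used, because swapping
    identical arguments would not change the code.
    """
    examples: List[Tuple[Call, int]] = []
    for call in calls:
        if len(call.args) != 2 or call.args[0] == call.args[1]:
            continue
        swapped = Call(call.callee, (call.args[1], call.args[0]))
        examples.append((call, 0))     # assumed correct: taken from the corpus
        examples.append((swapped, 1))  # likely incorrect: arguments swapped
    return examples
```

Pairs produced this way could then be fed to any binary classifier. How the names are turned into a semantic representation is a separate step that this sketch does not cover.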
