Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering 2014
DOI: 10.1145/2635868.2635875

On the localness of software

Abstract: The n-gram language model, which has its roots in statistical natural language processing, has been shown to successfully capture the repetitive and predictable regularities ("naturalness") of source code, and help with tasks such as code suggestion, porting, and designing assistive coding devices. However, we show in this paper that this natural-language-based model fails to exploit a special property of source code: localness. We find that human-written programs are localized: they have useful local regulari…

Cited by 232 publications (203 citation statements) · References 51 publications
“…Following this work, language models have been used to good effect in code suggestion [22,48,53,15], cross-language porting [38,37,39,24], coding standards [2], idiom mining [3], and code deobfuscation [47]. Since language models are useful in these tasks, …” [Footnote: Baishakhi Ray and Vincent Hellendoorn are both first authors and contributed equally to the work.]
Section: Introduction
confidence: 99%
“…Tu et al. [28], however, argue that code tokenization is enough for n-gram language models. They also argue that n-gram models will not be useful when a particular context is not present in the source code corpora used to train the model.…”
Section: Methods Call Recommenders
confidence: 99%
“…Tu et al. [3] sought to confirm that software is localized. Building on the finding that software is natural, they sought to show that there are "local regularities [in software] that can be captured and exploited." They found, empirically, that this is the case.…”
Section: Prior Work
confidence: 99%
“…Since Gamboge uses a simple n-gram model, extending its prediction backend with the cache language model developed by Tu et al. [3] may be beneficial. Since they have already shown that, in single-token contexts, it surpasses a bare n-gram language model in suggestion performance, it may also improve suggestion performance for multi-token prediction.…”
Section: Combination With Other Code Suggestion Engines
confidence: 99%
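The citation statements above revolve around one technique: interpolating a global n-gram model with a local cache of recently seen tokens, so that locally repeated identifiers receive boosted probability even when their context is absent from the training corpus. The sketch below illustrates that idea only; the class name, the interpolation weight `lam`, and the fixed-size cache are illustrative assumptions, not the authors' actual implementation.

```python
from collections import Counter, deque


class CacheNgramModel:
    """Minimal sketch of a cache-augmented n-gram model (illustrative,
    not the implementation from the paper): the final probability is a
    linear interpolation of a global n-gram estimate and a local cache
    estimate over the most recently observed tokens."""

    def __init__(self, n=3, cache_size=100, lam=0.5):
        self.n = n
        self.lam = lam             # interpolation weight for the cache (assumed)
        self.ngrams = Counter()    # counts of (context, token) pairs
        self.contexts = Counter()  # counts of contexts
        self.cache = deque(maxlen=cache_size)  # recent local tokens

    def train(self, tokens):
        """Count n-grams over a training token stream."""
        for i in range(len(tokens) - self.n + 1):
            ctx = tuple(tokens[i:i + self.n - 1])
            tok = tokens[i + self.n - 1]
            self.ngrams[(ctx, tok)] += 1
            self.contexts[ctx] += 1

    def observe(self, tok):
        """Record a locally seen token in the cache."""
        self.cache.append(tok)

    def prob(self, ctx, tok):
        """Interpolated probability: (1 - lam) * global + lam * cache."""
        ctx = tuple(ctx)
        p_global = (self.ngrams[(ctx, tok)] / self.contexts[ctx]
                    if self.contexts[ctx] else 0.0)
        p_cache = (self.cache.count(tok) / len(self.cache)
                   if self.cache else 0.0)
        return (1 - self.lam) * p_global + self.lam * p_cache
```

With an empty cache the model falls back to the (down-weighted) global estimate; once a token recurs locally, its cache component raises the interpolated probability, which is the "local regularity" effect the statements describe.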