2015 IEEE/ACM 12th Working Conference on Mining Software Repositories (MSR 2015)
DOI: 10.1109/msr.2015.38

Toward Deep Learning Software Repositories

Abstract: Deep learning subsumes algorithms that automatically learn compositional representations. The ability of these models to generalize well has ushered in tremendous advances in many fields such as natural language processing (NLP). Recent research in the software engineering (SE) community has demonstrated the usefulness of applying NLP techniques to software corpora. Hence, we motivate deep learning for software language modeling, highlighting fundamental differences between state-of-the-practice software langu…

Cited by 214 publications (176 citation statements)
References 49 publications

“…White et al. (White et al., 2015) trained RNNs on source code and showed their practicality in code completion. Similarly, Raychev et al. (Raychev et al., 2014) used RNNs in code completion to synthesize method call chains in Java code.…”
Section: Prior Work (mentioning)
confidence: 99%
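
As a rough illustration of the setup this quote describes, the sketch below shows the shape of an RNN language model over source-code tokens and how it could be used for greedy code completion. It is a minimal PyTorch sketch under my own assumptions (the class and function names and the placeholder tokenized vocabulary are hypothetical), not the cited authors' implementation.

# Minimal sketch (PyTorch, not the cited authors' code): an RNN language
# model over source-code tokens, used greedily for code completion.
import torch
import torch.nn as nn

class CodeTokenRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, state=None):
        x = self.embed(token_ids)          # (batch, seq, embed_dim)
        h, state = self.rnn(x, state)      # (batch, seq, hidden_dim)
        return self.out(h), state          # logits over the next token

def complete(model, prefix_ids, max_new_tokens=5):
    """Greedily extend a token prefix, one most-likely token at a time."""
    ids = list(prefix_ids)
    inp, state = torch.tensor([ids]), None
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits, state = model(inp, state)
        next_id = int(logits[0, -1].argmax())
        ids.append(next_id)
        inp = torch.tensor([[next_id]])    # feed only the new token back in
    return ids
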
“…For this task, we employed long short-term memory (LSTM) recurrent neural networks, as they were successfully used by prior work in predicting tokens from source code (Raychev et al., 2014; White et al., 2015). Unlike the prior work, we have trained two models: the forwards model, given a prefix context and returning the distribution of the next token; and the backwards model, given a suffix context and returning the distribution of the previous token.…”
Section: Training the LSTMs (mentioning)
confidence: 99%
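
The forwards/backwards arrangement described in this quote can be sketched as two copies of one LSTM token model, one trained on token sequences as-is and one on reversed sequences. The PyTorch sketch below is an assumption-laden illustration (names such as LSTMTokenModel, next_token_dist, and prev_token_dist are mine), not the citing authors' code.

# Hedged sketch of the two-model setup quoted above (PyTorch).
import torch
import torch.nn as nn

class LSTMTokenModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h[:, -1])          # logits for the adjacent token

def next_token_dist(forwards_model, prefix_ids):
    """P(next token | prefix context) from the forwards model."""
    with torch.no_grad():
        logits = forwards_model(torch.tensor([prefix_ids]))
    return torch.softmax(logits, dim=-1)

def prev_token_dist(backwards_model, suffix_ids):
    """P(previous token | suffix context); the backwards model is assumed to
    be trained on reversed sequences, so the suffix is reversed before scoring."""
    with torch.no_grad():
        logits = backwards_model(torch.tensor([suffix_ids[::-1]]))
    return torch.softmax(logits, dim=-1)
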
“…Based on empirical work done by White et al. (White et al., 2015), we chose a context length τ of 20 tokens. This corresponds to an n-gram length of 21 tokens, as an n-gram traditionally includes both the context and the adjacent token.…”
Section: Training the LSTMs (mentioning)
confidence: 99%
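
To make the counting convention concrete: with a context length τ = 20, each training window holds the 20 preceding tokens plus the adjacent target token, i.e. 21 tokens in total. The plain-Python sketch below shows one assumed way to slice a token stream into such windows; it is illustrative, not the citing authors' preprocessing.

# Windowing convention: a context of tau = 20 tokens plus the adjacent
# target token yields a 21-token window, i.e. n = tau + 1 = 21.
TAU = 20  # context length in tokens

def context_windows(token_ids, tau=TAU):
    """Yield (context, target) training pairs from a token stream."""
    for i in range(tau, len(token_ids)):
        context = token_ids[i - tau:i]   # the 20 preceding tokens
        target = token_ids[i]            # the adjacent token to predict
        yield context, target            # total window length = tau + 1 = 21
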