Proceedings of the 1st ACM SIGSOFT International Workshop on Representation Learning for Software Engineering and Program Langu 2020
DOI: 10.1145/3416506.3423576

Statistical machine translation outperforms neural machine translation in software engineering: why and how

Abstract: Neural Machine Translation (NMT) is the current trend approach in Natural Language Processing (NLP) to solve the problem of automatically inferring the content of target language given the source language. The ability of NMT is to learn deep knowledge inside languages by deep learning approaches. However, prior works show that NMT has its own drawbacks in NLP and in some research problems of Software Engineering (SE). In this work, we provide a hypothesis that SE corpus has inherent characteristics that NMT wi…

Cited by 9 publications (8 citation statements)
References 37 publications
“…NMT models: Before the era of NMT, Statistical Machine Translation (SMT) [15] was the most popular technique for software engineering (SE) problems, it still outperforms NMT in some SE problems [71]. However, since we are interested in the specific problem of code generation, we focus on NMT that has shown superior performance on public benchmarks [9], and that it is widely recognized as the premier method for the translation of different languages [83].…”
Section: Threats To Validity (mentioning)
confidence: 99%
“…NMT models Before the era of NMT, Statistical Machine Translation (SMT) Costa-Jussá and Farrús (2014) was the most popular technique for software engineering (SE) problems, it still outperforms NMT in some SE problems (Phan and Jannesari 2020). However, since we are interested in the specific problem of code generation, we focus on NMT that has shown superior performance on public benchmarks (Bojar et al 2016), and that it is widely recognized as the premier method for the translation of different languages (Wu et al 2016).…”
Section: Threats To Validity (mentioning)
confidence: 99%
“…Li et al [180] conducted experiments on two datasets to demonstrate the effectiveness of their approach consisting of an attention mechanism and a pointer mixture network on code completion tasks. Phan and Jannesari [245] used three corpus for their experiments-a large-scale corpus of English-German translation in nlp [201], the Conala corpus [356], which contains Python software documentation as 116,000 English sentences, and the msr 2013 corpus [18]. Schuster et al [275] used a public archive of GitHub from 2020 [1].…”
Section: Data Collection (mentioning)
confidence: 99%
“…Gopalakrishnan et al [109] extracted relationships between topical concepts in the source code and the use of specific architectural developer tactics in that code. Phan and Jannesari [245] used machine translation to learn the mapping from prefixes to code tokens for code suggestion. They extracted the tokens from the documentation of the source code.…”
Section: Data Collection (mentioning)
confidence: 99%