Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1151

Massive Exploration of Neural Machine Translation Architectures

Abstract: Neural Machine Translation (NMT) has shown remarkable progress over the past few years, with production systems now being deployed to end-users. As the field is moving rapidly, it has become unclear which elements of NMT architectures have a significant impact on translation quality. In this work, we present a large-scale analysis of the sensitivity of NMT architectures to common hyperparameters. We report empirical results and variance numbers for several hundred experimental runs, corresponding to over 250,000 GPU hours on the standard WMT English to German translation task.

Cited by 375 publications (268 citation statements)
References 28 publications
“…While (Britz et al., 2017) use word-based sequences in their large-scale NMT experiments, in our case we use character-based ones. This simply involves changing the "delimiter" option in the configuration files.…”
Section: Model (mentioning)
confidence: 99%
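The word-based vs character-based distinction the quote draws comes down to how input lines are split into tokens. A minimal Python sketch of that difference (the function name and defaults are illustrative, not tf-seq2seq's actual API):

```python
# Minimal sketch of word-based vs character-based tokenization;
# tf-seq2seq reads pre-tokenized text split on a configurable delimiter.
def tokenize(line: str, delimiter: str = " ") -> list[str]:
    """Split a sentence into model tokens.

    delimiter=" "  -> word-based sequences (whitespace-separated tokens)
    delimiter=""   -> character-based sequences (one token per character)
    """
    if delimiter == "":
        return list(line)           # one token per character
    return line.split(delimiter)    # one token per word

print(tokenize("neural machine translation"))  # ['neural', 'machine', 'translation']
print(tokenize("nmt", delimiter=""))           # ['n', 'm', 't']
```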
“…Contrary to word-based sequences, we use character-based sequences for generating grammatically correct and natural utterances. (Britz et al., 2017) provides an overview of the framework. While many options are configurable (number of layers, unidirectional vs bidirectional encoder, additive vs multiplicative attention mechanism, GRU (Cho et al., 2014) vs LSTM cells (Hochreiter and Schmidhuber, 1997), etc.…”
Section: Model (mentioning)
confidence: 99%
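For readers unfamiliar with the two attention variants contrasted in the quote, a hedged sketch of the score functions follows; the toy dimensions, weight names, and NumPy setup are assumptions for illustration, not the framework's code. Additive (Bahdanau-style) attention scores with v^T tanh(W1 h + W2 s); multiplicative (Luong-style) attention scores with h^T W s.

```python
import numpy as np

# Toy comparison of additive vs multiplicative attention scores between
# one encoder state h and one decoder state s (sizes are illustrative).
d = 4
rng = np.random.default_rng(0)
h = rng.standard_normal(d)              # encoder hidden state
s = rng.standard_normal(d)              # decoder hidden state
W1, W2, W = (rng.standard_normal((d, d)) for _ in range(3))
v = rng.standard_normal(d)

additive_score = v @ np.tanh(W1 @ h + W2 @ s)   # Bahdanau-style
multiplicative_score = h @ W @ s                # Luong-style
print(additive_score, multiplicative_score)
```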
“…The implementation of the described Sequence to Sequence model has been done using the tf-seq2seq framework (Britz et al., 2017) for Tensorflow (Abadi et al., 2016). The experiment was conducted after splitting the training set and the test set in the ratio 7:3.…”
Section: Sequence to Sequence Model (mentioning)
confidence: 99%
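A 7:3 split like the one described can be sketched in a few lines; the toy sentence pairs and the fixed seed here are assumptions for illustration:

```python
import random

# Shuffle parallel sentence pairs, then take 70% for training, 30% for test.
pairs = [(f"src {i}", f"tgt {i}") for i in range(10)]
random.Random(42).shuffle(pairs)
cut = int(len(pairs) * 0.7)
train, test = pairs[:cut], pairs[cut:]
print(len(train), len(test))  # 7 3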
“…Recently, Britz et al. (2017) have released a paper exploring the hyper-parameters of NMT. This work is similar to our paper in that it seeks better hyper-parameters by running a large number of experiments and deriving empirical conclusions.…”
Section: Related Work (mentioning)
confidence: 99%
“…While this can lead to much faster convergence, the resulting models are shown to slightly underperform compared to annealing SGD. However, Adam's speed and reputation of generally being "good enough" have made it a popular choice for researchers and NMT toolkit authors (Arthur et al., 2016; Lee et al., 2016; Britz et al., 2017; Sennrich et al., 2017).…”
Section: Introduction (mentioning)
confidence: 99%
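The contrast the quote draws, adaptive Adam versus SGD with an annealed (decaying) learning rate, can be illustrated with a short PyTorch sketch; the tiny model, decay factor, and interval are placeholders, and the original systems used TensorFlow rather than this setup:

```python
import torch

# Two training regimes for the same toy model.
model = torch.nn.Linear(8, 8)

# Adam: adaptive per-parameter step sizes, typically with a fixed base LR.
adam = torch.optim.Adam(model.parameters(), lr=1e-3)

# Annealed SGD: plain SGD whose learning rate is decayed on a schedule,
# here halved every 5 epochs (factor and interval are illustrative).
sgd = torch.optim.SGD(model.parameters(), lr=1.0)
scheduler = torch.optim.lr_scheduler.StepLR(sgd, step_size=5, gamma=0.5)

for epoch in range(15):
    # ... one training epoch with either optimizer would go here ...
    scheduler.step()  # anneal the SGD learning rate between epochs
    print(epoch, scheduler.get_last_lr())
```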