Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1558
Depth Growing for Neural Machine Translation

Abstract: While very deep neural networks have shown effectiveness for computer vision and text classification applications, how to increase the network depth of neural machine translation (NMT) models for better translation quality remains a challenging problem. Directly stacking more blocks to the NMT model results in no improvement and even reduces performance. In this work, we propose an effective two-stage approach with three specially designed components to construct deeper NMT models, which results in significant…

Cited by 63 publications (74 citation statements)
References 17 publications
“…The overfitted discriminator gives biased signals to the generator and causes it to update incorrectly, leading to instability in generator training. Wu et al. (2017) found that combining the adversarial training objective with MLE significantly improves the stability of generator training, which has also been reported for language modeling and neural dialogue generation (Lamb et al., 2016; Li et al., 2017). However, although this method leverages the real translation signal to guide the generator and alleviates the effect of an overfitted discriminator, it cannot address the inadequate training of the discriminator, which plays the more important role in GAN training.…”
Section: Generative Adversarial Network
confidence: 85%
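To make the combined objective concrete, here is a minimal PyTorch-style sketch (not the cited authors' code) of mixing an MLE loss on the reference translation with a discriminator-driven policy-gradient loss, in the spirit of the stabilization scheme attributed to Wu et al. (2017). The generator/discriminator interfaces, the sample() method, and the mixing weight alpha are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def generator_step(generator, discriminator, src, tgt, optimizer, alpha=0.5):
    """One generator update mixing MLE with an adversarial, reward-weighted loss.

    Assumed interfaces: generator(src, tgt) returns token logits of shape
    (batch, tgt_len, vocab); generator.sample(src) returns sampled tokens and
    their per-token log-probabilities; discriminator(src, tgt) returns a
    per-sentence probability that the translation is human-written.
    """
    # MLE term: cross-entropy against the reference translation.
    logits = generator(src, tgt)
    mle_loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               tgt.reshape(-1))

    # Adversarial term: sample a translation, score it with D, and use the
    # score as a sequence-level reward in a REINFORCE-style update.
    sampled, log_probs = generator.sample(src)
    with torch.no_grad():
        reward = discriminator(src, sampled)          # shape (batch,)
    adv_loss = -(reward * log_probs.sum(dim=1)).mean()

    # Combined objective: the MLE term anchors the generator to the real
    # translation signal, the adversarial term pushes it to fool D.
    loss = alpha * mle_loss + (1.0 - alpha) * adv_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The weighting between the two terms is the knob the quote alludes to: with alpha close to 1 the update is essentially plain MLE, while smaller values expose the generator more to the (possibly overfitted) discriminator's signal.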
“…Table 1: Comparison with previous work on the IWSLT2014 German-English translation task. The "Baseline" column reports the performance of the pretrained model used to warm-start training.

                                          Baseline   Model
    MIXER (Ranzato et al., 2015)          20.10      21.81
    MRT (Shen et al., 2016)               -          25.84
    BSO (Wiseman and Rush, 2016)          24.03      26.36
    Adversarial-NMT (Wu et al., 2017)     -          27.94
    A-C (Bahdanau et al., 2016)           27.56      28.53
    Softmax-Q (Ma et al., 2017)           27.66      28.77
    Adversarial-NMT*                      27.63      28.03
…”
Section: Methods
confidence: 99%
“…A GAN usually contains two neural networks: a generator G and a discriminator D. G generates samples, while D is trained to distinguish generated samples from true samples. By regarding sequence generation as an action-taking problem in reinforcement learning, Li et al. (2017) proposed applying GANs to dialogue generation, using the discriminator's output as the reward for optimizing the generator. Work on the Safe Response Problem: there is some existing work on the safe response problem.…”
Section: Duality True Queries
confidence: 99%
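The sketch below illustrates the generic setup the quote describes: the discriminator is trained to separate real pairs from generated ones, and its probability of "real" is reused as the reinforcement-learning reward for the generator. All module names and shapes are assumptions, not an implementation from Li et al. (2017).

```python
import torch
import torch.nn.functional as F

def discriminator_step(discriminator, generator, src, real_tgt, d_optimizer):
    """Train D to score real (src, tgt) pairs high and generated pairs low."""
    with torch.no_grad():
        fake_tgt, _ = generator.sample(src)           # generated sequences
    real_scores = discriminator(src, real_tgt)        # should approach 1
    fake_scores = discriminator(src, fake_tgt)        # should approach 0
    d_loss = (F.binary_cross_entropy(real_scores, torch.ones_like(real_scores))
              + F.binary_cross_entropy(fake_scores, torch.zeros_like(fake_scores)))
    d_optimizer.zero_grad()
    d_loss.backward()
    d_optimizer.step()
    return d_loss.item()

def reward_from_discriminator(discriminator, src, sampled_tgt):
    """D's probability that a sample is real serves as the generator's reward."""
    with torch.no_grad():
        return discriminator(src, sampled_tgt)
```

The reward returned here would then feed a policy-gradient generator update like the one sketched earlier, which is exactly the action-taking framing the quote refers to.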
“…The generator realizes that it is producing low-quality samples but cannot figure out what a good sample looks like. To stabilize the training process, after each update with the combined gradient ∇_{θ_qr} G_{θ_qr} or ∇_{θ_rq} G_{θ_rq}, the generators are given real query-response pairs and strengthened with maximum-likelihood training, which is also known as Teacher Forcing (Li et al., 2017;…”
Section: Training of DAL
confidence: 99%
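A hedged sketch of the interleaving described above: each adversarial update of the two generators is immediately followed by a teacher-forcing (maximum-likelihood) update on the real query-response pair. Names such as gen_qr, gen_rq, and the adversarial_step callable are hypothetical placeholders, not the cited authors' API.

```python
import torch.nn.functional as F

def teacher_forcing_step(generator, src, tgt, optimizer):
    """Maximum-likelihood (Teacher Forcing) update on a real pair."""
    logits = generator(src, tgt)                      # (batch, len, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tgt.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def dual_update(gen_qr, gen_rq, query, response, opt_qr, opt_rq, adversarial_step):
    # 1) Adversarial updates with the combined gradients for both directions
    #    (query->response and response->query); details omitted in this sketch.
    adversarial_step(query, response)
    # 2) Teacher Forcing on the real pair to keep both generators stable.
    teacher_forcing_step(gen_qr, query, response, opt_qr)
    teacher_forcing_step(gen_rq, response, query, opt_rq)
```

Running the maximum-likelihood step right after the adversarial one gives the generators a concrete "good standard" (the real pair) each time the discriminator's reward alone is uninformative.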