Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1558
Depth Growing for Neural Machine Translation

Abstract: While very deep neural networks have shown effectiveness for computer vision and text classification applications, how to increase the network depth of neural machine translation (NMT) models for better translation quality remains a challenging problem. Directly stacking more blocks to the NMT model results in no improvement and even reduces performance. In this work, we propose an effective two-stage approach with three specially designed components to construct deeper NMT models, which results in significant…

Cited by 63 publications (74 citation statements)
References 17 publications
“…The overfitted discriminator gives biased signals to the generator and causes it to update incorrectly, leading to instability in generator training. Wu et al. (2017) found that combining the adversarial training objective with MLE significantly improves the stability of generator training, which has also been reported for language modeling and neural dialogue generation (Lamb et al., 2016; Li et al., 2017). However, although this method leverages the real translation signal to guide the generator and alleviates the effect of an overfitted discriminator, it cannot address the inadequate training of the discriminator, which plays the more important role in GAN training.…”
Section: Generative Adversarial Network
confidence: 85%
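To make the combined objective concrete, here is a minimal PyTorch-style sketch (not the cited authors' code) of mixing an MLE loss on the reference translation with a discriminator-driven policy-gradient loss, in the spirit of the stabilization scheme attributed to Wu et al. (2017). The generator/discriminator interfaces, the sample() method, and the mixing weight alpha are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def generator_step(generator, discriminator, src, tgt, optimizer, alpha=0.5):
    """One generator update mixing MLE with an adversarial, reward-weighted loss.

    Assumed interfaces: generator(src, tgt) returns token logits of shape
    (batch, tgt_len, vocab); generator.sample(src) returns sampled tokens and
    their per-token log-probabilities; discriminator(src, tgt) returns a
    per-sentence probability that the translation is human-written.
    """
    # MLE term: cross-entropy against the reference translation.
    logits = generator(src, tgt)
    mle_loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               tgt.reshape(-1))

    # Adversarial term: sample a translation, score it with D, and use the
    # score as a sequence-level reward in a REINFORCE-style update.
    sampled, log_probs = generator.sample(src)
    with torch.no_grad():
        reward = discriminator(src, sampled)          # shape (batch,)
    adv_loss = -(reward * log_probs.sum(dim=1)).mean()

    # Combined objective: the MLE term anchors the generator to the real
    # translation signal, the adversarial term pushes it to fool D.
    loss = alpha * mle_loss + (1.0 - alpha) * adv_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The weighting between the two terms is the knob the quote alludes to: with alpha close to 1 the update is essentially plain MLE, while smaller values expose the generator more to the (possibly overfitted) discriminator's signal.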
“…Table 1: Comparison with previous work on the IWSLT2014 German-English translation task. The "Baseline" column reports the performance of the pretrained model used to warm-start training.

                                          Baseline   Model
    MIXER (Ranzato et al., 2015)          20.10      21.81
    MRT (Shen et al., 2016)               -          25.84
    BSO (Wiseman and Rush, 2016)          24.03      26.36
    Adversarial-NMT (Wu et al., 2017)     -          27.94
    A-C (Bahdanau et al., 2016)           27.56      28.53
    Softmax-Q (Ma et al., 2017)           27.66      28.77
    Adversarial-NMT*                      27.63      28.03
…”
Section: Methods
confidence: 99%
“…A GAN usually contains two neural networks: a generator G and a discriminator D. G generates samples, while D is trained to distinguish generated samples from true samples. By regarding sequence generation as an action-taking problem in reinforcement learning, Li et al. (2017) proposed applying GANs to dialogue generation, using the discriminator's output as the reward for optimizing the generator. Work on the Safe Response Problem: there is some existing work on the safe response problem.…”
Section: Duality True Queries
confidence: 99%
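The sketch below illustrates the generic setup the quote describes: the discriminator is trained to separate real pairs from generated ones, and its probability of "real" is reused as the reinforcement-learning reward for the generator. All module names and shapes are assumptions, not an implementation from Li et al. (2017).

```python
import torch
import torch.nn.functional as F

def discriminator_step(discriminator, generator, src, real_tgt, d_optimizer):
    """Train D to score real (src, tgt) pairs high and generated pairs low."""
    with torch.no_grad():
        fake_tgt, _ = generator.sample(src)           # generated sequences
    real_scores = discriminator(src, real_tgt)        # should approach 1
    fake_scores = discriminator(src, fake_tgt)        # should approach 0
    d_loss = (F.binary_cross_entropy(real_scores, torch.ones_like(real_scores))
              + F.binary_cross_entropy(fake_scores, torch.zeros_like(fake_scores)))
    d_optimizer.zero_grad()
    d_loss.backward()
    d_optimizer.step()
    return d_loss.item()

def reward_from_discriminator(discriminator, src, sampled_tgt):
    """D's probability that a sample is real serves as the generator's reward."""
    with torch.no_grad():
        return discriminator(src, sampled_tgt)
```

The reward returned here would then feed a policy-gradient generator update like the one sketched earlier, which is exactly the action-taking framing the quote refers to.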
“…The generator realizes that it is producing low-quality samples but cannot figure out what a good sample looks like. To stabilize the training process, after each update with the combined gradient ∇_{θ_qr} G_{θ_qr} or ∇_{θ_rq} G_{θ_rq}, the generators are given real query-response pairs and strengthened with maximum-likelihood training, which is also known as Teacher Forcing (Li et al., 2017;…”
Section: Training of DAL
confidence: 99%
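A hedged sketch of the interleaving described above: each adversarial update of the two generators is immediately followed by a teacher-forcing (maximum-likelihood) update on the real query-response pair. Names such as gen_qr, gen_rq, and the adversarial_step callable are hypothetical placeholders, not the cited authors' API.

```python
import torch.nn.functional as F

def teacher_forcing_step(generator, src, tgt, optimizer):
    """Maximum-likelihood (Teacher Forcing) update on a real pair."""
    logits = generator(src, tgt)                      # (batch, len, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tgt.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def dual_update(gen_qr, gen_rq, query, response, opt_qr, opt_rq, adversarial_step):
    # 1) Adversarial updates with the combined gradients for both directions
    #    (query->response and response->query); details omitted in this sketch.
    adversarial_step(query, response)
    # 2) Teacher Forcing on the real pair to keep both generators stable.
    teacher_forcing_step(gen_qr, query, response, opt_qr)
    teacher_forcing_step(gen_rq, response, query, opt_rq)
```

Running the maximum-likelihood step right after the adversarial one gives the generators a concrete "good standard" (the real pair) each time the discriminator's reward alone is uninformative.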