A Stochastic Decoder for Neural Machine Translation

Schulz, Philip; Aziz, Wilker; Cohn, Trevor

doi:10.18653/v1/p18-1115

Cited by 18 publications

(15 citation statements)

References 10 publications

“…Even though there has been growing interest in variational approaches to machine translation (Zhang et al, 2016; Schulz et al, 2018; Shah and Barber, 2018; Eikema and Aziz, 2019) and to tasks that integrate vision and language, e.g. image description generation (Pu et al, 2016; Wang et al, 2017), relatively little attention has been dedicated to variational models for multi-modal translation.…”

Section: Related Work (mentioning)

confidence: 99%

Latent Variable Model for Multi-modal Translation

Calixto, Rios, Aziz

2019

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Self Cite

In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model. This latent variable can be seen as a multi-modal stochastic embedding of an image and its description in a foreign language. It is used in a target-language decoder and also to predict image features. Importantly, our model formulation utilises visual and textual inputs during training but does not require that images be available at test time. We show that our latent variable MMT formulation improves considerably over strong baselines, including a multi-task learning approach (Elliott and Kádár, 2017) and a conditional variational auto-encoder approach (Toyama et al., 2016). Finally, we show improvements due to (i) predicting image features in addition to only conditioning on them, (ii) imposing a constraint on the minimum amount of information encoded in the latent variable, and (iii) by training on additional target-language image descriptions (i.e. synthetic data). Code and pre-trained models will be released soon.
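
The abstract mentions constraining the minimum amount of information encoded in the latent variable. One common way to impose such a floor is the so-called "free bits" trick, which stops penalising the KL term once it falls below a threshold. The sketch below assumes that technique, a diagonal Gaussian posterior and a standard normal prior; it is an illustration only, not the authors' code, and the function names are hypothetical.

```python
import math

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions, in nats."""
    return sum(0.5 * (m * m + math.exp(lv) - lv - 1.0) for m, lv in zip(mu, logvar))

def free_bits_kl(mu, logvar, min_nats):
    """Hypothetical free-bits style KL term (assumption, not the paper's exact loss).

    The penalty never drops below min_nats, so its gradient is zero whenever the
    KL is already under the floor: the optimiser gains nothing by squeezing the
    latent variable's information content below min_nats.
    """
    return max(gaussian_kl(mu, logvar), min_nats)
```

For example, with `min_nats=2.0` a posterior carrying 0.5 nats (`mu=[1.0]`, `logvar=[0.0]`) is still charged 2.0 nats, whereas one carrying 4.5 nats (`mu=[3.0]`) is charged its actual KL.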

“…Their formulation is a conditional deep generative model (Sohn et al, 2015) that does not model the source side of the data, where, rather than a fixed standard Gaussian, the latent model is itself parameterised and depends on the data. Schulz et al (2018) extend the model of with a Markov chain of latent variables, one per timestep, allowing the model to capture greater variability.…”

Section: Related Work (mentioning)

confidence: 99%

Auto-Encoding Variational Neural Machine Translation

Eikema, Aziz

2019

Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

Self Cite

We present a deep generative model of bilingual sentence pairs for machine translation. The model generates source and target sentences jointly from a shared latent representation and is parameterised by neural networks. We perform efficient training using amortised variational inference and reparameterised gradients. Additionally, we discuss the statistical implications of joint modelling and propose an efficient approximation to maximum a posteriori decoding for fast test-time predictions. We demonstrate the effectiveness of our model in three machine translation scenarios: in-domain training, mixed-domain training, and learning from a mix of gold-standard and synthetic data. Our experiments show consistently that our joint formulation outperforms conditional modelling (i.e. standard neural machine translation) in all such scenarios.
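
The abstract relies on reparameterised gradients. The toy sketch below, which is not the authors' implementation, shows the idea on a one-dimensional Gaussian: writing z = mu + sigma * eps moves the randomness into eps, so the gradient of an expectation can be estimated by differentiating through the samples. The objective f(z) = z**2 is an assumption chosen only because its exact gradients, 2*mu and 2*sigma, are known.

```python
import random

def pathwise_grads(mu, sigma, num_samples, seed=0):
    """Monte Carlo reparameterised (pathwise) gradient of E_{z~N(mu, sigma^2)}[z**2]
    with respect to mu and sigma.

    With z = mu + sigma * eps and eps ~ N(0, 1): dz/dmu = 1 and dz/dsigma = eps,
    so by the chain rule df/dmu = f'(z) and df/dsigma = f'(z) * eps, where f'(z) = 2z.
    """
    rng = random.Random(seed)
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps      # reparameterised sample: noise is independent of mu, sigma
        df_dz = 2.0 * z           # derivative of the toy objective f(z) = z**2
        g_mu += df_dz             # times dz/dmu = 1
        g_sigma += df_dz * eps    # times dz/dsigma = eps
    return g_mu / num_samples, g_sigma / num_samples
```

With `mu=1.5`, `sigma=0.5` and a large sample count, the estimates approach the exact values 3.0 and 1.0. In a real model the same principle lets automatic differentiation push gradients through sampled latent codes into the inference network.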
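
The quotation above describes extending a conditional model with a Markov chain of latent variables, one per timestep. A minimal sketch of ancestral sampling from such a chain follows, assuming a scalar latent and a hypothetical linear-Gaussian prior p(z_t | z_{t-1}, h_t). The parameters `a`, `b` and `log_sigma` stand in for neural network outputs, and the decoder states are supplied up front, whereas in an actual stochastic decoder they would themselves depend on earlier words and latents.

```python
import math
import random

def sample_latent_chain(decoder_states, a=0.9, b=0.1, log_sigma=-1.0, seed=0):
    """Ancestral sampling of one scalar latent per target timestep from a
    hypothetical Markov prior p(z_t | z_{t-1}, h_t) = N(a*z_{t-1} + b*h_t, sigma^2).

    decoder_states: list of floats standing in for the decoder hidden states h_t.
    Returns the sampled chain [z_1, ..., z_T].
    """
    rng = random.Random(seed)
    sigma = math.exp(log_sigma)
    z_prev = 0.0                                   # z_0 fixed to zero in this sketch
    chain = []
    for h_t in decoder_states:
        mu_t = a * z_prev + b * h_t                # mean depends on previous latent and current state
        z_t = mu_t + sigma * rng.gauss(0.0, 1.0)   # reparameterised draw
        chain.append(z_t)
        z_prev = z_t                               # Markov dependency: z_t conditions z_{t+1}
    return chain
```

Because each z_t feeds into the distribution of z_{t+1}, variability introduced early in the sentence can propagate to later timesteps, which is the intuition behind using a chain rather than a single sentence-level latent.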

“…Some work puts their efforts on decoding stages, and form a group of beam search to encourage diversity (Vijayakumar et al, 2016), while others pay more attention to adversarial training (Shetty et al, 2017;. Within translation, our method is similar to Schulz et al (2018b), where they propose a MT system armed with variational inference to account for translation variations. Like us, their diversified generation is driven by latent variables.…”

Section: Related Work (mentioning)

confidence: 99%

“…A well recognized issue with SEQ2SEQ models is the lack of diversity in the generated translations. This issue is mostly attributed to the decoding algorithm (Li et al, 2016), and recently to the model (Zhang et al, 2016; Schulz et al, 2018a). The former direction has attempted to design diversity encouraging decoding algorithm, particularly beam search, as it generates translations sharing the majority of their tokens except a few trailing ones.…”

Section: Introduction (mentioning)

confidence: 99%

Sequence to Sequence Mixture Model for Diverse Machine Translation

He, Haffari, Norouzi

2018

Proceedings of the 22nd Conference on Computational Natural Language Learning

Sequence to sequence (SEQ2SEQ) models often lack diversity in their generated translations. This can be attributed to the limitation of SEQ2SEQ models in capturing lexical and syntactic variations in a parallel corpus resulting from different styles, genres, topics, or ambiguity of the translation process. In this paper, we develop a novel sequence to sequence mixture (S2SMIX) model that improves both translation diversity and quality by adopting a committee of specialized translation models rather than a single translation model. Each mixture component selects its own training dataset via optimization of the marginal log-likelihood, which leads to a soft clustering of the parallel corpus. Experiments on four language pairs demonstrate the superiority of our mixture model compared to a SEQ2SEQ baseline with standard or diversity-boosted beam search. Our mixture model uses negligible additional parameters and incurs no extra computation cost during decoding.
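
The abstract states that each mixture component selects its own training data through the marginal log-likelihood, yielding a soft clustering of the corpus. A minimal sketch of that computation for a single sentence pair follows, assuming the per-component log-likelihoods log p_k(y|x) are already available and treating the mixing weights as given inputs (an assumption; the paper's choice of weights is not stated here). Everything is kept in log space with log-sum-exp for numerical stability; the function names are hypothetical.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def mixture_posterior(log_priors, component_loglikes):
    """For one sentence pair (x, y) and K components:

    log p(y|x) = log sum_k pi_k * p_k(y|x)
    r_k        = pi_k * p_k(y|x) / p(y|x)   (soft cluster assignment of the pair)
    """
    joint = [lp + ll for lp, ll in zip(log_priors, component_loglikes)]
    log_marginal = logsumexp(joint)
    responsibilities = [math.exp(j - log_marginal) for j in joint]
    return log_marginal, responsibilities
```

The gradient of log p(y|x) with respect to each component's log-likelihood is exactly r_k, so when training maximises the marginal log-likelihood each component receives most of its learning signal from the sentence pairs it already explains best.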
