Attention-based Multimodal Neural Machine Translation

Huang, Po-Yao; Liu, Frederick; Shiang, Sz-Rung; Oh, Jean; Dyer, Chris

doi:10.18653/v1/w16-2360

Cited by 154 publications

(137 citation statements)

References 8 publications

Supporting

Mentioning: 135

Contrasting

“…Supplementing these translation with information from the image provided only marginal improvements. For instance Huang et al (2016) re-ranked the translation output using image features and failed to achieve a higher METEOR score than the baseline. Similarly, systems developed for the WMT 2016 crosslingual image description multimodal task had access to one or more reference English descriptions of the image (in addition to the image itself) when attempting to generate a German caption, allowing them to use attention-based models that took advantage of both pieces of information.…”

Section: Related Work (mentioning)

confidence: 99%

Generating Image Descriptions using Multilingual Data

Jaffe

2017

Proceedings of the Second Conference on Machine Translation

In this paper we explore several neural network architectures for the WMT 2017 multimodal translation sub-task on multilingual image caption generation. The goal of the task is to generate image captions in German, using a training corpus of images with captions in both English and German. We explore several models which attempt to generate captions for both languages, ignoring the English output during evaluation. We compare the results to a baseline implementation which uses only the German captions for training and show significant improvement.

“…Hitschler et al (2016) use image information by pivoting it on an external image captioning corpora. Most systems that make use of NMT add the image feature information into either the NMT encoder or decoder (Huang et al, 2016;Hokamp and Calixto, 2016), similar to Elliott et al (2015) with various enhancements. Marginal improvements according to automatic evaluation metrics were found only for approaches using re-ranking.…”

Section: Introduction (mentioning)

confidence: 99%

Unraveling the Contribution of Image Captioning and Neural Machine Translation for Multimodal Machine Translation

Lala, Madhyastha, Wang, et al.

2017

The Prague Bulletin of Mathematical Linguistics

Recent work on multimodal machine translation has attempted to address the problem of producing target language image descriptions based on both the source language description and the corresponding image. However, existing work has not been conclusive on the contribution of visual information. This paper presents an in-depth study of the problem by examining the differences and complementarities of two related but distinct approaches to this task: text-only neural machine translation and image captioning. We analyse the scope for improvement and the effect of different data and settings to build models for these tasks. We also propose ways of combining these two approaches for improved translation quality.

“…Multimodal NMT systems have been introduced (Elliott et al, 2015; Caglayan et al, 2016; Calixto et al, 2016; Huang et al, 2016) to incorporate visual information into NMT approaches, most of which condition the NMT on an image representation (typi-…”

Section: Introduction (mentioning)

confidence: 99%

“…Recent approaches to Multimodal NMT have used low level image features, including dense fully connected vectors and spatial convolutional representations from an image classification network (Elliott et al, 2015;Huang et al, 2016). They also incorporate attention mechanisms (Calixto et al, 2016).…”

Section: Introduction (mentioning)

confidence: 99%

Sheffield MultiMT: Using Object Posterior Predictions for Multimodal Machine Translation

Madhyastha, Wang, Specia

2017

Proceedings of the Second Conference on Machine Translation

This paper describes the University of Sheffield's submission to the WMT17 Multimodal Machine Translation shared task. We participated in Task 1 to develop an MT system to translate an image description from English to German and French, given its corresponding image. Our proposed systems are based on the state-of-the-art Neural Machine Translation approach. We investigate the effect of replacing the commonly-used image embeddings with an estimated posterior probability prediction for 1,000 object categories in the images.
