Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers 2016
DOI: 10.18653/v1/w16-2360
Attention-based Multimodal Neural Machine Translation

Abstract: Building models that take advantage of the hierarchical structure of language without a priori annotation is a longstanding goal in natural language processing. We introduce such a model for the task of machine translation, pairing a recurrent neural network grammar encoder with a novel attentional RNNG decoder and applying policy gradient reinforcement learning to induce unsupervised tree structures on both the source and target. When trained on character-level datasets with no explicit segmentation or parse…


Cited by 154 publications (137 citation statements)
References 8 publications
“…Supplementing these translations with information from the image provided only marginal improvements. For instance, Huang et al (2016) re-ranked the translation output using image features and failed to achieve a higher METEOR score than the baseline. Similarly, systems developed for the WMT 2016 crosslingual image description multimodal task had access to one or more reference English descriptions of the image (in addition to the image itself) when attempting to generate a German caption, allowing them to use attention-based models that took advantage of both pieces of information.…”
Section: Related Work
confidence: 99%
“…Hitschler et al (2016) use image information by pivoting on external image captioning corpora. Most systems that make use of NMT add the image feature information into either the NMT encoder or decoder (Huang et al, 2016; Hokamp and Calixto, 2016), similar to Elliott et al (2015) with various enhancements. Marginal improvements according to automatic evaluation metrics were found only for approaches using re-ranking.…”
Section: Introduction
confidence: 99%
“…Multimodal NMT systems have been introduced (Elliott et al, 2015; Caglayan et al, 2016; Calixto et al, 2016; Huang et al, 2016) to incorporate visual information into NMT approaches, most of which condition the NMT on an image representation.…”
Section: Introduction
confidence: 99%
“…Recent approaches to Multimodal NMT have used low-level image features, including dense fully connected vectors and spatial convolutional representations from an image classification network (Elliott et al, 2015; Huang et al, 2016). They also incorporate attention mechanisms (Calixto et al, 2016).…”
Section: Introduction
confidence: 99%
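The snippets above repeatedly describe the same mechanism: conditioning an NMT decoder on spatial convolutional image features through an attention layer, where each image region is scored against the decoder state and the regions are combined into a weighted context vector. The following NumPy sketch illustrates that general idea only; all names, shapes, and projection matrices are illustrative assumptions, not the parameterization used in any of the cited papers.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_to_image(decoder_state, image_features, W_q, W_k):
    """Score each image region against the decoder state and
    return a weighted sum of region features (the context vector).

    decoder_state:  (d,)   decoder hidden state at one time step
    image_features: (n, f) spatial conv features, one row per region
    W_q: (a, d), W_k: (a, f)  hypothetical projection matrices
    """
    query = W_q @ decoder_state        # (a,) project decoder state
    keys = image_features @ W_k.T      # (n, a) project each region
    scores = softmax(keys @ query)     # (n,) attention weights over regions
    context = scores @ image_features  # (f,) weighted sum of region features
    return context, scores

# Toy example: a 7x7 conv grid gives 49 image regions.
rng = np.random.default_rng(0)
d, f, a, n = 8, 16, 8, 49
state = rng.standard_normal(d)
feats = rng.standard_normal((n, f))
W_q = rng.standard_normal((a, d))
W_k = rng.standard_normal((a, f))
ctx, w = attend_to_image(state, feats, W_q, W_k)
```

In a full model, `ctx` would be fed into the decoder alongside the usual source-sentence attention context; whether the image context enters the encoder or the decoder is exactly the design choice the snippets attribute to the different systems.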