Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers 2016
DOI: 10.18653/v1/w16-2359

DCU-UvA Multimodal MT System Report

Abstract: We present a doubly-attentive multimodal machine translation model. Our model learns to attend to source language and spatial-preserving CONV5,4 visual features as separate attention mechanisms in a neural translation model. In image description translation experiments (Task 1), we find an improvement of 2.3 Meteor points compared to initialising the hidden state of the decoder with only the FC7 features and 2.9 Meteor points compared to a text-only neural machine translation baseline, confirming the useful …
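
A minimal sketch of what one decoding step in such a doubly-attentive decoder might look like, assuming PyTorch; the dimensions, the GRU cell, and the concatenation-based attention scorers are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoublyAttentiveStep(nn.Module):
    """One decoder step with separate attention over source words and image regions."""

    def __init__(self, emb_dim=256, hid_dim=512, src_dim=512, img_dim=512):
        super().__init__()
        # Independent scorers: one attention mechanism per modality.
        self.src_score = nn.Linear(hid_dim + src_dim, 1)
        self.img_score = nn.Linear(hid_dim + img_dim, 1)
        self.rnn = nn.GRUCell(emb_dim + src_dim + img_dim, hid_dim)

    @staticmethod
    def _attend(scorer, annotations, hidden):
        # annotations: (batch, n, dim); hidden: (batch, hid_dim)
        n = annotations.size(1)
        h = hidden.unsqueeze(1).expand(-1, n, -1)
        scores = scorer(torch.cat([h, annotations], dim=-1)).squeeze(-1)
        weights = F.softmax(scores, dim=-1)                       # one weight per word / region
        return (weights.unsqueeze(-1) * annotations).sum(dim=1)   # weighted context vector

    def forward(self, prev_emb, hidden, src_states, img_regions):
        # src_states: one encoder state per source word; img_regions: spatial
        # CONV5,4 features reshaped to (batch, n_regions, img_dim).
        src_ctx = self._attend(self.src_score, src_states, hidden)
        img_ctx = self._attend(self.img_score, img_regions, hidden)
        return self.rnn(torch.cat([prev_emb, src_ctx, img_ctx], dim=-1), hidden)
```

At each step the two softmaxes produce separate weight vectors, one over source tokens and one over image regions, which is what allows the model to attend to different words and different image areas independently.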

Cited by 38 publications (45 citation statements)
References 5 publications
“…Most existing work obtains the best results by combining the penultimate layer of the CNN (via concatenation, summation, etc.) with the final state of the source sentence representation and using it to initialize the target RNN (Caglayan et al., 2016; Calixto et al., 2016; Huang et al., 2016). Recent work also explores an attention mechanism where they use lower-level CNN features of the images, such as a convolutional layer, and condition the source and the target sentences on the image features (Calixto et al., 2016; Calixto et al., 2017).…”
Section: Multimodal Machine Translation Approaches
confidence: 99%
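
For concreteness, a hedged sketch of the initialisation strategy described in the quote above, assuming PyTorch; the dimensions and the sum-then-tanh combination are illustrative choices, since the cited papers also use concatenation and other variants.

```python
import torch
import torch.nn as nn

class DecoderInit(nn.Module):
    """Initialise the target RNN hidden state from a global image vector
    (e.g. FC7 of a pre-trained CNN) and the final source encoder state."""

    def __init__(self, fc7_dim=4096, src_dim=512, hid_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(fc7_dim, hid_dim)
        self.src_proj = nn.Linear(src_dim, hid_dim)

    def forward(self, fc7, src_final):
        # Project both vectors into the decoder's hidden space, combine them
        # (here by summation; concatenation is an equally common variant),
        # and squash with tanh to obtain the initial decoder state.
        return torch.tanh(self.img_proj(fc7) + self.src_proj(src_final))
```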
“…with the final state of the source sentence representation and using it to initialize the target RNN (Caglayan et al., 2016; Calixto et al., 2016; Huang et al., 2016). Recent work also explores an attention mechanism where they use lower-level CNN features of the images, such as a convolutional layer, and condition the source and the target sentences on the image features (Calixto et al., 2016; Calixto et al., 2017). The intuition here is that the lower-level CNN features capture information about different areas of the images and an attention mechanism could learn to attend to specific regions while both encoding the source and decoding the target sentence.…”
Section: Multimodal Machine Translation Approaches
confidence: 99%
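
To illustrate why a convolutional layer preserves the spatial information the quote refers to, here is a rough sketch of extracting CONV5,4 region vectors with a recent torchvision VGG19; the slice index into `vgg.features` and the 224x224 input size are assumptions about the standard torchvision layer ordering, not details taken from the paper.

```python
import torch
from torchvision import models, transforms

# Pre-trained VGG19; features[:36] is assumed to end at the ReLU after conv5_4.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).eval()
conv5_4 = torch.nn.Sequential(*list(vgg.features.children())[:36])

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def region_annotations(pil_image):
    """Return one 512-d vector per spatial region (14 x 14 = 196 regions)."""
    x = preprocess(pil_image).unsqueeze(0)   # (1, 3, 224, 224)
    with torch.no_grad():
        fmap = conv5_4(x)                    # (1, 512, 14, 14)
    # Flatten the spatial grid so an attention mechanism can weight regions.
    return fmap.flatten(2).transpose(1, 2)   # (1, 196, 512)
```

Each of the 196 rows corresponds to a fixed area of the input image, which is what region-level attention in the quoted work exploits.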
“…We then briefly discuss the doubly-attentive multi-modal NMT model we use in our experiments (§2.3), which is comparable to the model evaluated by Calixto et al. (2016) and further detailed and analysed in Calixto et al. (2017a).…”
Section: MT Models Evaluated in This Work
confidence: 99%
“…Multimodal NMT systems have been introduced (Elliott et al., 2015; Caglayan et al., 2016; Calixto et al., 2016; Huang et al., 2016) to incorporate visual information into NMT approaches, most of which condition the NMT on an image representation (typically …”
Section: Introduction
confidence: 99%
“…They also incorporate attention mechanisms (Calixto et al., 2016). However, the effect of image features or the efficacy of the representational contribution is still an open research question.…”
Section: Introduction
confidence: 99%