Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.390

Neural Machine Translation with Phrase-Level Universal Visual Representations

Abstract: Multimodal machine translation (MMT) aims to improve neural machine translation (NMT) with additional visual information, but most existing MMT methods require paired input of source sentence and image, which makes them suffer from a shortage of sentence-image pairs. In this paper, we propose a phrase-level retrieval-based method for MMT to obtain visual information for the source input from existing sentence-image datasets, so that MMT can break the limitation of paired sentence-image input. Our method performs re…

Cited by 18 publications (5 citation statements)
References 25 publications

“…The gated fusion mechanism is a popular technique for fusing representations from different sources (Wu et al., 2021; Fang and Feng, 2022; Lin et al., 2020). The fused output is a weighted sum of the text representation and the selective attention output, in which the weight is controlled by the gate λ.…”
Section: Gated Fusion
confidence: 99%
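The gated fusion described in the statement above can be sketched as follows. This is a minimal illustration, not the cited papers' implementation: the gate parameters `w_gate`/`b_gate`, the concatenation-based gate input, and the vector shapes are all assumptions.

```python
import numpy as np

def gated_fusion(h_text, h_attn, w_gate, b_gate):
    """Fuse a text representation with an attention output.

    The fused vector is a weighted sum of the two inputs, with the
    weight controlled by a sigmoid gate lambda. `w_gate` and `b_gate`
    stand in for learned gate parameters (hypothetical names).
    """
    z = np.concatenate([h_text, h_attn]) @ w_gate + b_gate
    lam = 1.0 / (1.0 + np.exp(-z))            # gate lambda in (0, 1)
    # weighted sum: lambda weighs the attention output, (1 - lambda) the text
    return (1.0 - lam) * h_text + lam * h_attn
```

Because λ lies in (0, 1), each fused component is a convex combination of the corresponding text and attention components.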
“…Since it is difficult to train an end-to-end ST model directly, training techniques such as pretraining (Weiss et al., 2017; Berard et al., 2018; Bansal et al., 2019; Stoian et al., 2020; Wang et al., 2020b; Dong et al., 2021a; Alinejad and Sarkar, 2020; Zheng et al., 2021b), multi-task learning (Le et al., 2020; Vydana et al., 2021; Tang et al., 2021b; Ye et al., 2021; Tang et al., 2021a), curriculum learning (Kano et al., 2017; Wang et al., 2020c), and meta-learning (Indurthi et al., 2020) have been applied. Recent work has introduced mixup on machine translation (Zhang et al., 2019b; Guo et al., 2022; Fang and Feng, 2022), sentence classification (Chen et al., 2020; Jindal et al., 2020; Sun et al., 2020), multilingual understanding, and speech recognition (Medennikov et al., 2018; Sun et al., 2021; Lam et al., 2021a; Meng et al., 2021), and obtained improvements.…”
Section: Can the Final Model Still Perform MT Task?
confidence: 99%
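The mixup technique surveyed in the statement above can be sketched as interpolating a pair of training examples (inputs and labels) with a Beta-distributed coefficient. This is a generic sketch, not any cited paper's variant; the function name, argument names, and the default `alpha` are illustrative.

```python
import numpy as np

def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=None):
    """Minimal mixup sketch: blend two examples with lambda ~ Beta(alpha, alpha).

    Inputs and (soft) labels are interpolated with the same coefficient,
    producing a new virtual training example.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    x = lam * x_a + (1.0 - lam) * x_b     # mixed input
    y = lam * y_a + (1.0 - lam) * y_b     # mixed label
    return x, y, lam
```

Applied to translation or speech, the same interpolation is typically done on embeddings or hidden representations rather than raw inputs.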
“…However, these approaches often rely on time-consuming parsing tools to extract phrases. For learning phrase representations, averaging token representations is commonly used (Fang and Feng, 2022; Ma et al., 2022). While this method is simple, it fails to effectively capture the overall semantics of the phrases, thereby hurting model performance.…”
Section: Introduction
confidence: 99%
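The token-averaging scheme this statement attributes to Fang and Feng (2022) and Ma et al. (2022) can be sketched in a few lines; the array layout (rows are tokens, columns are hidden dimensions) and the half-open span convention are assumptions for illustration.

```python
import numpy as np

def phrase_representation(token_reprs, start, end):
    """Average the token representations over the span [start, end).

    `token_reprs` is a (num_tokens, hidden_dim) array; the phrase
    representation is simply the mean of its tokens' vectors.
    """
    return token_reprs[start:end].mean(axis=0)
```

The statement's criticism follows directly from this construction: a plain mean weighs every token equally, so it cannot emphasize the semantically central tokens of a phrase.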