2019 IEEE International Conference on Multimedia and Expo (ICME) 2019
DOI: 10.1109/icme.2019.00070

Improving Captioning for Low-Resource Languages by Cycle Consistency

Abstract: Improving the captioning performance on low-resource languages by leveraging English caption datasets has received increasing research interest in recent years. Existing works mainly fall into two categories: translation-based and alignment-based approaches. In this paper, we propose to combine the merits of both approaches in one unified architecture. Specifically, we use a pre-trained English caption model to generate high-quality English captions, and then take both the image and generated English captions …

citations
Cited by 8 publications
(4 citation statements)
references
References 19 publications
“…The two main strategies entail collecting captions in different languages for commonly used datasets (e.g. Chinese and Japanese captions for MS COCO images [200], [201], German captions for Flick30K [202]), or directly training multilingual captioning systems with unpaired captions [160], [199], [203], [204], [205], which requires specific evaluation protocols [206].…”
Section: Focusing On the Textual Output
Citation type: mentioning
confidence: 99%
“…To extend captioning technology to non-English languages, we are starting to see some studies being reported. Some researchers have attempted to directly propose a captioning model on a target language while utilizing a pivot language, typically English, in which paired information is readily available [12,23,34,39]. Nevertheless, the straightforward approach remains to collect image-caption pairs in the target language (e.g., French [30], German [10], or Chinese [24]).…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Scene graphs [45], vmCAN [59], Graph-align [136], Know more say less [137], GCN-LSTM [15], SGAE [16], StructCap [138], GCH [139], GIN [140], Textual-GCNs [141], CSMN [31], CMMN [64], ReGAT [142], Out-of-the-box [143], Graph VQA [144], GERG [145], VKMN [146], MAN-VQA [147], DMN+ [148], MSCQA [116], SCH-GAN [82], CBT [149], SCST [17], CAVP [18], SR-PL [19], SMem-VQA [150], ODA [151], AOA [152], Up-Down [32], Attention-aware [79], BSSAN [62], CRAN [153], CBP [36], SOT [5], PAGNet [34], MirrorGAN [154], DAI [155], T2I2T [11], CCGAN [13], C4Synth [156], Cycle-Attn+ [157], Coupled CycleGAN [158], TCCM [159], CycleMatch [160], VQA-Rephrasings [161], iQAN [162], MLAN...…”
Section: Recent Advances In Deep Multimodal Feature Learning
Citation type: mentioning
confidence: 99%