2021
DOI: 10.48550/arxiv.2104.00332
Preprint

UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training

Abstract: Vision-and-language pre-training has achieved impressive success in learning multimodal representations between vision and language. To generalize this success to non-English languages, we introduce UC2, the first machine translation-augmented framework for cross-lingual cross-modal representation learning. To tackle the scarcity problem of multilingual captions for image datasets, we first augment existing English-only datasets with other languages via machine translation (MT). Then we extend the standard M…
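The augmentation step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `translate` function below is a hypothetical stand-in for a real MT system (the abstract does not specify which one is used), and the tiny lookup table inside it exists only to make the example self-contained.

```python
def translate(caption: str, target_lang: str) -> str:
    # Hypothetical MT call; a real pipeline would invoke an MT model here.
    fake_mt = {
        ("a dog runs on grass", "de"): "ein Hund läuft auf Gras",
        ("a dog runs on grass", "zh"): "一只狗在草地上奔跑",
    }
    return fake_mt.get((caption, target_lang), caption)

def augment_dataset(pairs, target_langs):
    """Expand (image_id, English caption) pairs with MT captions in other languages."""
    augmented = []
    for image_id, caption in pairs:
        augmented.append((image_id, caption, "en"))
        for lang in target_langs:
            augmented.append((image_id, translate(caption, lang), lang))
    return augmented

pairs = [("img_001", "a dog runs on grass")]
data = augment_dataset(pairs, ["de", "zh"])
# Each image is now paired with captions in en, de, and zh.
```

The augmented multilingual image-caption pairs would then feed the cross-lingual cross-modal pre-training objectives the abstract goes on to describe.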

Cited by 0 publications
References 43 publications (91 reference statements)