Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023
DOI: 10.18653/v1/2023.emnlp-main.184
|View full text |Cite
|
Sign up to set email alerts
|

Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining

Emanuele Bugliarello,
Aida Nematzadeh,
Lisa Hendricks

Abstract: Recent work in vision-and-language pretraining has investigated supervised signals from object detection data to learn better, fine-grained multimodal representations. In this work, we take a step further and explore how we can tap into supervision from small-scale visual relation data. In particular, we propose two pretraining approaches to contextualise visual entities in a multimodal setup. With verbalised scene graphs, we transform visual relation triplets into structured captions, and treat them as additi… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 42 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?