Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer 2021
DOI: 10.18653/v1/2021.acl-long.157

Control Image Captioning Spatially and Temporally

Abstract: Generating image captions with user intention is an emerging need. The recently published Localized Narratives dataset takes mouse traces as another input to the image captioning task, which is an intuitive and efficient way for a user to control what to describe in the image. However, how to effectively employ traces to improve generation quality and controllability is still under exploration. This paper aims to solve this problem by proposing a novel model called LoopCAG, which connects Contrastive constrain…

Cited by 14 publications (1 citation statement)
References 18 publications
“…Integration for Alignment. Efforts to enhance fine-grained grounding and align with human intentions have incorporated diverse modalities, including bounding boxes [17,20,81,82], patches [33], coordinate tokens [48,66,78], and traces [54,73]. Despite advancements in multimodal integration, challenges persist in accurately interpreting context and intentions to provide pertinent, timely responses.…”
Section: Multimodal
mentioning
confidence: 99%