2020
DOI: 10.48550/arxiv.2004.03708
Preprint

Context-Aware Group Captioning via Self-Attention and Contrastive Features

Abstract: While image captioning has progressed rapidly, existing works focus mainly on describing single images. In this paper, we introduce a new task, context-aware group captioning, which aims to describe a group of target images in the context of another group of related reference images. Context-aware group captioning requires not only summarizing information from both the target and reference image group but also contrasting between them. To solve this problem, we propose a framework combining self-attention mecha…

citations
Cited by 2 publications
(1 citation statement)
references
References 60 publications
“…In an effort to incorporate the relational reasoning ability into the model, a scene graph representation - a structured description that captures semantic summaries of entities and their relationships - has been presented recently [7]. Since then, a number of works have proposed deep network-based approaches for generating the scene graphs, confirming its importance to the field [8,9,10,11,12,13,14]. While scene graph representation holds tremendous promise, extracting scene graphs from images is known to be challenging.…”
Section: Introduction (mentioning)
confidence: 99%