2020
DOI: 10.48550/arxiv.2009.03949
Preprint

Towards Unique and Informative Captioning of Images

Abstract: Despite considerable progress, state-of-the-art image captioning models produce generic captions, leaving out important image details. Furthermore, these systems may even misrepresent the image in order to produce a simpler caption consisting of common concepts. In this paper, we first analyze both modern captioning systems and evaluation metrics through empirical experiments to quantify these phenomena. We find that modern captioning systems return higher likelihoods for incorrect distractor sentences compare…

Cited by 2 publications (1 citation statement)
References 48 publications
“…A direct outcome is that current evaluation solely considers those 5 sentences as relevant to a single image for the ITM task. However, it is a known fact that in MSCOCO or Flickr30k there are many sentences that can perfectly describe a non paired image [32,42,43]. In other words, there are sentences (images) that are relevant to images (sentences) even though they are not defined as such in the retrieval ground truth.…”
Section: Is An Image Worth 5 Sentences?
Citation type: mentioning
confidence: 99%