2020
DOI: 10.48550/arxiv.2002.11848
Preprint

Analysis of diversity-accuracy tradeoff in image captioning

Abstract: We investigate the effect of different model architectures, training objectives, hyperparameter settings and decoding procedures on the diversity of automatically generated image captions. Our results show that 1) simple decoding by naive sampling, coupled with low temperature, is a competitive and fast method to produce diverse and accurate caption sets; 2) training with CIDEr-based reward using reinforcement learning harms the diversity properties of the resulting generator, which cannot be mitigated by manip…

Citations: cited by 6 publications (6 citation statements)
References: 27 publications
“…Using top-K sampling decreases the top-1 accuracy yet significantly increases all diversity scores (-3.6 CIDEr yet +22% in 1-gram with T=0.6, K=3). The same trend was also observed in [36].…”
Section: Ablation Study (supporting)
confidence: 84%
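
The decoding setup quoted above (top-K sampling with T=0.6, K=3) can be illustrated with a minimal sketch. This is not the cited authors' implementation: the function name `top_k_sample`, the plain-list logits, and the use of the standard library instead of a deep learning framework are all assumptions made for illustration.

```python
import math
import random


def top_k_sample(logits, k=3, temperature=0.6, rng=None):
    """Sample one token index from the k highest-scoring logits.

    Hypothetical helper for illustration; `logits` is a plain list of floats.
    """
    rng = rng or random.Random()
    # Keep only the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature scaling: T < 1 sharpens the distribution toward the best token.
    scaled = [logits[i] / temperature for i in top]
    # Numerically stable softmax weights over the surviving candidates.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.5, -1.0, -3.0]
    print(top_k_sample(example_logits, k=3, temperature=0.6, rng=random.Random(0)))
```

With K=1 the sketch reduces to greedy decoding. Larger K or higher T admits more candidates at each step, which is the mechanism behind the diversity gain and accuracy loss the statement reports.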
“…After NMS, top-ranked sub-graphs are decoded using an attention-based LSTM. As shown in [36], an optional top-K sampling [15,43] can be applied during the decoding to further improve caption diversity. We disable top-K sampling for our experiments unless otherwise noticed.…”
Section: Training and Inference (mentioning)
confidence: 99%
“…[28] is another exploration of Variational structure, which proposed a novel variational multi-modal inferring tree (similar to the syntax tree) to improve the lexical and syntactic diversity in captioning. At last, [4,1] are two works which also concerned about the relation between accuracy and diversity like the main purpose of this work. [1] proposed an off-policy strategy to increase the range of samples during RL training, which improves the diversity dramatically, however, also causes the same dramatic decrease of accuracy.…”
Section: Related Work (mentioning)
confidence: 99%
“…Diversity metrics For diversity evaluation, we adopt five benchmark diversity metrics in [27,28,4,1]. 1) n-gram diversity (Div-n): the ratio of distinct n-grams to the total number of words in the generated captions.…”
Section: Accuracy Metrics (mentioning)
confidence: 99%
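
The Div-n definition quoted above (distinct n-grams divided by the total number of words in the generated captions) can be sketched as follows. The function name `div_n`, the lowercasing, and the whitespace tokenization are assumptions for illustration; the cited works may tokenize differently.

```python
def div_n(captions, n):
    """Ratio of distinct n-grams to total word count over a caption set.

    Hypothetical helper; assumes lowercased, whitespace-split tokens.
    """
    total_words = 0
    distinct_ngrams = set()
    for caption in captions:
        tokens = caption.lower().split()
        total_words += len(tokens)
        # Collect every contiguous n-gram in this caption.
        for i in range(len(tokens) - n + 1):
            distinct_ngrams.add(tuple(tokens[i:i + n]))
    # Guard against an empty caption set.
    return len(distinct_ngrams) / total_words if total_words else 0.0


if __name__ == "__main__":
    caption_set = ["a dog runs", "a dog sits"]
    print(div_n(caption_set, 1), div_n(caption_set, 2))
```

For the two example captions there are 4 distinct unigrams and 3 distinct bigrams over 6 words, so Div-1 is 4/6 and Div-2 is 3/6. A set of identical captions would score lower on both.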