2020
DOI: 10.1007/978-3-030-58601-0_42
Length-Controllable Image Captioning

Cited by 52 publications
(55 citation statements)
References 35 publications
“…The purpose of this experiment for various τ is to simulate the trace drawing speed of users in a real application scenario, and a larger τ is equivalent to a faster drawing speed. As Deng et al (2020) has demonstrated, the length is one of the critical facts that impact quantitative performance. This result implies we can further decide to generate either a coarse-grained or fine-grained caption by controlling the time-frequency τ.…”
Section: Quantitative Analysis Controllability Analysis On Temporal Order
Citation type: mentioning
confidence: 99%
“…Controllable Image Captioning is an emerging research direction. Previous works aim to control the captioning by Part-Of-Speech tagging (Deshpande et al, 2018), sentiment, (You et al, 2018), length (Deng et al, 2020), bounding box (Cornia et al, 2019) etc. Those works either tried to describe a semantic guided captioning.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…During training, we sample a masking ratio from a uniform prior distribution ([0,1]) and randomly mask the percentage of target tokens for prediction. While in inference, we adopt a non-autoregressive sampling strategy (i.e., mask-predict-k strategy [2,3,5]). So only a few sampling steps (e.g., 4) are needed to generate all target tokens, which enable real-time inference.…”
Section: Training and Inference Strategy
Citation type: mentioning
confidence: 99%
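
The training and inference strategy quoted above can be illustrated with a minimal Python sketch. It is not the citing authors' code: the names `MASK`, `mask_for_training`, and `remask_count` are hypothetical, masking at least one token is an assumption, and the linear re-masking schedule is the one commonly associated with mask-predict decoding, which may differ from the "mask-predict-k" variant the statement refers to.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, not taken from the quoted paper


def mask_for_training(tokens, rng):
    """Sample a masking ratio from U[0, 1] and mask that share of tokens.

    Assumption: at least one token is always masked so that every
    training example has something to predict.
    """
    ratio = rng.random()
    n_mask = max(1, round(ratio * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    return masked, sorted(positions)


def remask_count(length, step, total_steps):
    """Number of tokens to re-mask after decoding step `step` (1-based).

    Assumption: linear decay, so fewer tokens are re-masked at each
    step and none remain masked after the final step.
    """
    return length * (total_steps - step) // total_steps


if __name__ == "__main__":
    rng = random.Random(0)
    caption = "a dog runs on the grass".split()
    masked, positions = mask_for_training(caption, rng)
    print(masked, positions)
    # Re-masking schedule for an 8-token target decoded in 4 steps.
    print([remask_count(8, t, 4) for t in range(1, 5)])
```

With four decoding steps, every position is predicted in parallel at each step and only a shrinking subset is re-masked, which is why so few steps are needed compared with token-by-token autoregressive decoding.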
“…However, significantly different from them, we borrow the idea from the mask-predict framework (Gu et al, 2018; Ghazvininejad et al, 2019; Deng et al, 2020) to progressively incorporate pairwise ordering information into SE-Graph, which is the basis of our graph-based sentence ordering model. To the best of our knowledge, our work is the first attempt to explore iteratively refined GNN for sentence ordering.…”
Section: Related Work
Citation type: mentioning
confidence: 99%