2019
DOI: 10.48550/arxiv.1906.00717
Preprint
Masked Non-Autoregressive Image Captioning

Abstract: Existing captioning models often adopt the encoder-decoder architecture, where the decoder uses autoregressive decoding to generate captions, such that each token is generated sequentially given the preceding generated tokens. However, autoregressive decoding results in issues such as sequential error accumulation, slow generation, improper semantics and lack of diversity. Non-autoregressive decoding has been proposed to tackle slow generation for neural machine translation but suffers from the multimodality problem. […]
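The contrast the abstract draws between sequential and parallel generation can be sketched with two toy decoders; the function names and the `<mask>` convention below are illustrative assumptions, not the paper's API:

```python
MASK = "<mask>"

def autoregressive_decode(predict_next, max_len):
    """Sequential decoding: token t conditions on all previously
    generated tokens, so errors accumulate and steps cannot run
    in parallel."""
    tokens = []
    for _ in range(max_len):
        tokens.append(predict_next(tokens))  # one model call per token
    return tokens

def non_autoregressive_decode(predict_all, max_len):
    """Parallel decoding: every position is predicted at once from a
    fully masked sequence, trading the sequential dependency for speed."""
    return predict_all([MASK] * max_len)  # a single model call
```

In the masked variant the paper's title refers to, a model trained to fill `<mask>` positions would play the role of `predict_all`.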

Cited by 13 publications (21 citation statements)
References 15 publications
“…The other is that beam search has been found to produce low-quality output when applied to large search spaces [34]. The non-autoregressive method was first proposed by [12,14] to address the above issues, allowing the image captioning model to generate all target words simultaneously. NAIC replaces y_{<t} with an independent latent variable z to remove the sequential dependencies and rewrites Equation 1 as:…”
Section: Non-autoregressive Image Caption
Citation type: mentioning; confidence: 99%
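The rewrite this statement refers to can be written out explicitly. Assuming the conventional notation, with I the image and y_1, …, y_T the caption tokens (the exact symbols in the citing paper's Equation 1 may differ):

```latex
% Autoregressive factorization (the form of Equation 1):
p(y \mid I) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, I)

% Non-autoregressive rewrite: the dependence on the prefix y_{<t}
% is replaced by an independent latent variable z, so all tokens
% can be decoded in parallel:
p(y \mid I) = p(z \mid I) \prod_{t=1}^{T} p(y_t \mid z, I)
```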
“…The previous one-pass NAIC can be further boosted by introducing a multi-pass refinement mechanism [10,12,13]. Specifically, IR-NAIC applies a fusion function f to the sentence S′ produced in the preceding stage and comprehensively predicts the new sentence S by:…”
Section: Iterative Refinement Based Non-autoregressive Image Caption
Citation type: mentioning; confidence: 99%
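The multi-pass refinement loop described in this statement can be sketched as follows; `decode_step` and `fuse` are hypothetical stand-ins for the captioning model and the fusion function f, not the cited papers' actual interfaces:

```python
def iterative_refinement(image_feats, decode_step, fuse, num_passes=3):
    """Sketch of iterative-refinement NAIC (IR-NAIC).

    decode_step(image_feats, draft) -> token sequence; draft=None means
        the first one-pass non-autoregressive decode.
    fuse(prev, new) -> fused sequence fed to the next pass (the role of f).
    """
    caption = decode_step(image_feats, draft=None)  # one-pass NAIC output
    for _ in range(num_passes - 1):
        refined = decode_step(image_feats, draft=caption)  # re-decode given draft
        caption = fuse(caption, refined)                   # combine old and new
    return caption
```

Each extra pass lets the model revise tokens while conditioning on a full draft rather than on nothing, which is the advantage IR-NAIC has over one-pass NAIC.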