2020
DOI: 10.1609/aaai.v34i07.6826
Interactive Dual Generative Adversarial Networks for Image Captioning

Abstract: Image captioning is usually built on either generation-based or retrieval-based approaches. Both ways have certain strengths but suffer from their own limitations. In this paper, we propose an Interactive Dual Generative Adversarial Network (IDGAN) for image captioning, which mutually combines the retrieval-based and generation-based methods to learn a better image captioning ensemble. IDGAN consists of two generators and two discriminators, where the generation- and retrieval-based generators mutually benefit…

Cited by 24 publications (8 citation statements)
References 22 publications
“…There are also other state-of-the-art deep image captioning models that are trained in a semi-supervised fashion without employing attention mechanisms. For instance, the dual generative adversarial network for image captioning by Liu et al. [69] does not employ attention mechanisms and instead relies on generative adversarial networks [32].…”
Section: Comparison of Attentive Methods
confidence: 99%
“…This was followed by the introduction of bottom-up and top-down attention [3] (Up-Down Attention), which became a source of inspiration for most of the later work. In recent years, the use of generative adversarial networks (GANs) for image captioning has also led to good results [19, 9, 69]. In comparison with the encoder-decoder architecture, which is usually trained with a cross-entropy loss, GAN architectures are trained with an adversarial loss, making a direct comparison of performance impossible.…”
Section: Attentive Deep Learning for Image Captioning
confidence: 99%
“…Transformer models have also proven to be effective in several recent studies. Some methods use the transformer to effectively integrate visual and textual information [44, 45, 46, 47, 48]. In addition, studies have explored the use of a cross-modal transformer in image captioning [49, 50], which integrates visual and textual information flexibly and effectively.…”
Section: Related Work
confidence: 99%
“…Difference Detection between Images. Whereas learning the pixel-level differences and/or flows between consecutive images has been extensively studied (Stent et al. 2016; Khan et al. 2017), (Liu et al. 2018) incorporate image retrieval to generate more diverse and richer sentences, and (Liu et al. 2020) add a text retrieval module to improve the quality of generated captions. (Qiao et al. 2019; Joseph et al. 2019) jointly train an image generation model with a captioning model for better text-to-image generation.…”
Section: Related Work
confidence: 99%