2020
DOI: 10.1609/aaai.v34i07.6998
MemCap: Memorizing Style Knowledge for Image Captioning

Abstract: Generating stylized captions for images is a challenging task since it requires not only describing the content of the image accurately but also expressing the desired linguistic style appropriately. In this paper, we propose MemCap, a novel stylized image captioning method that explicitly encodes the knowledge about linguistic styles with memory mechanism. Rather than relying heavily on a language model to capture style factors in existing methods, our method resorts to memorizing stylized elements learned fr…

Cited by 61 publications (38 citation statements) | References 18 publications
“…Some recent image captioning studies [26,21,31] have constructed variant LSTM language models to learn factual and non-factual knowledge in corpora. Some studies [32,21,33,34] have allowed for learning non-factual knowledge in unpaired corpora via weakly supervised or unsupervised methods. These methods are expected (but not limited) to be used to train better models for auto-generating factual and functional captions of drug paraphernalia.…”
Section: Discussion
confidence: 99%
“…MSCap [33] was designed to generate multiple stylized descriptions by training a single captioning model on an unpaired non-factual corpus with the help of several auxiliary modules. Zhao et al. [34] proposed a new model named MemCap, which resorts to explicitly encoding non-factual knowledge by building a memory module.…”
Section: Related Work: Image Captioning Datasets
confidence: 99%
“…The basic encoder-decoder architecture is composed of a CNN (as encoder) and an RNN (recurrent neural network, as decoder). The image is fed to the CNN for feature extraction, while the resulting features are fed to the RNN to be mapped to annotation words [12,13,14,15]. To make the network more effective and efficient, various additions have been made to the model, for example the incorporation of visual attention mechanisms [16,17], regions of interest, and attention behaviors [18,19].…”
Section: Related Work
confidence: 99%
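The CNN-encoder/RNN-decoder loop described in the statement above can be sketched in a few lines of numpy. This is a minimal illustrative toy, not any cited model's implementation: all dimensions, weight matrices, and the greedy decoding loop are assumptions, and the weights are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not taken from any cited paper).
FEAT_DIM, HID_DIM, VOCAB = 512, 256, 1000

# Pretend a CNN encoder has already produced an image feature vector.
img_feat = rng.standard_normal(FEAT_DIM)

# Toy vanilla-RNN decoder parameters (random; a real model learns these).
W_ih = rng.standard_normal((HID_DIM, FEAT_DIM)) * 0.01
W_hh = rng.standard_normal((HID_DIM, HID_DIM)) * 0.01
W_out = rng.standard_normal((VOCAB, HID_DIM)) * 0.01

def decode_step(h, x):
    """One RNN step: new hidden state plus word logits over the vocabulary."""
    h_new = np.tanh(W_ih @ x + W_hh @ h)
    return h_new, W_out @ h_new

# Greedy decoding: here the image feature is fed at every step, one
# simple conditioning scheme; many variants feed it only at t = 0.
h = np.zeros(HID_DIM)
caption_ids = []
for _ in range(5):  # generate 5 token ids
    h, logits = decode_step(h, img_feat)
    caption_ids.append(int(np.argmax(logits)))

print(len(caption_ids))  # one token id per decode step
```

Replacing the argmax with sampling, or the fixed image feature with an attention-weighted mix of region features, recovers the attention-based variants the statement mentions.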
“…GloVe is used for representing words. Zhao et al. [157] have proposed 'MemCap', a novel stylized image captioning method that explicitly encodes the knowledge about linguistic styles with a memory mechanism. The authors have proposed to implement VGG-16 with Faster R-CNN for visual feature extraction.…”
Section: Feature Extraction
confidence: 99%
“…Jia and Li [48] have proposed an LSTM as a sentence generator. Zhao et al. [157] have proposed 'MemCap', which first extracts content-relevant style knowledge from the memory module via an attention mechanism and then incorporates the extracted knowledge into a language model. The decoder of the model proposed by Chen and Jin [17] is an RNN with LSTM cells.…”
Section: Sentence Generation
confidence: 99%
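The retrieval step described above — querying a style memory with an attention mechanism and folding the result into the language model — can be sketched with plain dot-product attention. This is a generic attention sketch under assumed shapes, not MemCap's actual architecture; the memory slots here are random stand-ins for learned style vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
D, M = 64, 8  # embedding dim and number of memory slots (illustrative)

# Hypothetical style memory: M slot vectors (learned in a real model).
memory = rng.standard_normal((M, D))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, memory):
    """Scaled dot-product attention: weight each memory slot by its
    similarity to the query, return the weighted sum of slots."""
    weights = softmax(memory @ query / np.sqrt(D))
    return weights @ memory, weights

# Content-relevant query, e.g. derived from the current decoder state.
query = rng.standard_normal(D)
style_vec, weights = attend(query, memory)

# The retrieved style vector could then be concatenated with the
# decoder input to condition generation on the selected style.
print(style_vec.shape)  # (64,)
```

Because the attention weights form a distribution over slots, the decoder receives a smooth mixture of stored style knowledge rather than a hard lookup, which keeps the retrieval step differentiable and trainable end to end.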