2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00433

MSCap: Multi-Style Image Captioning With Unpaired Stylized Text

Cited by 112 publications (85 citation statements)
References 20 publications
“…Some recent image captioning studies [26,21,31] have constructed variant LSTM language models to learn factual and non-factual knowledge in corpora. Some studies [32,21,33,34] have allowed for learning non-factual knowledge from unpaired corpora via weakly supervised or unsupervised methods. These methods are expected (but not limited) to be used to train better models for automatically generating factual and functional captions of drug paraphernalia.…”
Section: Discussion (mentioning)
confidence: 99%
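The style-aware language models referenced in this excerpt can be pictured as a single decoder that injects a learned style embedding at every step. Below is a minimal sketch in PyTorch; the class name StyledLSTMDecoder and all hyperparameters are illustrative assumptions, not code from the cited papers.

```python
# Minimal sketch (hypothetical, not from [26,21,31]): an LSTM caption decoder
# that concatenates a learned style embedding onto each word embedding, so one
# language model can carry both factual and non-factual (stylized) knowledge.
import torch
import torch.nn as nn

class StyledLSTMDecoder(nn.Module):
    def __init__(self, vocab_size, num_styles, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.style_embed = nn.Embedding(num_styles, embed_dim)  # e.g. factual, humorous, romantic
        self.lstm = nn.LSTM(2 * embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, style_id, init_state):
        # tokens: (B, T) word ids; style_id: (B,) style labels;
        # init_state: (h0, c0) derived from the image encoder, each (1, B, hidden_dim)
        w = self.word_embed(tokens)                              # (B, T, E)
        s = self.style_embed(style_id).unsqueeze(1)              # (B, 1, E)
        x = torch.cat([w, s.expand(-1, w.size(1), -1)], dim=-1)  # (B, T, 2E)
        h, _ = self.lstm(x, init_state)
        return self.out(h)                                       # (B, T, vocab) logits
```

Under this setup, swapping the style id at inference time is enough to switch between factual and stylized outputs.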
“…Chen et al. [32] proposed generating non-factual image captions with Domain Layer Norm, which enables the generation of sentences in various styles. MSCap [33] was designed to generate multiple stylized descriptions by training a single captioning model on an unpaired non-factual corpus with the help of several auxiliary modules. Zhao et al. [34] proposed a model named MemCap, which explicitly encodes non-factual knowledge in a memory module.…”
Section: Related Work, A. Image Captioning Datasets (mentioning)
confidence: 99%
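The Domain Layer Norm mechanism credited to [32] conditions a shared decoder on the target style through the normalization's affine parameters. A hedged sketch of that general idea follows; the class StyleLayerNorm and its shapes are assumptions, not the authors' implementation.

```python
# Hypothetical sketch in the spirit of Domain Layer Norm [32]: normalization
# statistics are shared, but each style domain gets its own learned
# gain/bias pair, so switching styles only swaps a small set of parameters.
import torch
import torch.nn as nn

class StyleLayerNorm(nn.Module):
    def __init__(self, dim, num_styles):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(num_styles, dim))
        self.bias = nn.Parameter(torch.zeros(num_styles, dim))

    def forward(self, x, style_id):
        # x: (B, T, D) decoder activations; style_id: (B,) target-style indices
        mu = x.mean(dim=-1, keepdim=True)
        sigma = x.std(dim=-1, keepdim=True)
        g = self.gain[style_id].unsqueeze(1)   # (B, 1, D)
        b = self.bias[style_id].unsqueeze(1)   # (B, 1, D)
        return g * (x - mu) / (sigma + 1e-5) + b
```

Keeping the normalization statistics shared while swapping only the per-style affine parameters is one cheap way to specialize a single decoder across several styles.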
“…Thus, methods developed on such datasets might not be easily adopted in the wild. Nevertheless, great efforts have been made to extend captioning to out-of-domain data [3,9,69] or different styles beyond mere factual descriptions [22,55]. In this work we explore unsupervised captioning, where image and language sources are independent.…”
Section: Language Domain (mentioning)
confidence: 99%
“…In [69], the cross-domain problem is addressed with a cycle objective. Similarly, unpaired data can be used to generate stylized descriptions [22,46]. Anderson et al. [3] propose a method to complete partial sequence data, e.g.…”
Section: Related Work (mentioning)
confidence: 99%
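The cycle objective mentioned for [69] can be pictured as a round trip: image features drive a caption, and an auxiliary network maps that caption's embedding back into feature space, where drift is penalized. A minimal sketch under assumed shapes; cycle_consistency_loss and the linear back-projector are illustrative, not the paper's formulation.

```python
# Hypothetical cycle-style objective for unpaired captioning: the generated
# caption's embedding should reconstruct the original image features.
import torch
import torch.nn as nn
import torch.nn.functional as F

def cycle_consistency_loss(image_feat: torch.Tensor,
                           caption_emb: torch.Tensor,
                           back_projector: nn.Module) -> torch.Tensor:
    """image_feat: (B, D) encoder features; caption_emb: (B, E) sentence
    embedding of the generated caption; back_projector: maps E back to D."""
    reconstructed = back_projector(caption_emb)   # (B, D)
    return F.mse_loss(reconstructed, image_feat)

# Example wiring with a simple linear back-projector (dims are placeholders):
back_projector = nn.Linear(300, 2048)
loss = cycle_consistency_loss(torch.randn(8, 2048),
                              torch.randn(8, 300),
                              back_projector)
```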
“…Based on the encoder-decoder pipeline [24,30,33], much progress has been made on image captioning. For example, [27] introduces visual attention that adaptively attends to the salient areas of the image; [18] proposes an adaptive attention model that decides whether to attend to the image or to a visual sentinel; [29] incorporates a learned language bias as a language prior for more human-like captions; [19] and [8] focus on the discriminability and style properties of image captions, respectively; and [22] adopts reinforcement learning (RL) to directly optimize the evaluation metric.…”
Section: Related Work (mentioning)
confidence: 99%
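The adaptive-attention idea attributed to [18] appends a "visual sentinel" slot to the attention candidates, so the softmax itself decides when to rely on the language model rather than the image. A hypothetical sketch, assuming region features, sentinel, and decoder state all share one width D; the class AdaptiveAttention is illustrative, not the paper's code.

```python
# Sketch in the spirit of adaptive attention with a visual sentinel [18]:
# the sentinel is scored alongside the image regions, and the weight it
# receives acts as a gate for falling back on purely linguistic context.
import torch
import torch.nn as nn

class AdaptiveAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w_x = nn.Linear(dim, dim)   # projects regions and sentinel
        self.w_h = nn.Linear(dim, dim)   # projects the decoder state
        self.score = nn.Linear(dim, 1)

    def forward(self, regions, sentinel, h):
        # regions: (B, K, D) image regions; sentinel: (B, D); h: (B, D)
        cand = torch.cat([regions, sentinel.unsqueeze(1)], dim=1)      # (B, K+1, D)
        e = self.score(torch.tanh(self.w_x(cand) + self.w_h(h).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)                                 # (B, K+1, 1)
        # the weight on the last slot is the "ignore the image" gate
        return (alpha * cand).sum(dim=1)                                # (B, D) context
```

A non-visual word like "of" would ideally push most of the attention mass onto the sentinel slot, while object words pull it back onto the image regions.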