Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413886

Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling

Abstract: Visual Storytelling (VIST) is the task of telling a narrative story about a certain topic according to a given photo stream. Existing studies focus on designing complex models, which rely on a huge amount of human-annotated data. However, annotation for VIST is extremely costly, and many topics cannot be covered in the training dataset due to the long-tail topic distribution. In this paper, we focus on enhancing the generalization ability of the VIST model by considering the few-shot setting. Inspired by th…
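The title pairs "topic adaptation" with "prototype encoding," which is in the spirit of prototype-based few-shot learning: a handful of support stories from an unseen topic are embedded and averaged into a topic representation that conditions generation. The snippet below is a minimal, hypothetical sketch of that generic idea only; the encoder, dimensions, and all names are assumptions and do not reproduce the paper's actual model.

# Hedged sketch of generic prototype encoding for few-shot topic adaptation.
# All names, shapes, and the averaging scheme are illustrative assumptions,
# not the architecture proposed in the paper.
import torch
import torch.nn as nn


class TopicPrototypeEncoder(nn.Module):
    def __init__(self, feat_dim: int = 2048, proto_dim: int = 512):
        super().__init__()
        # Project pre-extracted photo features into the prototype space.
        self.proj = nn.Linear(feat_dim, proto_dim)

    def forward(self, support_feats: torch.Tensor) -> torch.Tensor:
        """support_feats: (n_shots, n_photos, feat_dim) features of the
        few support photo streams for one unseen topic."""
        # Encode each support story as the mean of its photo embeddings,
        # then average over the few shots to obtain one topic prototype.
        story_emb = self.proj(support_feats).mean(dim=1)   # (n_shots, proto_dim)
        prototype = story_emb.mean(dim=0)                  # (proto_dim,)
        return prototype


# Usage: a 5-shot episode with 5-photo streams and 2048-d image features.
support = torch.randn(5, 5, 2048)
proto = TopicPrototypeEncoder()(support)
print(proto.shape)  # torch.Size([512])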

Cited by 11 publications (3 citation statements). References 25 publications.

Citation statements, ordered by relevance:
“…6.1, data naturally follow a long-tailed distribution in many fields, so solving the long-tailed problem may improve the performance of models in these fields. However, according to the available research results, only a few works have studied their research field from the perspective of the long-tailed distribution, such as the long-tailed distribution of object classes in UAV images [179], of driving behavior in autonomous driving [116], of topics in visual storytelling [91], of content-related words in video captioning tasks [192], of poses included in datasets for 3D human pose estimation [181], and of dermatological categories in dermatological diagnosis [123]. We believe that for many research fields, the existing work analyzing and addressing the long-tailed distribution problem is still not sufficient.…”
Section: Future Directions (mentioning, confidence: 99%)
“…For example, Kim et al. [27] introduced two levels of hierarchical RNNs with attention mechanisms, at the global encoding level and the local image level, to address multi-image cued story generation. On the other hand, several recent works [1,18,19,33,34,37,62,67,70] have been devoted to incorporating semantic knowledge to improve the quality of the generated story. For instance, Li et al. [33] inferred semantic concepts and captured cross-modal rules for visual storytelling, and Hsu et al. [19] distilled a wealth of words from an external knowledge graph to generate more interesting stories.…”
Section: Related Work (mentioning, confidence: 99%)
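As a rough illustration of the two-level hierarchy mentioned in the excerpt above (a global encoding level over the whole album and a local image level), the sketch below runs a GRU over the photo stream and fuses each photo's feature with its global context state. Layer sizes, the fusion scheme, and all names are assumptions for illustration, not the design of the cited model [27] or of this paper.

# Hedged sketch of a two-level (global album / local image) GRU encoder for a
# photo stream; dimensions and the way the levels are combined are assumptions.
import torch
import torch.nn as nn


class HierarchicalAlbumEncoder(nn.Module):
    def __init__(self, feat_dim: int = 2048, hid_dim: int = 512):
        super().__init__()
        # Global level: one GRU pass over the whole photo stream.
        self.global_rnn = nn.GRU(feat_dim, hid_dim, batch_first=True)
        # Local level: fuse each photo feature with its global context state.
        self.local_fuse = nn.Linear(feat_dim + hid_dim, hid_dim)

    def forward(self, photo_feats: torch.Tensor) -> torch.Tensor:
        """photo_feats: (batch, n_photos, feat_dim) pre-extracted CNN features."""
        global_states, _ = self.global_rnn(photo_feats)            # (B, N, hid)
        local_states = torch.tanh(
            self.local_fuse(torch.cat([photo_feats, global_states], dim=-1))
        )                                                           # (B, N, hid)
        # One local state per photo; a sentence decoder would consume each.
        return local_states


# Usage: 2 albums with 5 photos each and 2048-d image features.
feats = torch.randn(2, 5, 2048)
print(HierarchicalAlbumEncoder()(feats).shape)  # torch.Size([2, 5, 512])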
“…With the popularity of social networks, a tremendous number of users routinely share a series of photos, along with their related comments/stories on social media platforms such as Instagram and Flickr. Consequently, a new task of visual storytelling [1,24,27,34,37,43,62,64,72], which aims at automatically generating a narrative story for an image stream (as shown in Figure 1), has recently attracted increasing attention in the multimedia community. Given a stream of images, humans are capable of composing a suitable story line and then generating a sequence of sentences.…”
Section: Introduction (mentioning, confidence: 99%)