2021
DOI: 10.1609/aaai.v35i2.16184

Commonsense Knowledge Aware Concept Selection For Diverse and Informative Visual Storytelling

Abstract: Visual storytelling is a task of generating relevant and interesting stories for given image sequences. In this work we aim at increasing the diversity of the generated stories while preserving the informative content from the images. We propose to foster the diversity and informativeness of a generated story by using a concept selection module that suggests a set of concept candidates. Then, we utilize a large scale pre-trained model to convert concepts and images into full stories. To enrich the candidate co…
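To make the two-stage design described in the abstract concrete, the sketch below mimics the concept selection step as a simple ranking of candidate concepts against an image feature. This is a minimal sketch under loud assumptions: the paper itself selects concepts with graph attention over a commonsense knowledge graph, and every name and tensor below (`select_concepts`, the random features) is a hypothetical stand-in, not the authors' implementation.

```python
# A minimal sketch of concept selection, assuming precomputed features.
# The paper uses graph attention networks over a commonsense graph; this
# dot-product ranking is a simplified stand-in to show the interface only.
import torch

def select_concepts(image_feat, concept_embs, concept_names, k=3):
    """Rank candidate concepts by similarity to the image feature and
    return the top-k names as the selected concepts."""
    scores = concept_embs @ image_feat                # (num_concepts,)
    top = torch.topk(scores, k=min(k, len(concept_names)))
    return [concept_names[int(i)] for i in top.indices]

# Hypothetical inputs: one image vector, five candidate concept vectors.
torch.manual_seed(0)
image_feat = torch.randn(512)
concept_names = ["beach", "dog", "sunset", "frisbee", "family"]
concept_embs = torch.randn(len(concept_names), 512)

# The selected concepts would then be handed to a pre-trained generator
# (e.g. BART) together with the images to produce the full story.
print(select_concepts(image_feat, concept_embs, concept_names))
```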

Cited by 27 publications (12 citation statements)
References 34 publications

“…Harnessing commonsense knowledge for VIST was first attempted in [119], followed by [120,121], which, however, utilize RNN-based structures for text generation. The usage of transformer-based models is explored in [122], where visual concepts are enriched through ConceptNet. All relevant enriched concepts are provided to BART, which ultimately outputs appropriate captions.…”
Section: Sequential Generation
confidence: 99%
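The pipeline this statement describes (expand visual concepts via ConceptNet, then condition BART on the enriched concepts) can be illustrated with a short sketch. This is not the cited paper's code: the `enrich_with_conceptnet` helper, the prompt format, the decoding settings, and the `facebook/bart-base` checkpoint are all assumptions; a real system would fine-tune BART on VIST stories.

```python
# A minimal sketch, not the cited paper's code: expand each visual concept
# with related terms from the public ConceptNet API, then hand the enriched
# concept list to BART as a conditional-generation prompt.
import requests
from transformers import BartForConditionalGeneration, BartTokenizer

def enrich_with_conceptnet(concept: str, top_k: int = 5) -> list[str]:
    """Fetch terms related to `concept` from the public ConceptNet API
    (single-word English concepts assumed)."""
    url = f"http://api.conceptnet.io/c/en/{concept}?limit={top_k}"
    edges = requests.get(url, timeout=10).json().get("edges", [])
    related = []
    for edge in edges:
        for node in (edge["start"], edge["end"]):
            label = node.get("label", "")
            if label and label.lower() != concept:
                related.append(label)
    return related[:top_k]

# Hypothetical concepts detected in an image sequence.
visual_concepts = ["beach", "dog", "sunset"]
enriched = {c: enrich_with_conceptnet(c) for c in visual_concepts}

# Flatten the enriched concepts into one conditioning prompt.
prompt = " ; ".join(f"{c}: {', '.join(r)}" for c, r in enriched.items())

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
# Nucleus sampling is one way to encourage diverse outputs; the exact
# decoding strategy here is an assumption, not taken from the paper.
output_ids = model.generate(**inputs, max_length=80, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```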
“…In the end-to-end pipeline, models are developed to autoregressively generate multi-sentence stories given the image stream in a unified structure (Wang et al., 2018; Kim et al., 2018). Meanwhile, multi-stage approaches that introduce more planning or external knowledge have also shown impressive performance (Yao et al., 2018; Hsu et al., 2020; Chen et al., 2021). Further, some other works are devoted to adopting more elaborate learning paradigms to improve the informativeness and controllability of story generation (Yang et al., 2019; Hu et al., 2019; Jung et al., 2020).…”
Section: Visual Storytelling
confidence: 99%
“…More elaborate frameworks with multi-stage generation pipelines have also been proposed that guide storytelling via, e.g., storyline planning (Yao et al., 2018), external knowledge engagement (Yang et al., 2019; Hsu et al., 2019, 2020), and concept selection (Chen et al., 2021). Effective as these existing methods are for describing real-world photo streams (e.g.…”
Section: Introduction
confidence: 99%
“…In the language-and-vision community, Huang et al. (2016) operationalized the task and released the Visual Storytelling Dataset (VIST), a collection of English stories created by speakers on top of 5-image visual sequences. Several models have been proposed for the task of generating plausible stories for a given sequence, ranging from RNNs (Kim et al., 2018) to Transformers, trained either end-to-end or leveraging additional knowledge graphs (Chen et al., 2021).…”
Section: Introduction
confidence: 99%