2019
DOI: 10.1007/978-3-030-33894-7_30
A Hierarchical Approach for Visual Storytelling Using Image Description

Abstract: One of the primary challenges of visual storytelling is developing techniques that can maintain the context of the story over long event sequences to generate human-like stories. In this paper, we propose a hierarchical deep learning architecture based on encoder-decoder networks to address this problem. To better help our network maintain this context while also generating long and diverse sentences, we incorporate natural language image descriptions along with the images themselves to generate each story sen…

Cited by 12 publications (8 citation statements)
References 20 publications
“…
Model              B-1    B-2    B-3    B-4    CIDEr  ROUGE-L  METEOR
AREL 2018 [32]     0.536  0.315  0.173  0.099  0.038  0.286    0.352
GLACNet 2018 [14]  0.56   0.321  0.171  0.091  0.041  0.264    0.306
HCBNet 2019 [1]    0.59   0.348  0.191  0.105  0.051  0.274    0.34
HCBNet (w/o prev. sent.…”
Section: Model (unclassified)
“…
Model                                     B-1   B-2    B-3    B-4    CIDEr  ROUGE-L  METEOR
HCBNet (w/o prev. sent. attention) [1]    0.59  0.338  0.180  0.097  0.057  0.271    0.332
HCBNet (w/o description attention) [1]    0.58  0.345  0.194  0.108  0.043  0.271    0.337
HCBNet (VGG) 2019 [1]                     0.59  0.34   0.186  0.104  0.051  0.269    0.334
ReCo-RL 2020 [
Story In Sequence (SIS), which is more relevant to storytelling problems, comprises a whole paragraph of precisely five sentences representing a story. In all dataset statements, it is essential to note that the names of individuals are replaced with "[male and female]", places with "[location]", and organizations with "[organization]".…”
Section: Model (mentioning)
confidence: 99%
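The scores quoted above are standard caption/story metrics (BLEU-n, CIDEr, ROUGE-L, METEOR). As a reminder of what the B-1 column measures, here is a minimal single-reference BLEU-1 sketch (clipped unigram precision with a brevity penalty); it is illustrative only, not the evaluation code used by these papers.

```python
from collections import Counter
import math

def bleu1(candidate, reference):
    """Single-reference BLEU-1: clipped unigram precision times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    ref_counts = Counter(ref)
    # Clipped matches: each reference word can be credited at most as many
    # times as it occurs in the reference.
    matches = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    precision = matches / len(cand)
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

print(round(bleu1("the dog ran home", "the dog ran to the house"), 3))  # 0.455
```

Corpus-level BLEU aggregates these counts over all sentences before dividing, which is why sentence-level and table-level numbers can differ slightly.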
“…Automatically learning to map from image sequences to output stories is very challenging with no guidance, hence some approaches try to introduce some intermediate representation or data to help. A simple approach is taken by Nahian et al (2019), which encodes images and their associated text captions (from the VIST dataset) by separate encoders, and combines them, before decoding into the story sentences. Otherwise Nahian et al (2019) is a fairly straightforward encoder-decoder architecture.…”
Section: Exploiting Intermediate Data or Structures (mentioning)
confidence: 99%
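The encode-combine-decode pipeline this excerpt describes can be sketched in miniature. Everything below (the function names, the toy mean and bag-of-words encoders, the dimensions) is a hypothetical illustration of fusing separate image and caption encodings before decoding, not the cited model's implementation:

```python
# Toy combine-then-decode sketch: an image encoder and a caption encoder
# each produce a fixed-size vector; the two are concatenated into one
# context vector that a decoder would condition on. Real models would use
# CNN/RNN layers here; these stand-ins just preserve the data flow.

def encode_image(pixels):
    # Stand-in image encoder: mean intensity per channel.
    return [sum(ch) / len(ch) for ch in pixels]

def encode_caption(tokens, vocab):
    # Stand-in text encoder: bag-of-words over a fixed vocabulary.
    return [tokens.count(w) for w in vocab]

def combine(img_vec, txt_vec):
    # The cited approach combines the two encodings before decoding;
    # concatenation is the simplest such fusion.
    return img_vec + txt_vec

vocab = ["dog", "park", "ball"]
pixels = [[0.1, 0.3], [0.5, 0.5], [0.2, 0.6]]  # 3 "channels" of 2 values each
context = combine(encode_image(pixels), encode_caption(["dog", "park"], vocab))
print(len(context))  # 3 image dims + 3 vocab dims = 6
```

The point of the fusion step is that the decoder sees both modalities at once, which is what lets caption text compensate for visual ambiguity.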
“…Other works try to extract some semantic information from the images without simply using the caption given in the dataset.…”
Section: Exploiting Intermediate Data or Structures (mentioning)
confidence: 99%
“…The colossal success in image recognition was made possible by recent advances in artificial intelligence and deep learning [1][2][3][4]. The rudimentary operation involved in such applications is multiply-and-accumulate (MAC).…”
Section: Introduction (mentioning)
confidence: 99%
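The multiply-and-accumulate (MAC) primitive this excerpt mentions is the inner step of a dot product, the workhorse of every neural-network layer. A minimal illustration (names are ours, not from the cited work):

```python
def dot_mac(weights, activations):
    """Dot product expressed as repeated multiply-and-accumulate steps."""
    acc = 0.0
    for w, a in zip(weights, activations):
        acc += w * a  # one MAC: one multiply, one add into the accumulator
    return acc

print(dot_mac([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

Hardware accelerators count throughput in MACs per second precisely because a layer's cost is dominated by this loop.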