2021
DOI: 10.1016/j.csl.2020.101169
BERT-hLSTMs: BERT and hierarchical LSTMs for visual storytelling

Abstract: Visual storytelling is a creative and challenging task, aiming to automatically generate a story-like description for a sequence of images. The descriptions generated by previous visual storytelling approaches lack coherence because they use word-level sequence generation methods and do not adequately consider sentence-level dependencies. To tackle this problem, we propose a novel hierarchical visual storytelling framework which separately models sentence-level and word-level semantics. We use the transformer-b…
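The hierarchical decoding idea described in the abstract can be illustrated with a minimal sketch: a sentence-level LSTM consumes one feature vector per image and emits a topic vector for each sentence, and a word-level LSTM then generates that sentence token by token. All class names, dimensions, and the way the topic vector is injected are assumptions for illustration, not the authors' exact implementation (which also incorporates BERT embeddings).

```python
import torch
import torch.nn as nn

class HierarchicalStoryDecoder(nn.Module):
    """Illustrative two-level decoder: a sentence LSTM over image features
    and a word LSTM over tokens. Sizes and wiring are assumed, not the paper's."""
    def __init__(self, img_dim=2048, hidden=512, vocab_size=10000, emb_dim=300):
        super().__init__()
        self.sent_lstm = nn.LSTM(img_dim, hidden, batch_first=True)            # sentence level
        self.word_lstm = nn.LSTM(emb_dim + hidden, hidden, batch_first=True)   # word level
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, img_feats, tokens):
        # img_feats: (batch, n_images, img_dim); tokens: (batch, n_images, seq_len)
        topics, _ = self.sent_lstm(img_feats)              # one topic vector per sentence
        logits = []
        for i in range(img_feats.size(1)):
            emb = self.embed(tokens[:, i])                 # (batch, seq_len, emb_dim)
            topic = topics[:, i:i + 1].expand(-1, emb.size(1), -1)
            h, _ = self.word_lstm(torch.cat([emb, topic], dim=-1))
            logits.append(self.out(h))                     # per-sentence word logits
        return torch.stack(logits, dim=1)                  # (batch, n_images, seq_len, vocab)
```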

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
4
1

Citation Types

0
6
0

Year Published

2021
2021
2025
2025

Publication Types

Select...
7
2
1

Relationship

0
10

Authors

Journals

Cited by 24 publications (6 citation statements)
References 45 publications
“…However, in the LSTM architecture there are four layers in the loop, as shown in Fig. 1 [11–29]. Bidirectional Long Short-Term Memory (Bi-LSTM) is a variant of the Recurrent Neural Network (RNN).…”
Section: Long Short-Term Memory (LSTM)
confidence: 99%
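For context on the Bi-LSTM variant mentioned in the statement above, a minimal sketch of running a bidirectional LSTM over a sequence in PyTorch follows; the batch size, sequence length, and feature sizes are arbitrary illustration values.

```python
import torch
import torch.nn as nn

# Minimal Bi-LSTM example; all sizes are illustrative.
bilstm = nn.LSTM(input_size=128, hidden_size=64,
                 num_layers=1, batch_first=True, bidirectional=True)

x = torch.randn(8, 20, 128)          # (batch, time steps, features)
outputs, (h_n, c_n) = bilstm(x)      # forward and backward states concatenated
print(outputs.shape)                 # torch.Size([8, 20, 128]) = 2 * hidden_size
```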
“…This model is built on a bidirectional Transformer encoder [20], broke the records of 11 natural language processing tasks, and is now widely used in text classification, natural language understanding, machine translation, etc. [21–23]. The work in [24] proposed an electric-power BERT pre-training model trained on an electric-power corpus; with it, transformer defect texts were converted into word vectors and an entity recognition model was built on BiLSTM-CRF, which greatly improved the accuracy of entity recognition for transformer defect texts.…”
Section: Introduction
confidence: 99%
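As a rough illustration of the BERT-plus-BiLSTM tagging pipeline that the statement above describes, the sketch below feeds BERT token states into a BiLSTM whose outputs become per-token tag emissions. The domain-specific (electric-power) pre-training and the CRF decoding layer from [24] are omitted, and the model name, tag count, and example sentence are assumptions for illustration.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# Generic BERT encoder as a stand-in for the domain-specific model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

num_tags = 9  # e.g. BIO tags for an NER scheme; value assumed
bilstm = nn.LSTM(input_size=768, hidden_size=256,
                 batch_first=True, bidirectional=True)
emission = nn.Linear(512, num_tags)   # per-token tag scores; a CRF would decode these

inputs = tokenizer("Transformer oil leakage at the bushing flange",
                   return_tensors="pt")
with torch.no_grad():
    token_states = bert(**inputs).last_hidden_state   # (1, seq_len, 768)
features, _ = bilstm(token_states)                     # (1, seq_len, 512)
tag_scores = emission(features)                        # (1, seq_len, num_tags)
```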
“…Early work [1] proposed a hierarchically-attentive recurrent neural network, which included three RNNs, to generate stories. The researchers in [2] developed an end-to-end BERT-hLSTMs model composed of BERT and hierarchical LSTMs, in order to capture both sentence-level and word-level semantic information. Based on graph knowledge and relational reasoning, IRW [3] mimicked the logic of human story writing to create stories.…”
Section: Introduction
confidence: 99%