2018
DOI: 10.1007/s11280-018-0531-z
Residual attention-based LSTM for video captioning

Cited by 36 publications (19 citation statements)
References 33 publications
“…X.P. Li et al. have developed a novel attention-based framework called residual attention-based LSTM (Res-ATT [10]). This model takes advantage of the existing attention mechanism and integrates residual mapping into a two-level LSTM network to avoid losing previously generated word information.…”
Section: Video Captioning
confidence: 99%
“…This model takes advantage of the existing attention mechanism and integrates residual mapping into a two-level LSTM network to avoid losing previously generated word information. The residual attention-based decoder model has five different parts: a sentence encoder, temporal attention, a visual and sentence fusion layer, a residual layer, and an MLP [10]. The sentence encoder is an LSTM layer that extracts important syntactic information from the phrase.…”
Section: Video Captioning
confidence: 99%
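The residual layer described above carries the previously generated word information past the visual-sentence fusion step via a skip connection. The sketch below is a minimal illustration of that idea only: the elementwise-average fusion and the `residual_fusion` name are assumptions for demonstration, not the learned layers Res-ATT actually uses.

```python
def residual_fusion(word_state, visual_context):
    """Fuse the sentence (word) state with the attended visual context,
    then add the word state back through a residual (identity) connection
    so previously generated word information is not washed out.

    The fusion here is a simple elementwise average; the cited model uses
    learned transformations, so treat this as an illustrative stand-in.
    """
    fused = [(w + v) / 2 for w, v in zip(word_state, visual_context)]
    # Residual mapping: output = fused + word_state
    return [f + w for f, w in zip(fused, word_state)]
```

Because the word state is added back unchanged, its contribution reaches the MLP even when the fusion step attenuates it.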
“…et al. have developed an architecture called residual attention-based LSTM (Res-ATT) [6]. To describe the video in detail, they applied a temporal attention mechanism together with a CNN and an LSTM.…”
confidence: 99%
“…Typically, an LSTM-based encoder is used. This can be a single LSTM [31], a bidirectional LSTM (BiLSTM) [32], or a multilayer LSTM [33]. Using a GRU in the encoder is less common [14].…”
Section: Related Work
confidence: 99%
“…To improve performance, attention mechanisms are employed at different points of the encoder-decoder system. In particular, at each word-generation step, the decoder takes as input the video features weighted according to their relevance to the next word, conditioned on the previously emitted words [31], [32], [38], [39]. Following the same principle, in [40] the attention mechanism is applied to the mean-pooled features of a predefined number of object tracklets in the video.…”
Section: Related Work
confidence: 99%
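The per-step weighting described above can be sketched as a softmax over relevance scores followed by a weighted sum of frame features. This is a minimal sketch assuming dot-product scoring against a decoder query; the cited models learn their scoring functions, and the `temporal_attention` name is illustrative.

```python
import math

def temporal_attention(frame_features, query):
    """Weight per-frame feature vectors by relevance to the decoder state.

    Scores are plain dot products followed by a softmax; the weighted sum
    yields a single context vector fed to the decoder at this step.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in frame_features]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_features[0])
    # Weighted sum of frame features -> one context vector
    context = [sum(w * feat[i] for w, feat in zip(weights, frame_features))
               for i in range(dim)]
    return weights, context
```

Frames whose features align with the current decoder query receive larger weights, so the context vector emphasizes the video content relevant to the next word.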