2019
DOI: 10.1016/j.neucom.2019.06.085

DAA: Dual LSTMs with adaptive attention for image captioning

Cited by 35 publications (9 citation statements)
References 9 publications
“…The developed technique gives the better quality in the generation of image captions. Fen Xiao et al [32] developed an image captioning framework with dual LSTM for enhancing accessibility of blind people. In that, two separate LSTM frameworks were integrated with the adaptive semantic attention framework.…”
Section: Contributions (mentioning)
confidence: 99%
“…However, RNN can only remember the short distance information in the information sequence. The special structure of LSTM network [33] makes the network have the ability to memorize long-distance information. RNN neurons store effective information in uncontrollable form in each time step, while the LSTM network uses the special learning mechanism to integrate and update the information of the last time point, effectively avoiding the phenomenon of gradient explosion and gradient loss.…”
Section: Long and Short Term Memory Network (mentioning)
confidence: 99%
“…Crossmodal learning aims to learn the relationship between different modalities. Significant progress has been observed in visual, audio, and language modality learning, including cross-modal retrieval [29,30,31], cross-modal matching [32,33], image captioning [34,35,36], visual question answering [37,38,39], video summarization [40,41,42], etc. This paper focuses on the cross-modal learning between audio and visual modalities.…”
Section: Related Work (mentioning)
confidence: 99%