2021
DOI: 10.3233/jifs-189415
|View full text |Cite
|
Sign up to set email alerts
|

Deep image captioning using an ensemble of CNN and LSTM based deep neural networks

Abstract: The paper is concerned with the problem of Image Caption Generation. The purpose of this paper is to create a deep learning model to generate captions for a given image by decoding the information available in the image. For this purpose, a custom ensemble model was used, which consisted of an Inception model and a 2-layer LSTM model, which were then concatenated and dense layers were added. The CNN part encodes the images and the LSTM part derives insights from the given captions. For comparative study, GRU a… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
2

Citation Types

0
9
0

Year Published

2022
2022
2024
2024

Publication Types

Select...
5
4
1

Relationship

0
10

Authors

Journals

citations
Cited by 67 publications
(9 citation statements)
references
References 3 publications
0
9
0
Order By: Relevance
“…Deep learning has been established as a powerful tool in many scientific sectors due to its ability to extract high-level features for complex pattern recognition problems. Cultural heritage has also exploited the benefits of deep learning, especially for image and Natural Language Processing (NLP) related applications, such as for automatic image captioning that combines both computer vision and NLP [2]. Preservation and diagnostics of cultural heritage findings, e.g., paintings, sculptures, documents, and artworks, are crucial to determine the historical status of findings and extract the missing knowledge.…”
Section: Related Workmentioning
confidence: 99%
“…Deep learning has been established as a powerful tool in many scientific sectors due to its ability to extract high-level features for complex pattern recognition problems. Cultural heritage has also exploited the benefits of deep learning, especially for image and Natural Language Processing (NLP) related applications, such as for automatic image captioning that combines both computer vision and NLP [2]. Preservation and diagnostics of cultural heritage findings, e.g., paintings, sculptures, documents, and artworks, are crucial to determine the historical status of findings and extract the missing knowledge.…”
Section: Related Workmentioning
confidence: 99%
“…A significant limitation to this work is that L is pre-defined before training, and therefore the model only works on fixed-length CAPTCHA schemes. Combining CNN with LSTM to extract spatial and sequential features is successful in other similar areas like image captioning [ 1 ].…”
Section: Related Workmentioning
confidence: 99%
“…The packet delivery rate of the proposed method was 20% higher than the existing methods, and the packet loss rate of the system was reduced by 15%. Alzubi et al [25] used a deep neural network to integrate and study depth image subtitles. The researchers employed a user-defined integration model composed of an inception model and a two-layer long short-term memory (LSTM) model.…”
Section: Recent Related Workmentioning
confidence: 99%