2022
DOI: 10.21203/rs.3.rs-1282936/v1
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Automatic Image Caption Generation Using Deep Learning

Abstract: Image captioning is an interesting and challenging task with applications in diverse domains such as image retrieval, organizing and locating images of users’ interest etc. It has huge potential for replacing manual caption generation for images and is especially suitable for large scale image data. Recently, deep neural network based methods have achieved great success in the field of computer vision, machine translation and language generation. In this paper, we propose an encoder-decoder based model that is… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1

Citation Types

0
3
0

Year Published

2023
2023
2023
2023

Publication Types

Select...
1
1
1

Relationship

0
3

Authors

Journals

citations
Cited by 3 publications
(4 citation statements)
references
References 39 publications
0
3
0
Order By: Relevance
“…Moreover, the proposed transformer model has demonstrated superior performance (Table 1) in terms of the BLEU-4 score (0.71) and METEOR score (0.81), indicating higher accuracy and fluency in caption generation compared with the traditional CNN model proposed by Akash Verma et al [21]. The authors of the study demonstrated that BLEU-4 score of the generated picture was (0.66) and a METEOR score of (0.50) using the "Flickr8k" dataset.…”
Section: Comparison Of Vit To Cnnmentioning
confidence: 95%
“…Moreover, the proposed transformer model has demonstrated superior performance (Table 1) in terms of the BLEU-4 score (0.71) and METEOR score (0.81), indicating higher accuracy and fluency in caption generation compared with the traditional CNN model proposed by Akash Verma et al [21]. The authors of the study demonstrated that BLEU-4 score of the generated picture was (0.66) and a METEOR score of (0.50) using the "Flickr8k" dataset.…”
Section: Comparison Of Vit To Cnnmentioning
confidence: 95%
“…Traditional methods relied upon search based and template based techniques which came with the drawback of major dependency on the datasets to generate captions [11] [12]. On the other hand, the deep learning methods turned the direction by introducing encoder-decoder framework [13], attention based model, reinforcement learning [25] and so on.…”
Section: Introductionmentioning
confidence: 99%
“…The different datasets such as Flickr8k, Flickr30k, MSCOCO and Pascal1K used for training the model[24]. Verma et al has described neural network-based model for automatic image captioning and has presented BLEU scores comparison of proposed model with other existing models for different images[25].…”
mentioning
confidence: 99%
See 1 more Smart Citation