Image captioning is a challenging task at the intersection of computer vision and natural language processing. Many prior studies have addressed image captioning, but their evaluation scores remain low; this study therefore focuses on improving on those results. We use the Flickr8k dataset and the VGG16 Convolutional Neural Network (CNN) model as an encoder to extract features from images, with a Recurrent Neural Network (RNN), specifically a Bidirectional Long Short-Term Memory (BiLSTM) network, as the decoder. The feature vectors produced by the image feature extraction step are passed to the Bidirectional LSTM, which generates descriptions matching the visual content of the input image. The captions convey an object's name, location, color, size, and attributes, as well as its surroundings. Captions are decoded with a greedy search algorithm using the argmax function and with a beam search algorithm, and the outputs are evaluated with Bilingual Evaluation Understudy (BLEU) scores. The best result in this study is obtained by the VGG16 model with Bidirectional LSTM using beam search with parameter K = 3, which achieves a BLEU-1 score of 0.60593, surpassing previous studies.
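
The following Keras sketch illustrates the kind of pipeline described above, not the authors' exact implementation: VGG16 features feed a merge-style decoder whose sequence branch is a Bidirectional LSTM, and captions are decoded greedily with argmax. The vocabulary size, maximum caption length, layer widths, and the startseq/endseq tokens are illustrative assumptions.

```python
# Sketch of a VGG16 encoder + BiLSTM decoder captioning model (assumed
# hyperparameters, not the values used in the study).
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Input, Dense, Dropout, Embedding,
                                     LSTM, Bidirectional, add)
from tensorflow.keras.preprocessing.sequence import pad_sequences

vocab_size, max_len = 8000, 34  # hypothetical Flickr8k-scale values

# Encoder: VGG16 with its classifier head removed; the fc2 layer yields
# a 4096-d feature vector per image.
vgg = VGG16(weights="imagenet")
encoder = Model(vgg.input, vgg.get_layer("fc2").output)

# Decoder: project the image features and the partial caption to the same
# width, merge them, and predict the next word over the vocabulary.
img_in = Input(shape=(4096,))
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

seq_in = Input(shape=(max_len,))
seq_emb = Embedding(vocab_size, 256, mask_zero=True)(seq_in)
seq_vec = Bidirectional(LSTM(128))(Dropout(0.5)(seq_emb))  # 2 * 128 = 256-d

merged = add([img_vec, seq_vec])
out = Dense(vocab_size, activation="softmax")(Dense(256, activation="relu")(merged))
caption_model = Model([img_in, seq_in], out)
caption_model.compile(loss="categorical_crossentropy", optimizer="adam")

def greedy_caption(photo_feat, word_to_id, id_to_word):
    """Greedy (argmax) decoding: at each step, keep only the single most
    probable next word. Beam search instead keeps the K best partial
    captions at each step (K = 3 gave the best BLEU-1 in this study)."""
    seq = [word_to_id["startseq"]]
    for _ in range(max_len):
        padded = pad_sequences([seq], maxlen=max_len)
        probs = caption_model.predict([photo_feat, padded], verbose=0)[0]
        next_id = int(np.argmax(probs))          # pick the top word only
        if id_to_word[next_id] == "endseq":      # stop at the end token
            break
        seq.append(next_id)
    return " ".join(id_to_word[i] for i in seq[1:])
```

In this merge design, the image is injected once as a fixed vector rather than at every time step; the BiLSTM reads the partial caption in both directions before its state is combined with the image features to score the next word.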