Image Caption Generation Using Deep Learning Technique

Amritkar, Chetan; Jabade, Vaishali

doi:10.1109/iccubea.2018.8697360

Cited by 56 publications

(17 citation statements)

References 11 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In this paper, Mainly three categories of features i.e geometric, conceptual, and visual are used for content generation.Also, variety of methods have been proposed for image caption generation in the past. They may be classified in broadly three categories i.e., Template-based methods [14,30,51,37], Retrieval-based methods [40,43,16,46,20], and Deep neural network based (Encoder-decoder) methods [49,3,15,12]. These models are often built using CNN to encode the image & extract visual information whereas RNN is used to decode the visual information into a sentence.…”

Section: Related Literature Reviewmentioning

confidence: 99%

“…Identification of proper CNN and RNN models is a challenging issue. The summary of our research contributions is as follows: In regard to contribution 2, there are several researches in literature that have used only some specific metrics out of all the available choices [34,49,33,24,3,15,10,25]. This could potentially lead to unfair evaluation of the results.…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Automatic Image Caption Generation Using Deep Learning

Verma

Yadav

Kumar

et al. 2022

Preprint

View full text Add to dashboard Cite

Image captioning is an interesting and challenging task with applications in diverse domains such as image retrieval, organizing and locating images of users’ interest etc. It has huge potential for replacing manual caption generation for images and is especially suitable for large scale image data. Recently, deep neural network based methods have achieved great success in the field of computer vision, machine translation and language generation. In this paper, we propose an encoder-decoder based model that is capable of generating grammatically correct captions for images. This model makes use of VGG16 Hybrid Places 1365 as encoder and LSTM as decoder. To ensure the complete ground truth accuracy, the model is trained on the labelled Flickr8k and MSCOCO Captions datasets. Further, the model is evaluated using all standard metrics such as BLEU, METEOR, GLEU and ROUGE L. Experimental results indicate that the proposed model obtained a BLEU-1 score 0.6666, METEOR score 0.5060 and GLEU score 0.2469 on Flickr8k dataset and BLEU-1 score 0.7350, METEOR score 0.4768 and GLEU score 0.2798 on the MSCOCO Captions dataset. Thus, the proposed method achieved a significant performance as compared to the state-of-art approaches. To evaluate the efficacy of the model further, we also show the results of a caption generation from live sample images that reinforces the validity of the proposed approach.

show abstract

Section: Related Literature Reviewmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Automatic Image Caption Generation Using Deep Learning

Verma

Yadav

Kumar

et al. 2022

Preprint

View full text Add to dashboard Cite

show abstract

“…Chetan Amritkar et al [6] presented the method where the image caption i.e. The content of the image can be generated using CNN and RNN.…”

Section: Literature Surveymentioning

confidence: 99%

An Advanced Image Captioning using combination of CNN and LSTM

Raut¹

2021

TURCOMAT

View full text Add to dashboard Cite

The Captioning of Image now a days is gaining a lot of interest which generates an automated simple and short sentence describing the image content. Machines indeed are trained in a way that they can understand the Image content and generate captions which are almost accurate at a human level of knowledge is a very tedious and interesting task. There are various solutions used to solve this tedious task and generate simple sentences known as captions using neural network which still comes with problems such as inaccurate captions, generating captions only for the seen images, etc. In this paper, the proposed system model was able to generate more precise captions using a two staged model which consists of a combination of Deep Neural Network algorithms (Convolutional and Long Short-Term Memory). The proposed model was able to overcome the problems arise using Traditional CNN and RNN algorithms. The model is trained and tested using the Flicker8k Data set.

show abstract

“…The actual structure of LSTM is understood. The recent improvements in training deep neural networks have tremendously encouraged the huge development in captioning of images [15] and created a bound by joining the area of machine creativity and artificial intelligence.…”

Section: Outcomes From Papermentioning

confidence: 99%

Image Caption Generation Using Neural Network Models and LSTM Hierarchical Structure

Waghmare¹,

Shinde²

2021

Computational Intelligence in Pattern Recognition

View full text Add to dashboard Cite

The caption generation is nothing but the generation of textual information from images. For this, objects from images are extracted and classified among predefined classes. The logical objects from the image are extracted and transformed into natural sentences. The recognizing process requires an iterative task that incorporates image recognition as well as machine vision. The process must define relations among objects, persons, and animals and create the textual description of these relations. The paper is about the study of deep learning techniques to discover, identify and produce good captions for a source image. The process of making explanations in the form of sentences for a source image is image captioning, which involves machine vision and natural sentence forming techniques. For these processes, recent models have used deep learning techniques to acquire a great improvement in performance. Second, a more advanced trend is set in utilizing attention-dependent structure for captioning. Recent interpreters use a process of attention for each produced term containing seen term and unseen term. However, these unseen terms are effortlessly detected by considering a model for language in the absence of taking seen indicators, but unseen words could cause and a give bad performance for visual captioning. Taking these issues into consideration, the hierarchy of LSTM [Long-Short-Term Memory] with adaptive attention approach for the creation of captions for images and videos is presented.

show abstract

Image Caption Generation Using Deep Learning Technique

Cited by 56 publications

References 11 publications

Automatic Image Caption Generation Using Deep Learning

Automatic Image Caption Generation Using Deep Learning

An Advanced Image Captioning using combination of CNN and LSTM

Image Caption Generation Using Neural Network Models and LSTM Hierarchical Structure

Contact Info

Product

Resources

About