2019
DOI: 10.3390/app9102024

A Systematic Literature Review on Image Captioning

Abstract: Natural language problems have already been investigated for around five years. Recent progress in artificial intelligence (AI) has greatly improved the performance of models. However, the results are still not sufficiently satisfying. Machines cannot imitate human brains and the way they communicate, so it remains an ongoing task. Due to the increasing amount of information on this topic, it is very difficult to keep on track with the newest researches and results achieved in the image captioning field. In th…

Cited by 52 publications (46 citation statements)
References 81 publications
“…Image captioning is a difficult task on the intersection of computer vision (CV) and natural language processing (NLP), which involves the generation of a short sentence describing the image [1].…”
Section: Image Captioning
Citation type: mentioning
confidence: 99%
“…Image captioning is the task of automatically generating a textual description of an image [1]. The goal pursued by the researchers is to make these textual descriptions as similar as possible to how a human would describe an image.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…This task in WAT 2021 is formulated as generating a caption in Hindi and Malayalam for a specific region in the given image. Most existing research in the area of image captioning refers to generating a textual description for the entire image (Yang and Okazaki, 2020; Yang et al, 2017; Lindh et al, 2018; Staniūtė and Šešok, 2019; Miyazaki and Shimizu, 2016; Wu et al, 2017). However, a naive approach of using only a specified region (as defined by the rectangular bounding box) as an input to the generic image caption generation system often does not yield meaningful results.…”
Section: Image Caption Generation
Citation type: mentioning
confidence: 99%
“…This task in WAT 2021 is formulated as generating a caption in Hindi and Malayalam for a specific region in the given image. Most existing research in the area of image captioning refers to generating a textual description for the entire image (Yang and Okazaki, 2020; Lindh et al, 2018; Staniūtė and Šešok, 2019; Miyazaki and Shimizu, 2016). However, a naive approach of using only a specified region (as defined by the rectangular bounding box) as an input to the generic image caption generation system often does not yield meaningful results.…”
Section: Image Caption Generation
Citation type: mentioning
confidence: 99%
“…Image Encoder: To textually describe an image or a region within, it first needs to be encoded into high-level complex features that capture its visual attributes. Several image captioning works (Yang and Okazaki, 2020; Lindh et al, 2018; Staniūtė and Šešok, 2019; Miyazaki and Shimizu, 2016) have demonstrated that the outputs of final or pre-final convolutional (conv) layers of deep CNNs are excellent features for the aforementioned objective. Along with features of the entire image, we propose to extract the features of the subregion as well using the same set of outputs of the conv layer.…”
Section: Image Caption Generation
Citation type: mentioning
confidence: 99%
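
The last citation statement describes taking features for both the whole image and a subregion from the same conv-layer outputs, but the excerpt does not say how the subregion features are obtained. The following is only a minimal sketch of one plausible reading: it assumes the conv output is a grid of channel vectors, that the bounding box is given in pixel coordinates, and that each feature vector is formed by mean pooling. The function names (`box_to_grid`, `mean_pool`, `encode_image_and_region`) and the pooling choice are hypothetical and are not taken from the cited paper.

```python
from math import ceil, floor


def box_to_grid(box, image_size, grid_size):
    """Map a pixel box (x0, y0, x1, y1) to a half-open range of grid cells."""
    x0, y0, x1, y1 = box
    img_w, img_h = image_size
    grid_w, grid_h = grid_size
    # Scale pixel coordinates down to the feature grid. Round outward and
    # clamp so the range always covers at least one valid cell.
    gx0 = min(grid_w - 1, max(0, floor(x0 * grid_w / img_w)))
    gy0 = min(grid_h - 1, max(0, floor(y0 * grid_h / img_h)))
    gx1 = min(grid_w, max(gx0 + 1, ceil(x1 * grid_w / img_w)))
    gy1 = min(grid_h, max(gy0 + 1, ceil(y1 * grid_h / img_h)))
    return gx0, gy0, gx1, gy1


def mean_pool(feature_map, cells):
    """Average the channel vectors feature_map[y][x] over a cell range."""
    gx0, gy0, gx1, gy1 = cells
    vectors = [feature_map[y][x] for y in range(gy0, gy1) for x in range(gx0, gx1)]
    channels = len(vectors[0])
    return [sum(v[c] for v in vectors) / len(vectors) for c in range(channels)]


def encode_image_and_region(feature_map, box, image_size):
    """Return (whole-image vector, subregion vector) from one conv feature map."""
    grid_h = len(feature_map)
    grid_w = len(feature_map[0])
    whole = mean_pool(feature_map, (0, 0, grid_w, grid_h))
    cells = box_to_grid(box, image_size, (grid_w, grid_h))
    region = mean_pool(feature_map, cells)
    return whole, region


if __name__ == "__main__":
    # Toy 2x2 feature grid with one channel, standing in for a conv-layer output.
    toy_map = [[[1.0], [2.0]], [[3.0], [4.0]]]
    whole, region = encode_image_and_region(toy_map, (50, 0, 100, 50), (100, 100))
    print(whole, region)
```

Because both vectors come from a single feature map, the image passes through the CNN only once. In practice the mean pooling could be swapped for an ROI pooling or ROI align operator.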