A Systematic Literature Review on Image Captioning

Staniūtė, Raimonda; Šešok, Dmitrij

doi:10.3390/app9102024

Cited by 52 publications

(46 citation statements)

References 81 publications

“…Image captioning is a difficult task on the intersection of computer vision (CV) and natural language processing (NLP), which involves the generation of a short sentence describing the image [1].…”

Section: Image Captioning (mentioning)

confidence: 99%

Text Augmentation Using BERT for Image Captioning

Atliha

Šešok

2020

Applied Sciences

Self Cite

Image captioning is an important task for improving human-computer interaction as well as for a deeper understanding of the mechanisms underlying image description by humans. In recent years, this research field has developed rapidly and a number of impressive results have been achieved. Typical models are based on neural networks, including convolutional ones for encoding images and recurrent ones for decoding them into text. In addition, attention mechanisms and transformers are actively used to boost performance. However, even the best models are limited in quality by a lack of data. Generating a variety of descriptions of objects in different situations requires a large training set. The commonly used datasets, although rather large in terms of the number of images, are quite small in terms of the number of different captions per image. We expanded the training dataset using text augmentation methods. The methods include augmentation with synonyms as a baseline and augmentation with the state-of-the-art language model Bidirectional Encoder Representations from Transformers (BERT). As a result, models trained on augmented datasets show better results than models trained on a dataset without augmentation.

“…Image captioning is the task of automatically generating a textual description of an image [1]. The goal pursued by the researchers is to make these textual descriptions as similar as possible to how a human would describe an image.…”

Section: Introduction (mentioning)

confidence: 99%

“…This task in WAT 2021 is formulated as generating a caption in Hindi and Malayalam for a specific region in the given image. Most existing research in the area of image captioning refers to generating a textual description for the entire image (Yang and Okazaki, 2020; Yang et al, 2017; Lindh et al, 2018; Staniūtė and Šešok, 2019; Miyazaki and Shimizu, 2016; Wu et al, 2017). However, a naive approach of using only a specified region (as defined by the rectangular bounding box) as an input to the generic image caption generation system often does not yield meaningful results.…”

Section: Image Caption Generation (mentioning)

confidence: 99%

NLPHut’s Participation at WAT2021

Parida

Panda

Kotwal et al.

2021

Proceedings of the 8th Workshop on Asian Translation (WAT2021)

This paper describes the submissions of our team "NLPHut" to the WAT 2021 shared tasks. We participated in the English→Hindi Multimodal translation task, the English→Malayalam Multimodal translation task, and the Indic Multilingual translation task. We used the state-of-the-art Transformer model with language tags in different settings for the translation task and proposed a novel "region-specific" caption generation approach using a combination of image CNN and LSTM for Hindi and Malayalam image captioning. Our submission ranks first in the English→Malayalam Multimodal translation task (text-only translation, and Malayalam caption), and second-best in the English→Hindi Multimodal translation task (text-only translation, and Hindi caption). Our submissions also performed well in the Indic Multilingual translation tasks.

“…This task in WAT 2021 is formulated as generating a caption in Hindi and Malayalam for a specific region in the given image. Most existing research in the area of image captioning refers to generating a textual description for the entire image (Yang and Okazaki, 2020; Lindh et al, 2018; Staniūtė and Šešok, 2019; Miyazaki and Shimizu, 2016). However, a naive approach of using only a specified region (as defined by the rectangular bounding box) as an input to the generic image caption generation system often does not yield meaningful results.…”

Section: Image Caption Generation (mentioning)

confidence: 99%

“…Image Encoder: To textually describe an image or a region within, it first needs to be encoded into high-level complex features that capture its visual attributes. Several image captioning works (Yang and Okazaki, 2020; Lindh et al, 2018; Staniūtė and Šešok, 2019; Miyazaki and Shimizu, 2016) have demonstrated that the outputs of final or pre-final convolutional (conv) layers of deep CNNs are excellent features for the aforementioned objective. Along with features of the entire image, we propose to extract the features of the subregion as well using the same set of outputs of the conv layer.…”

Section: Image Caption Generation (mentioning)

confidence: 99%

Proceedings of the 8th Workshop on Asian Translation (WAT2021)

Nakazawa

Goto

2021

Many Asian countries are growing rapidly these days, and the importance of communicating and exchanging information with these countries has intensified. To satisfy the demand for communication among these countries, machine translation technology is essential. Machine translation technology has evolved rapidly in recent years and is seeing practical use, especially between European languages. However, the translation quality for Asian languages is not as high as that for European languages, and machine translation technology for these languages has not yet reached the stage of widespread use. This is due not only to the lack of language resources for Asian languages but also to the lack of techniques for correctly transferring the meaning of sentences from and to Asian languages. Consequently, a place for gathering and sharing resources and knowledge about Asian language translation is necessary to enhance machine translation research for Asian languages.
