2021
DOI: 10.3390/s21041270

Image Captioning Using Motion-CNN with Object Detection

Abstract: Automatic image captioning has many important applications, such as the depiction of visual contents for visually impaired people or the indexing of images on the internet. Recently, deep learning-based image captioning models have been researched extensively. For caption generation, they learn the relation between image features and words included in the captions. However, image features might not be relevant for certain words such as verbs. Therefore, our earlier reported method included the use of motion fe…

Cited by 17 publications (11 citation statements)
References: 21 publications

“…Image captioning is the task of describing the content of an image in words [32]. In this study, we applied NLP-generated image captioning to assist residents to draft diagnostic reports and improve their report efficiency. The mechanism underlying the improved performance of AI-assisted reporting is complex.…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…Iwamura et al [10] presented a trainable end-to-end approach for generating the image caption with three datasets namely several copyright-free images, MSCOCO and MSR-VTT2016-image. In this framework, the four phases were performed such as feature extraction, motion estimation, object detection and caption generation.…”
Section: Contributions; citation type: mentioning
confidence: 99%
“…Inappropriately, the present Image captioning methods were built utilizing crowdsourced, large, publicly available datasets that were accumulated and created in a contrived setting [3]. Therefore, such methods execute poorly on images snapped by VIPs largely due to the images snapped by visually impaired persons varying dramatically from the images presented in the data [4].…”
Section: Introduction; citation type: mentioning
confidence: 99%