2022
DOI: 10.48550/arxiv.2206.07986
Preprint

Image Captioning based on Feature Refinement and Reflective Decoding

Abstract: Automatically generating a natural-language description of an image is called image captioning. It is an active research topic at the intersection of two major fields of artificial intelligence: computer vision and natural language processing. Image captioning is one of the significant challenges in image understanding, since it requires not only recognizing the salient objects in an image but also their attributes and the ways they interact. The system must then generate a syntactically and semantica…

Cited by 1 publication (3 citation statements) | References 36 publications
“…This architecture harnesses convolutional features derived from a CNN trained on ImageNet (specifically, the Xception model) and combines them with object features extracted from the YOLOv4 model, which has been pre-trained on the MSCOCO dataset (Al-Malla et al, 2022). Alabduljabbar et al (2022) proposed a comprehensive image captioning system built on an encoder-decoder framework equipped with an attention mechanism. In image captioning systems, adopting an end-to-end methodology entails employing the encoder and decoder components in a closely integrated fashion. This system applies two attention mechanisms: the first over the visual features, to concentrate on the salient regions of the image, and the second over the textual features, to generate captions with more detailed information.…”
Section: Related Work
confidence: 99%
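The two-stage attention the quoted passage describes (one pass over visual features, a second over textual features) can be illustrated with a minimal soft-attention sketch. The dot-product scoring, feature dimensions, and region/word counts below are illustrative assumptions, not details taken from the cited system.

```python
import numpy as np

def soft_attention(features: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Weight a set of feature vectors by their relevance to a query.

    features: (k, d) array of region (or word) features.
    query:    (d,)   decoder hidden state.
    Returns a (d,) context vector: the attention-weighted sum of features.
    """
    scores = features @ query                # (k,) alignment scores
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ features                # weighted sum over regions

rng = np.random.default_rng(0)
# First attention: over assumed 36 visual region features of size 512.
visual_ctx = soft_attention(rng.normal(size=(36, 512)), rng.normal(size=512))
# Second attention: over assumed 20 textual (word) features of size 512.
textual_ctx = soft_attention(rng.normal(size=(20, 512)), rng.normal(size=512))
```

Each call reduces a variable number of feature vectors to a single fixed-size context vector, which is what lets the decoder focus on different image regions (or words) at each generation step.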
“…Object detection: In the past few years, significant progress has been made in object detection. These advances are driven by the success of region proposal methods Hodosh et al (2013) and Region-based Convolutional Neural Networks (R-CNN) (Alabduljabbar et al, 2022). In our model, we choose Faster R-CNN (Deng et al, 2009) as the object detection model due to its efficiency and effectiveness in object detection tasks.…”
Section: Encoding Part
confidence: 99%