2021
DOI: 10.1109/access.2021.3131343

Image Captioning With Positional and Geometrical Semantics

Abstract: The last five to six years have seen tremendous progress in automatic image captioning using deep learning. Initial research focused on attribute-to-attribute comparison of image features and text to describe an image as a sentence; current research addresses issues related to semantics and correlations. However, current state-of-the-art research suffers from insufficient concepts when it comes to positional and geometrical attributes. The majority of research relying on CNNs (Convolutional Neural Networks…

Cited by 10 publications (5 citation statements)
References 35 publications
“…Our approach leverages a pre-trained Capsule Networks Cluster [ 28 ], initially trained on the Flickr dataset for generating image captions with detailed positional and geometrical information. We have customized this pre-trained model to extract inflammation-specific features within the inflammation capsule layer.…”
Section: Methods (mentioning)
confidence: 99%
“…In our work, we aim to aid in diagnosing inflammation due to pneumonia in chest X-rays by providing a lightweight geometrical and positional understanding-based deep learning network. Our proposed network is a composition of a capsule networks cluster [ 28 ] and a modified class activation map. The capsule networks cluster-based model is light because it requires very little data to be trained to understand the geometrical, orientational, and positional inflammation details in CXR.…”
Section: Introduction (mentioning)
confidence: 99%
“…Our methodology consumes a pre-trained Capsule Network Cluster [28]. The capsule network cluster was trained over the flicker dataset and has been used for generating captions against images with positional and geometrical information.…”
Section: Methods (mentioning)
confidence: 99%
“…Haque et al presented an approach to mimic human using attention and object features [53]. A different approach with geometrical semantics to generate captions was introduced in [54].…”
Section: Related Work (mentioning)
confidence: 99%