The last five to six years have seen tremendous progress in automatic image captioning using deep learning. Initial research focused on attribute-to-attribute comparison of image features and text to describe an image as a sentence; current research addresses issues of semantics and correlation. However, the current state of the art lacks sufficient concepts for positional and geometrical attributes. Most research relies on CNNs (Convolutional Neural Networks) for object feature extraction, and CNNs have no notion of equivariance or rotational invariance. This leads to an orientation-less understanding of objects for captioning, along with longer training times and larger dataset requirements. Furthermore, CNN-based image captioning encoders fail to understand the geometrical alignment of object attributes within the image and hence mislabel distorted objects as correct. To address these issues, we propose ICPS (Captioning with Positional and geometrical Semantics), a capsule network-based image captioning technique that uses a transformer neural network as the decoder. The proposed ICPS architecture handles various geometrical properties of image objects with the help of parallelized capsules, while the object-to-text decoding is done by Transformer Neural Networks. The inclusion of cluster capsules provides better object understanding in terms of position, equivariance, and geometrical orientation, with more augmented object understanding over a small dataset in comparatively less time. The extracted image features give a better understanding of image objects and help the decoding stage narrate effectively with positional and geometrical details. We trained and tested ICPS on the Flickr8k dataset and found it better at describing positional and geometrical transitions than other current state-of-the-art approaches.
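The abstract does not say how its capsules are routed. As a minimal sketch, assuming the squash non-linearity and dynamic routing-by-agreement of Sabour et al. (2017), the following NumPy code shows how a capsule layer turns lower-level predictions into output vectors whose direction encodes pose and whose length encodes presence. All sizes and names (`squash`, `dynamic_routing`, 32 input and 10 output capsules) are hypothetical and not taken from ICPS.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Capsule non-linearity: keeps the vector's direction, maps its length into [0, 1)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement (assumed scheme, after Sabour et al., 2017).

    u_hat: predictions from lower capsules, shape (n_in, n_out, dim_out).
    Returns output capsule vectors of shape (n_out, dim_out).
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits, start uniform
    for _ in range(iterations):
        # Coupling coefficients: softmax over output capsules for each input capsule
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        s = np.einsum("ij,ijd->jd", c, u_hat)  # weighted sum per output capsule
        v = squash(s)
        # Increase logits where a prediction agrees (dot product) with the output
        b = b + np.einsum("ijd,jd->ij", u_hat, v)
    return v

# Hypothetical sizes: 32 primary capsules (8-D) routed to 10 output capsules (16-D)
rng = np.random.default_rng(0)
u = rng.normal(size=(32, 8))
W = rng.normal(scale=0.1, size=(32, 10, 16, 8))  # per-pair transformation matrices
u_hat = np.einsum("ijdk,ik->ijd", W, u)           # predictions, shape (32, 10, 16)
v = dynamic_routing(u_hat)
print(v.shape)                     # (10, 16)
print(np.linalg.norm(v, axis=-1))  # lengths in [0, 1), read as presence probabilities
```

The transformation matrices `W` are what let a capsule layer represent how a part's pose predicts the whole's pose; a rotated object changes the predicted vectors rather than being discarded by pooling, which is the property the abstract contrasts with CNN encoders.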
With the evolution of the human race, the associated diseases have also evolved. Pneumonia, treated as a simple flu or allergy in its early stages, now threatens humankind in various forms such as SARS and COVID-19. These advanced forms of the disease require equally advanced treatment and diagnosis. Our research detects and classifies pneumonia inflammation within chest X-rays (CXR) using very limited datasets and aims for a global perspective, i.e. one that addresses all possible inflammation regions within the CXR. In addition to achieving medical-grade classification outputs in terms of accuracy and recall, we also meet the medical requirement of justifying each classification with the help of modified class activation maps (mCAM). The model's global-perspective understanding is trained with a capsule network cluster (CsNC), which learns various geometrical, orientation, and positional views of the inflammation within the CXR. Our 16-capsule cluster learns different views within the same CXR without any image augmentation, which current detection models generally require, thus reducing overall training and evaluation time. We performed extensive experiments on the RSNA pneumonia dataset of CXR images using a set of evaluation metrics, achieving up to 98.3% accuracy with 99.5% recall in our final trials. We also tested the final trained model on generic X-ray images acquired from clinics and obtained promising results.
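The abstract names modified class activation maps (mCAM) but does not describe the modification. For reference only, here is a sketch of the standard CAM of Zhou et al. (2016) that such variants build on, assuming a final feature map followed by global average pooling and a linear classifier; it is not the authors' mCAM. Clipping negative values and normalizing to [0, 1] is a common visualization convention, not part of the original CAM definition. The function name and toy arrays are hypothetical.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Standard CAM (assumed setup: last feature maps -> GAP -> linear layer).

    feature_maps: (K, H, W) activations before global average pooling.
    fc_weights:   (num_classes, K) weights of the linear classifier after GAP.
    Returns an (H, W) heat map normalized to [0, 1].
    """
    # Weighted sum over the K channels using the chosen class's weights
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=1)  # (H, W)
    cam = np.maximum(cam, 0)  # keep only regions that support the class
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy example: 2 channels on a 4x4 grid; class 1 could stand for "pneumonia"
fm = np.zeros((2, 4, 4))
fm[0, 1, 2] = 2.0   # channel 0 fires at row 1, column 2
fm[1, 3, 0] = 1.0   # channel 1 fires at row 3, column 0
w = np.array([[0.2, 0.9],
              [1.0, -0.5]])
cam = class_activation_map(fm, w, class_idx=1)
print(np.unravel_index(cam.argmax(), cam.shape))  # highlighted region: row 1, column 2
```

Upsampled to the input resolution and overlaid on the X-ray, such a map shows which region drove the prediction, which is the kind of classification justification the abstract refers to.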