Tuberculosis (TB) is an infectious disease that can lead to death if left untreated. TB detection involves the extraction of complex TB manifestation features, such as lung cavities, air-space consolidation, endobronchial spread, and pleural effusions, from chest X-rays (CXRs). A deep-learning approach, the convolutional neural network (CNN), can learn such complex features from CXR images. The main problem is that a CNN classifying CXRs through a softmax layer does not account for uncertainty, so it cannot present the true probability of a CXR or distinguish confusing cases during TB detection. This paper presents a solution for TB identification using a Bayesian convolutional neural network (B-CNN), which handles uncertain cases that have low discernibility between TB-manifested and non-TB CXRs. The proposed B-CNN-based TB identification methodology is evaluated on two TB benchmark datasets, Montgomery and Shenzhen. For training and testing of the proposed scheme, we used the Google Colab platform, which provides an NVIDIA Tesla K80 GPU with 12 GB of VRAM, a single 2.3 GHz Xeon processor core, 12 GB of RAM, and 320 GB of disk space. The B-CNN achieves 96.42% and 86.46% accuracy on the two datasets, respectively, outperforming state-of-the-art machine learning and CNN approaches. Moreover, the B-CNN validates its results by filtering out CXRs as confusing cases when the variance of its predicted outputs exceeds a certain threshold. The results demonstrate the superiority of the B-CNN over its counterparts for identifying TB and non-TB CXRs in terms of accuracy, variance of the predicted probabilities, and model uncertainty.
INDEX TERMS Tuberculosis identification, computer-aided diagnostics, medical image analysis, Bayesian convolutional neural networks, model uncertainty.
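The abstract does not state which approximate Bayesian inference scheme the B-CNN uses; a common, minimal way to obtain a predictive variance from a dropout-equipped CNN is Monte Carlo dropout. The sketch below illustrates that idea in PyTorch under that assumption; the model, sample count, class indices, and variance threshold are all hypothetical.

```python
# Minimal sketch of uncertainty-aware CXR classification, assuming the
# Bayesian CNN is approximated with Monte Carlo dropout (the paper's exact
# inference scheme is not given in the abstract).
import torch

def mc_dropout_predict(model, cxr_batch, n_samples=30):
    """Run n_samples stochastic forward passes with dropout kept active."""
    model.train()  # keep dropout layers stochastic at inference time
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model(cxr_batch), dim=1) for _ in range(n_samples)
        ])                                      # (n_samples, batch, n_classes)
    return probs.mean(dim=0), probs.var(dim=0)

def classify_with_rejection(model, cxr_batch, var_threshold=0.05):
    """Label each CXR as TB / non-TB, flagging it as a confusing case
    when the predictive variance exceeds the threshold."""
    mean_p, var_p = mc_dropout_predict(model, cxr_batch)
    labels = mean_p.argmax(dim=1)               # 0 = non-TB, 1 = TB (assumed)
    confusing = var_p.max(dim=1).values > var_threshold
    return labels, confusing                    # confusing -> refer to an expert
```

Under this scheme, a CXR whose predictive variance exceeds the threshold would be flagged for expert review rather than auto-labelled, matching the filtering behaviour the abstract describes.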
The acquisition of the spatial and angular information of a scene using light field (LF) technologies supports a wide range of post-processing applications, such as scene reconstruction, refocusing, and virtual view synthesis. The additional angular information increases the size of the captured data while offering the same spatial resolution. This main contributor to the data size (i.e., the angular information) contains high correlation, which state-of-the-art video encoders exploit by treating the LF as a pseudo video sequence (PVS). However, interpreting the LF as a single PVS restricts the encoding scheme to exploiting only one dimension of the angular correlation present in the LF data. In this paper, we present an LF compression framework that efficiently exploits both spatial and angular correlation using the multiview extension of High Efficiency Video Coding (MV-HEVC). The input LF views are converted into multiple PVSs and organized hierarchically. The rate-allocation scheme takes this organization of frames into account and distributes quality/bits among them accordingly. Subsequently, the reference-picture selection scheme prioritizes reference frames based on their assigned quality. The proposed compression scheme is evaluated under the common test conditions set by JPEG Pleno. It performs 0.75 dB better than state-of-the-art compression schemes and 2.5 dB better than the x265-based JPEG Pleno anchor scheme. Moreover, an optimized motion-search scheme is proposed within the framework that reduces the computational complexity of motion estimation (in terms of the number of sum-of-absolute-differences [SAD] computations) by up to 87% with a negligible loss in visual quality (approximately 0.05 dB).
INDEX TERMS Compression, light field, MV-HEVC, plenoptic.
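As a rough illustration of the view reorganization, the sketch below maps a U x V grid of sub-aperture views to multiple PVSs (one PVS per view row, which is an assumption; the paper's exact mapping is not given in the abstract) and produces a dyadic, hierarchical coding order of the kind the rate-allocation scheme could prioritize.

```python
# Illustrative reorganization of light-field views into multiple PVSs for an
# MV-HEVC encoder. The row-wise mapping and dyadic hierarchy are assumptions,
# not the paper's exact design.
import numpy as np

def lf_to_pvs(lf_views):
    """lf_views: (U, V, H, W, 3) array of sub-aperture views.
    Each row of the view grid becomes one PVS (an MV-HEVC 'view')."""
    return [lf_views[u] for u in range(lf_views.shape[0])]

def hierarchical_order(num_frames):
    """Dyadic coding order (anchor frames first, then successive mid-points),
    so each frame can reference already-coded, higher-quality neighbours."""
    order, spans = [0, num_frames - 1], [(0, num_frames - 1)]
    while spans:
        lo, hi = spans.pop(0)
        if hi - lo < 2:
            continue
        mid = (lo + hi) // 2
        order.append(mid)
        spans += [(lo, mid), (mid, hi)]
    return order  # e.g. 9 frames -> [0, 8, 4, 2, 6, 1, 3, 5, 7]
```

Frames earlier in this order would receive more bits (lower QP) under a hierarchy-aware rate allocation, and the reference-picture selection would then favour these higher-quality frames.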
Surgical telementoring systems have gained considerable interest, especially for remote locations; however, bandwidth constraints have been the primary bottleneck to building efficient systems. This study aims to establish an efficient surgical telementoring system in which a qualified surgeon (mentor) provides real-time guidance and technical assistance for surgical procedures to an on-site physician (surgeon). High Efficiency Video Coding (HEVC/H.265)-based video compression has shown promising results for telementoring applications, but there is a trade-off between the bandwidth required for video transmission and the quality of the video received by the remote surgeon. To compress and transmit real-time surgical videos efficiently, a hybrid lossless-lossy approach is proposed in which the surgical incision region is coded at high quality, whereas the background is coded at lower quality according to its distance from the incision region. State-of-the-art deep learning (DL) architectures for semantic segmentation could be used to extract the surgical incision region, but their computational complexity is high, resulting in long training and inference times. Because encoding time is crucial for telementoring systems, very deep architectures are not suitable for surgical incision extraction. In this study, we propose a shallow convolutional neural network (S-CNN)-based segmentation approach that consists of an encoder network only. The segmentation performance of the S-CNN is compared with that of a state-of-the-art image segmentation network (SegNet), and the results demonstrate the effectiveness of the proposed network. The proposed telementoring system is efficient and explicitly accounts for the physiological nature of the human visual system by providing good overall visual quality at the location of surgery. The proposed S-CNN-based segmentation achieved a pixel accuracy of 97% and a mean intersection-over-union of 79%. HEVC experiments showed that the proposed surgical-region-based encoding scheme achieved an average bitrate reduction of 88.8% at high-quality settings compared with default full-frame HEVC encoding, with an average gain in encoding performance (signal-to-noise ratio) of 11.5 dB in the surgical region. The bitrate saving and visual quality of the proposed optimal bit-allocation scheme are also compared with a mean-shift segmentation-based coding scheme for a fair comparison; the proposed scheme maintains high visual quality in the surgical incision region while achieving substantial bitrate savings. Based on these comparisons and results, the proposed encoding algorithm can be considered an efficient and effective solution for surgical telementoring over low-bandwidth networks.
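The distance-dependent quality allocation can be sketched as a per-block QP map derived from the S-CNN's incision mask. In the sketch below, the block size, QP endpoints, and decay rate are illustrative assumptions, not the paper's settings; only the idea (low QP on the incision, QP growing with distance) comes from the abstract.

```python
# Sketch of region-based quality allocation: high quality (low QP) inside the
# segmented incision region, quality decaying with distance from it.
import numpy as np
from scipy.ndimage import distance_transform_edt

def qp_map_from_mask(incision_mask, block=64, qp_roi=22, qp_bg_max=45, decay=0.05):
    """incision_mask: (H, W) bool array from the S-CNN segmentation.
    Returns a per-block QP map: qp_roi on the incision, rising with
    distance until it saturates at qp_bg_max (all values hypothetical)."""
    dist = distance_transform_edt(~incision_mask)   # pixel distance to incision
    h, w = incision_mask.shape
    qp = np.empty((h // block, w // block), dtype=int)
    for by in range(qp.shape[0]):
        for bx in range(qp.shape[1]):
            tile = dist[by*block:(by+1)*block, bx*block:(bx+1)*block]
            qp[by, bx] = min(qp_bg_max, int(round(qp_roi + decay * tile.min())))
    return qp
```

A map like this could then be handed to an HEVC encoder that supports per-CTU QP offsets, concentrating the bit budget where the remote surgeon is actually looking.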
This study presents a shallow and robust road-segmentation model. Computer-aided real-time applications, such as driver assistance, require fast and accurate processing. Current studies use deep convolutional neural networks (DCNNs) for road segmentation; however, DCNNs require high computational power and large amounts of labeled data to learn abstract features in their deeper layers (the deeper the layer, the more abstract the information it tends to learn). Moreover, prediction time is an important consideration for autonomous vehicles. To overcome these issues, a Multi-feature View-based Shallow Convolutional Neural Network (MVS-CNN) is proposed that utilizes abstract features extracted from explicitly derived representations of the input image. Gradient information of the input image is used as additional channels to enhance the learning process of the proposed architecture, and the multi-feature views are fed to a fully connected neural network to accurately segment the road regions. The testing accuracy shows that the proposed MVS-CNN achieves an improvement of 2.7% over a baseline CNN with RGB inputs only. Furthermore, a comparison with a popular semantic segmentation network (SegNet) shows that the proposed scheme performs better while being more efficient during training and evaluation. Unlike traditional segmentation techniques, which are based on an encoder-decoder architecture, the proposed MVS-CNN consists of the encoder network only. The MVS-CNN has been trained and validated on two well-known datasets, the KITTI Vision Benchmark and the Cityscapes dataset, and the results have been compared with state-of-the-art deep learning architectures. The proposed MVS-CNN outperforms them in terms of model accuracy, processing time, and segmentation accuracy. Based on the experimental results, the proposed architecture can be considered an efficient road-segmentation architecture for autonomous vehicle systems.
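The multi-feature input can be sketched as the RGB image stacked with image-gradient channels. The abstract does not name the gradient operator, so the Sobel operator below is an assumption, as are the channel count and normalization.

```python
# Sketch of building a multi-feature view: colour channels plus x- and
# y-gradient channels, stacked along the channel axis as MVS-CNN input.
import cv2
import numpy as np

def multi_feature_view(bgr_image):
    """Return an (H, W, 5) array: B, G, R plus two gradient-magnitude
    channels (Sobel is an assumed choice of gradient operator)."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    gx = np.abs(cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3))
    gy = np.abs(cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3))
    gx = cv2.normalize(gx, None, 0, 255, cv2.NORM_MINMAX)
    gy = cv2.normalize(gy, None, 0, 255, cv2.NORM_MINMAX)
    return np.dstack([bgr_image.astype(np.float32), gx, gy])
```

Feeding explicit gradients as extra channels spares a shallow network from having to learn edge detectors itself, which is consistent with the paper's motivation for keeping the model small.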
The limited availability of medical imaging datasets is a major limitation when using "data-hungry" deep learning to gain performance improvements. To deal with this issue, transfer learning has become a de facto standard, in which a convolutional neural network (CNN) pre-trained on natural images (e.g., ImageNet) is fine-tuned on medical images. Meanwhile, pre-trained transformers, which are self-attention-based models, have become the de facto standard in natural language processing (NLP) and the state of the art in image classification owing to their powerful transfer-learning abilities. Inspired by this success, large-scale transformers (such as the vision transformer) have been trained on natural images. Building on these developments, this research explores the efficacy of natural-image pre-trained transformers for medical images. Specifically, we analyze a pre-trained vision transformer on the CheXpert and pediatric pneumonia datasets, using standard CNN models, including VGGNet and ResNet, as baselines. By examining the acquired representations and results, we find that transfer learning from the pre-trained vision transformer yields improved results compared with pre-trained CNNs, demonstrating the greater transfer ability of transformers in medical imaging.
Supplementary Information: The online version contains supplementary material available at 10.1007/s10278-022-00666-z.
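A minimal sketch of this transfer-learning setup is shown below using the timm library, which is our choice rather than anything the abstract specifies; the model names, class count, and hyperparameters are illustrative, and dataset loading is omitted.

```python
# Sketch: fine-tune an ImageNet pre-trained vision transformer on a
# two-class medical dataset, with a ResNet baseline built the same way.
import timm
import torch

vit = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2)
cnn = timm.create_model("resnet50", pretrained=True, num_classes=2)  # baseline

optimizer = torch.optim.AdamW(vit.parameters(), lr=3e-5, weight_decay=0.01)
criterion = torch.nn.CrossEntropyLoss()

def train_step(model, images, labels):
    """One fine-tuning step on a batch of (N, 3, 224, 224) images."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Swapping `vit` for `cnn` in the same loop gives the kind of matched comparison between transformer and CNN backbones that the study describes.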