Deep learning has been widely used in medical image segmentation and related tasks. However, the performance of existing medical image segmentation models is limited by the difficulty of obtaining sufficient high-quality labeled data, owing to the prohibitive cost of data annotation. To alleviate this limitation, we propose a new text-augmented medical image segmentation model, LViT (Language meets Vision Transformer). In LViT, medical text annotation is incorporated to compensate for the quality deficiency in image data. In addition, the text information can guide the generation of higher-quality pseudo labels in semi-supervised learning. We also propose an Exponential Pseudo label Iteration mechanism (EPI) to help the Pixel-Level Attention Module (PLAM) preserve local image features in the semi-supervised LViT setting. An LV (Language-Vision) loss is designed to supervise the training of unlabeled images directly using text information. For evaluation, we construct three multimodal medical segmentation datasets (image + text) containing X-ray and CT images. Experimental results show that the proposed LViT achieves superior segmentation performance in both fully supervised and semi-supervised settings. The code and datasets are available at https://github.com/HUANGLIZI/LViT.
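The abstract names the EPI mechanism but does not give its update rule. As an illustration only, the sketch below assumes that "exponential" means an exponential moving average, in which each new pseudo label blends the previous pseudo label with the current prediction. The function name `epi_update`, the parameter `beta`, and its default value are hypothetical and are not taken from the abstract.

```python
def epi_update(prev_pseudo, new_pred, beta=0.99):
    """Blend the previous pseudo label with the current prediction.

    Assumption (not stated in the abstract): the update is an exponential
    moving average, applied independently to each pixel probability.
    Both inputs are flat lists of per-pixel foreground probabilities.
    """
    if len(prev_pseudo) != len(new_pred):
        raise ValueError("pseudo label and prediction must have the same size")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    # A larger beta keeps more of the previous pseudo label, so a single
    # noisy prediction changes the training target only slightly.
    return [beta * p + (1.0 - beta) * q for p, q in zip(prev_pseudo, new_pred)]


# Hypothetical usage: two pixels, equal weight on old and new estimates.
blended = epi_update([1.0, 0.0], [0.0, 1.0], beta=0.5)
```

Under this assumption, the pseudo labels change gradually across iterations instead of being overwritten by each new prediction.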
Detecting pneumonia, especially COVID-19, from chest X-ray (CXR) images is one of the most effective approaches to disease diagnosis and patient triage. The application of deep neural networks to CXR image classification is limited by the small sample size of well-curated data. To tackle this problem, this paper proposes a distance transformation-based deep forest framework with hybrid-feature fusion (DTDF-HFF) for accurate CXR image classification. In the proposed method, hybrid features of CXR images are extracted in two ways: handcrafted feature extraction and multi-grained scanning. Different types of features are fed into different classifiers in the same layer of the deep forest, and the prediction vector obtained at each layer is transformed into a distance vector based on a self-adaptive scheme. The distance vectors obtained by the different classifiers are fused and concatenated with the original features, then input into the corresponding classifier at the next layer. The cascade grows until DTDF-HFF no longer benefits from a new layer. We compare the proposed method with other methods on public CXR data sets, and the experimental results show that the proposed method achieves state-of-the-art performance. The code will be made publicly available at https://github.com/hongqq/DTDF-HFF.
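The two structural steps described above (concatenating fused vectors with the original features, and growing the cascade until a new layer stops helping) can be sketched as follows. This is a minimal illustration under stated assumptions, not the DTDF-HFF implementation: the self-adaptive distance transformation is not specified in the abstract and is left out, and the names `augment_features`, `grow_cascade`, `score_with_layers`, `max_layers`, and `min_gain` are hypothetical.

```python
def augment_features(original_features, fused_distance_vector):
    """Build the next layer's input: original features plus the fused vector.

    Both arguments are plain lists of floats; how the fused distance vector
    is computed is outside the scope of this sketch.
    """
    return list(original_features) + list(fused_distance_vector)


def grow_cascade(score_with_layers, max_layers=20, min_gain=0.0):
    """Return how many layers to keep.

    score_with_layers(n) is a hypothetical callback that returns a
    validation score for a cascade with n layers. Growth stops as soon as
    adding a layer improves the best score by no more than min_gain.
    """
    best_score = score_with_layers(1)
    kept = 1
    for n in range(2, max_layers + 1):
        score = score_with_layers(n)
        if score - best_score <= min_gain:
            break  # the new layer brings no benefit, so it is discarded
        best_score, kept = score, n
    return kept


# Hypothetical validation scores for cascades of one to four layers.
scores = {1: 0.80, 2: 0.85, 3: 0.85, 4: 0.90}
layers_kept = grow_cascade(lambda n: scores[n], max_layers=4)
next_input = augment_features([0.2, 0.7], [0.1, 0.9, 0.4])
```

In this sketch, growth stops at the first layer that fails to improve the score, so a later layer that would have helped (the fourth in the example) is never evaluated. Whether DTDF-HFF uses exactly this stopping rule is an assumption.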