SMU-Net: Style matching U-Net for brain tumor segmentation with missing modalities

Azad, Reza; Khosravi, Nika; Merhof, Dorit

doi:10.48550/arxiv.2204.02961

Cited by 5 publications

(12 citation statements)

References 21 publications


“…Knowledge Distillation: Knowledge distillation (KD; Hinton, Vinyals, and Dean 2015) was originally proposed to compress knowledge from one or more teacher networks (often large, complex models or model ensembles) into a student network (often a lightweight model). For multimodal segmentation with missing modalities, several works (Hu et al. 2020; Wang et al. 2021b; Chen et al. 2021; Azad, Khosravi, and Merhof 2022) proposed to transfer the 'dark knowledge' of the full-modal network to missing-modal ones via co-training (Blum and Mitchell 1998). Although it achieved decent performance, the co-training strategy incurred a non-negligible memory cost during training because of its dual-network architecture.…”

Section: Related Work (mentioning)

confidence: 99%
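The soft-target distillation idea referenced above (Hinton, Vinyals, and Dean 2015) can be sketched in a few lines. This is a minimal, framework-free illustration, assuming per-voxel class logits from a full-modal teacher and a missing-modal student; the function names, the default temperature, and the example logits are hypothetical and are not taken from SMU-Net or any of the cited works.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T**2 so
    gradient magnitudes stay comparable across temperatures (Hinton et al. 2015)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl

def voxelwise_kd_loss(student_maps, teacher_maps, temperature: float = 2.0) -> float:
    """Average the per-voxel KD loss; each map entry holds one voxel's class logits."""
    losses = [kd_loss(s, t, temperature) for s, t in zip(student_maps, teacher_maps)]
    return sum(losses) / len(losses)

# Hypothetical logits for two voxels and three classes.
teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]   # full-modal network
student = [[1.0, 0.8, -0.5], [0.4, 0.9, 0.2]]   # missing-modal network
print(voxelwise_kd_loss(student, teacher))
```

In a co-training setup, a term like this would typically be added to the student's supervised segmentation loss; the weighting and where it is applied vary by method.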

“…A naive approach is to train a 'dedicated' model for each possible subset of modalities. For better performance, the co-training strategy (Blum and Mitchell 1998) was often incorporated to distill knowledge from full-modal to missing-modal networks (Azad, Khosravi, and Merhof 2022; Chen et al. 2021; Hu et al. 2020; Wang et al. 2021b). Despite their decent performance, the dedicated models were time-consuming to train and storage-intensive to deploy, as 2^N − 1 models were needed for N modalities.…”

Section: Introduction (mentioning)

confidence: 99%
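The 2^N − 1 count follows from enumerating every non-empty subset of the N modalities. A quick check, assuming the four standard BraTS sequences (FLAIR, T1, T1ce, T2) as the modality set:

```python
from itertools import combinations

# Assumed modality set: the four standard BraTS MRI sequences.
MODALITIES = ("FLAIR", "T1", "T1ce", "T2")

def nonempty_subsets(modalities):
    """Every non-empty subset of the modalities, i.e. every input a 'dedicated' model must cover."""
    return [combo
            for k in range(1, len(modalities) + 1)
            for combo in combinations(modalities, k)]

subsets = nonempty_subsets(MODALITIES)
print(len(subsets), 2 ** len(MODALITIES) - 1)  # 15 15
```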


M3AE: Multimodal Representation Learning for Brain Tumor Segmentation with Missing Modalities

Liu, Wang, Lu et al., 2023 · Preprint


Multimodal magnetic resonance imaging (MRI) provides complementary information for sub-region analysis of brain tumors. Many methods have been proposed for automatic brain tumor segmentation using four common MRI modalities and have achieved remarkable performance. In practice, however, it is common for one or more modalities to be missing due to image corruption, artifacts, acquisition protocols, allergy to contrast agents, or simply cost. In this work, we propose a novel two-stage framework for brain tumor segmentation with missing modalities. In the first stage, a multimodal masked autoencoder (M3AE) is proposed, in which both random modalities (i.e., modality dropout) and random patches of the remaining modalities are masked for a reconstruction task, enabling self-supervised learning of multimodal representations that are robust to missing modalities. Hence, we name our framework M3AE. Meanwhile, we employ model inversion to optimize a representative full-modal image at marginal extra cost, which is used to substitute for the missing modalities and boost performance during inference. Then, in the second stage, a memory-efficient self-distillation is proposed to distill knowledge between heterogeneous missing-modal situations while fine-tuning the model for supervised segmentation. Our M3AE belongs to the 'catchall' genre, in which a single model can be applied to all possible subsets of modalities, and is therefore economical for both training and deployment. Extensive experiments on the BraTS 2018 and 2020 datasets demonstrate its superior performance over existing state-of-the-art methods for missing modalities, as well as the efficacy of its components. Our code is available at: https://github.com/ccarliu/m3ae.
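The masking scheme described in the abstract (dropping whole modalities, then masking random patches of the remaining ones) can be illustrated with a small sampler. This is a hypothetical sketch, not the M3AE code: the 0.5 drop probability, the `patch_mask_ratio` parameter, the patch count, and the rule that at least one modality is always kept are assumptions made for illustration.

```python
import random

def sample_masks(num_modalities, num_patches, patch_mask_ratio=0.5, rng=None):
    """Sample one modality-dropout + patch-masking pattern (hypothetical sketch).

    Returns (dropped, masked):
      dropped: set of modality indices removed entirely (modality dropout);
      masked:  dict mapping each kept modality to the set of its masked patch indices.
    """
    rng = rng or random.Random()
    # Modality dropout: drop each modality with probability 0.5 (assumed),
    # resampling until at least one modality survives (assumed rule).
    while True:
        dropped = {m for m in range(num_modalities) if rng.random() < 0.5}
        if len(dropped) < num_modalities:
            break
    # Random patch masking on the remaining modalities.
    n_mask = round(patch_mask_ratio * num_patches)
    masked = {m: set(rng.sample(range(num_patches), n_mask))
              for m in range(num_modalities) if m not in dropped}
    return dropped, masked

def visibility(num_modalities, num_patches, dropped, masked):
    """Boolean grid [modality][patch]: True where the encoder would see the patch."""
    return [[m not in dropped and p not in masked.get(m, set())
             for p in range(num_patches)]
            for m in range(num_modalities)]

# Four MRI modalities, each split into 64 patches (assumed). The reconstruction
# target would be the full-modal input, including everything hidden here.
dropped, masked = sample_masks(4, 64, patch_mask_ratio=0.5, rng=random.Random(0))
grid = visibility(4, 64, dropped, masked)
```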


“…In [46], an over-complete network is augmented with U-Net, and in U-Net++ [57], the encoder-decoder architecture is redesigned by adding dense skip connections between the modules. This structure has been further improved and utilized in different medical domains [10,30,23,6].…”

Section: CNN-based Segmentation Network (mentioning)

confidence: 99%

“…MCA stands for multi-head cross-attention and LN for LayerNorm. In addition, the impact of the DLF module is examined in Table (6), which demonstrates the proposed module's effectiveness in learning multi-scale feature representations and in enhancing segmentation performance.…”

Section: Double-Level Fusion Module (DLF) (mentioning)

confidence: 99%

HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation

Heidari, Kazerouni, Soltany et al., 2022 · Preprint · Self-citation


Convolutional neural networks (CNNs) have been the consensus choice for medical image segmentation tasks. However, they are limited in modeling long-range dependencies and spatial correlations due to the nature of the convolution operation. Although transformers were first developed to address this issue, they fail to capture low-level features. In contrast, it has been demonstrated that both local and global features are crucial for dense prediction, such as segmentation in challenging contexts. In this paper, we propose HiFormer, a novel method that efficiently bridges a CNN and a transformer for medical image segmentation. Specifically, we design two multi-scale feature representations using the seminal Swin Transformer module and a CNN-based encoder. To secure a fine fusion of the global and local features obtained from these two representations, we propose a Double-Level Fusion (DLF) module in the skip connection of the encoder-decoder structure. Extensive experiments on various medical image segmentation datasets demonstrate the effectiveness of HiFormer over other CNN-based, transformer-based, and hybrid methods in terms of computational complexity and quantitative and qualitative results. Our code is publicly available on GitHub.

Recently, motivated by the outstanding success of transformers in Natural Language Processing (NLP) [47], vision transformers have been developed to mitigate the drawbacks of CNNs in image recognition tasks [21]. Transformers primarily leverage a multi-head self-attention (MSA) mechanism that can effectively construct long-range dependencies between the sequence of tokens and capture global contexts. The vanilla vision transformer [21] ex…
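The multi-head self-attention (MSA) mechanism mentioned above is built from scaled dot-product attention, softmax(Q K^T / sqrt(d)) V. The sketch below shows a single head in plain Python, assuming identity query/key/value projections for brevity; a real MSA layer applies learned projections and runs several such heads in parallel, which is omitted here. The token values are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Row-wise softmax with max subtraction for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(q, k, v):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]                 # transpose K
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, k_t)]
    weights = softmax_rows(scores)                       # each row sums to 1
    return matmul(weights, v), weights

# Self-attention: queries, keys and values all come from the same tokens,
# so every token's output mixes information from the whole sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = attention(tokens, tokens, tokens)
```

Because every token attends to every other token, the receptive field is global after a single layer, which is the property the abstract contrasts with the local receptive field of convolutions.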


“…Automatic and accurate medical image segmentation, which consists of the automated delineation of anatomical structures and other regions of interest (ROIs), plays an integral role in computer-aided diagnosis (CAD) [9,23,19,17,3,7]. As a flagship of deep learning, convolutional neural networks (CNNs) have featured prominently in existing contributions to various medical image segmentation tasks for many years [31,28,5,4,6]. Among diverse CNN variants, the widely acknowledged symmetric encoder-decoder architecture known as U-Net [31] has demonstrated strong segmentation potential.…”

Section: Introduction (mentioning)

confidence: 99%

TransDeepLab: Convolution-Free Transformer-Based DeepLab v3+ for Medical Image Segmentation

Azad, Heidari, Shariatnia et al., 2022 · Lecture Notes in Computer Science · Self-citation


Convolutional neural networks (CNNs) have been the de facto standard in a diverse set of computer vision tasks for many years. In particular, deep neural networks based on seminal architectures, such as the U-shaped model with skip connections or atrous convolution with pyramid pooling, have been tailored to a wide range of medical image analysis tasks. The main advantage of such architectures is that they are adept at capturing versatile local features. However, as is generally agreed, CNNs fail to capture long-range dependencies and spatial correlations because of the intrinsically limited receptive field of convolution operations. Alternatively, the Transformer, which profits from the global information modeling enabled by the self-attention mechanism, has recently attained remarkable performance in natural language processing and computer vision. Nevertheless, previous studies show that both local and global features are critical for a deep model in dense prediction, such as segmenting complicated structures with disparate shapes and configurations. To this end, this paper proposes TransDeepLab, a novel DeepLab-like pure Transformer for medical image segmentation. Specifically, we exploit the hierarchical Swin Transformer with shifted windows to extend DeepLabv3 and to model the Atrous Spatial Pyramid Pooling (ASPP) module. To the best of our knowledge, we are the first to model the seminal DeepLab architecture with a purely Transformer-based model. Extensive experiments on various medical image segmentation tasks verify that our approach performs better than or on par with most contemporary works on a combination of Vision Transformer and CNN-based methods, along with a significant reduction in model complexity. The code and trained models are publicly available on GitHub.
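The atrous (dilated) convolution underlying the original DeepLab ASPP module enlarges the receptive field without adding weights: a kernel of size k with dilation rate r spans k + (k − 1)(r − 1) input positions. The 1-D sketch below illustrates this, assuming a 3-tap kernel and the rates 6, 12, and 18 used by DeepLabv3's ASPP branches. TransDeepLab itself replaces these convolutions with Swin Transformer blocks, so this shows the baseline idea rather than the paper's module.

```python
def effective_kernel_size(k, rate):
    """Input span of a size-k kernel with dilation `rate` (rate 1 = ordinary conv)."""
    return k + (k - 1) * (rate - 1)

def dilated_conv1d(signal, kernel, rate):
    """'Valid' 1-D atrous convolution (cross-correlation): taps are `rate` samples apart."""
    span = effective_kernel_size(len(kernel), rate)
    return [sum(w * signal[i + j * rate] for j, w in enumerate(kernel))
            for i in range(len(signal) - span + 1)]

# Assumed rates, as in DeepLabv3's ASPP branches, plus rate 1 for comparison.
for rate in (1, 6, 12, 18):
    print(rate, effective_kernel_size(3, rate))   # 1 3 / 6 13 / 12 25 / 18 37

print(dilated_conv1d([0, 1, 2, 3, 4, 5], [1, 1], rate=2))  # [2, 4, 6, 8]
```

Running the parallel branches at several rates and concatenating their outputs is what lets ASPP see context at multiple scales with the same number of weights per branch.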
