2022
DOI: 10.48550/arxiv.2201.01266
Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images

Abstract: Semantic segmentation of brain tumors is a fundamental medical image analysis task involving multiple MRI imaging modalities that can assist clinicians in diagnosing the patient and successively studying the progression of the malignant entity. In recent years, Fully Convolutional Neural Networks (FCNNs) approaches have become the de facto standard for 3D medical image segmentation. The popular "U-shaped" network architecture has achieved state-of-the-art performance benchmarks on different 2D and 3D semantic …

Cited by 38 publications (69 citation statements)
References 29 publications
“…Experiments on BraTS 2021 [140] show that VT-UNet is robust to data artifacts and exhibits strong generalization ability. In another similar work, Hatamizadeh et al. [145] propose a Swin UNet-based architecture, Swin UNETR, which consists of a Swin transformer encoder and a CNN-based decoder. Specifically, Swin UNETR computes self-attention in an efficient shifted-window partitioning scheme and is a top-performing model on the BraTS 2021 [140] validation set.…”
Section: 3D Medical Segmentation
confidence: 99%
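The shifted-window partitioning scheme this statement refers to can be sketched in a few lines. This is an illustrative NumPy sketch, not the paper's implementation: `window_partition` and `cyclic_shift` are hypothetical names, and it assumes a single-channel-last volume whose spatial sides are divisible by the window size.

```python
import numpy as np

def window_partition(x, ws):
    """Split a 3D volume (D, H, W, C) into non-overlapping cubic windows
    of side ws; each window becomes a short token sequence for attention."""
    D, H, W, C = x.shape
    x = x.reshape(D // ws, ws, H // ws, ws, W // ws, ws, C)
    # Group the three block axes, then the three within-window axes.
    return x.transpose(0, 2, 4, 1, 3, 5, 6).reshape(-1, ws * ws * ws, C)

def cyclic_shift(x, ws):
    """Roll the volume by ws // 2 along each spatial axis so the next round
    of window attention mixes tokens across the previous window borders."""
    s = ws // 2
    return np.roll(x, shift=(-s, -s, -s), axis=(0, 1, 2))

vol = np.random.rand(8, 8, 8, 4)              # toy feature volume
wins = window_partition(vol, ws=4)            # 8 windows of 64 tokens each
shifted = window_partition(cyclic_shift(vol, 4), ws=4)
print(wins.shape)                             # (8, 64, 4)
```

Alternating plain and shifted partitions is what lets window-local attention propagate information globally across layers.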
“…Swin UNETR [145] Brain MRI 3D BraTS 21 [140] Dice, Hausdorff distance Hybrid Yes. A Swin UNet-based architecture that consists of a Swin transformer encoder and a CNN-based decoder; computes self-attention in an efficient shifted-window partitioning scheme.…”
Section: Hybrid
confidence: 99%
“…Following this work, Zhang et al. developed an architecture for multi-organ segmentation that runs a CNN-based encoder and a Transformer-based segmentation network in parallel and fuses the features from these two branches to jointly make predictions. Hatamizadeh et al. [2022] applied their approach to the tasks of head and neck tumour segmentation using multi-modal data (i.e., CT and PET images) and compared their results with traditional CNN-based approaches. Yan et al. [2022] also employed a U-Net-based structure for the task of multi-organ segmentation in 3D medical image data, however with slightly different changes: in their approach, they used a CNN encoder and CNN decoder with a Transformer model in between to fuse contextual information from the neighboring image slices.…”
Section: Multi-task Segmentation
confidence: 99%
“…added an attention mechanism to a backbone network based on ResNet He et al. [2016] to aid the diagnosis of Alzheimer's disease in brain MRI data. Using well-known data sets related to different use cases and working on top of Hu et al. [2018a], Roy et al. [2018] verified that the addition of a spatial-channel squeeze-and-excitation block works as an attention mechanism in FCNs and improves the quality of the segmentation maps. Interestingly, in most recent papers, the authors approach the problem of medical image segmentation using encoder-decoder structures that generally comprise a CNN and a Transformer to extract volumetric spatial feature maps, perform global feature modeling, and predict refined segmentation maps Jia and Shu [2021], Peiris et al. [2021], Hatamizadeh et al. [2022], although there are already some works that replace the CNN-based modules at the encoder or decoder levels and integrate other attention mechanisms to extract features and model long-range dependencies. More recent methodologies…”
confidence: 99%
“…More recently, hierarchical transformers with a shifted-window scheme [18] have been proposed, enabling cross-patch self-attention connections. Based on Swin ViT, Swin UNETR [10,24] and SwinUNET [2] are introduced for capturing multiscale features in CT images. However, the modification of local self-attention results in a quadratic increase in complexity.…”
Section: Introduction
confidence: 99%
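The complexity contrast behind this statement can be made concrete with back-of-envelope arithmetic: global self-attention over a volume of n tokens costs on the order of n² query-key comparisons, while window attention with a fixed window of w tokens costs (n/w) · w² = n · w, i.e. linear in n. The helper below is illustrative only (`attn_cost` is not a library function), and it drops constants, heads, and channel dimensions.

```python
def attn_cost(n_tokens, window_tokens=None):
    """Rough count of query-key comparisons (constants and heads dropped).
    Global self-attention pairs every token with every other token; window
    attention only pairs tokens inside each fixed-size window."""
    if window_tokens is None:
        return n_tokens ** 2                      # quadratic in volume size
    n_windows = n_tokens // window_tokens
    return n_windows * window_tokens ** 2         # = n_tokens * window_tokens

# Doubling the side of a cubic feature map multiplies the token count by 8,
# so global cost grows 64x per doubling while window cost grows only 8x.
for side in (16, 32, 64):
    n = side ** 3
    print(side, attn_cost(n), attn_cost(n, window_tokens=7 ** 3))
```

Under these assumptions, the gap widens rapidly with resolution, which is why fixed-window schemes dominate for high-resolution 3D medical volumes.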