2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv51458.2022.00181
UNETR: Transformers for 3D Medical Image Segmentation

Cited by 1,302 publications (646 citation statements)
References 20 publications
“…In TransUNet [1], a convolutional layer was used as a feature extractor to obtain detailed information from raw images; the resulting feature maps were then fed into a Transformer layer to capture global information. UNETR [49] proposed a Transformer-combining 3D architecture for medical images, which treats Transformer layers as the encoder to extract features and convolutional layers as the decoder. A great amount of such work focuses on taking advantage of both the Transformer's long-range dependency and the CNN's inductive bias.…”
Section: Transformers for Segmentation Tasks
confidence: 99%
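The encoder described in the excerpt above first cuts the 3D volume into non-overlapping patches, each becoming one Transformer token. A minimal sketch of that tokenization step, assuming illustrative UNETR-like hyperparameters (16×16×16 patches on a 96×96×96 volume; these values are not taken from this page):

```python
def patch_tokens(volume_shape, patch=16):
    """Number of non-overlapping 3D patches (ViT tokens) for a volume.

    Each patch of `patch**3` voxels is flattened and linearly projected
    into one token before entering the Transformer encoder.
    """
    d, h, w = volume_shape
    # Volume dimensions must be divisible by the patch size.
    assert d % patch == 0 and h % patch == 0 and w % patch == 0
    return (d // patch) * (h // patch) * (w // patch)

# A 96^3 volume with 16^3 patches yields a 6 x 6 x 6 grid of tokens.
print(patch_tokens((96, 96, 96)))  # -> 216
```

The convolutional decoder then progressively upsamples these coarse token features back to full voxel resolution, merging them with encoder features via skip connections.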
“…In Table 3, we compare the number of parameters and floating-point operations (FLOPs) of our proposed D-Former with those of different 3D medical image segmentation models, including UNETR [49], CoTr [50], TransBTS [27], and nnFormer [42]. The number of FLOPs is calculated for an input image size of 64×128×128 for a fair comparison.…”
Section: Comparison of Model Complexity
confidence: 99%
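Fixing the input size matters for the comparison above because the dominant attention cost scales with the square of the token count, which the input size determines. A back-of-envelope sketch, assuming an illustrative 16³ patch size and hidden dimension 768 (neither is stated on this page):

```python
def attention_flops(volume_shape, patch=16, dim=768):
    """Rough multiply-add count for one self-attention layer over a volume."""
    d, h, w = volume_shape
    n = (d // patch) * (h // patch) * (w // patch)  # sequence length
    # QK^T and the attention-weighted sum of V each cost ~n^2 * dim.
    return 2 * n * n * dim

# For a 64x128x128 input with 16^3 patches: n = 4 * 8 * 8 = 256 tokens.
print(attention_flops((64, 128, 128)))  # -> 100663296 (~0.1 GFLOPs per layer)
```

The same volume at a different crop size would change n, and hence the FLOPs, quadratically, which is why the comparison pins the input to 64×128×128.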
“…For 3D medical image segmentation, Xie et al [28] proposed a model comprising a CNN backbone to extract features, a Transformer to model long-range dependencies, and a CNN decoder to construct the segmentation map. More recently, Hatamizadeh et al [29] proposed UNETR, which uses ViT as the main encoder and connects it directly to the convolutional decoder via skip connections, rather than using a Transformer only in the bridge. Since self-attention is prohibitively expensive on long sequences, all these models apply the Transformer at a low resolution, after either patch embedding or a CNN backbone, and thus fail to fully exploit global context at higher resolutions.…”
Section: Related Work
confidence: 99%
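The excerpt's claim that self-attention is "prohibitively expensive on long sequences" can be made concrete by comparing sequence lengths at different tokenization granularities. A small sketch with illustrative shapes (a 128³ volume; the patch sizes are hypothetical choices, not from this page):

```python
def seq_len(shape, patch):
    """Sequence length when a 3D volume is tokenized at a given patch size."""
    d, h, w = shape
    return (d // patch) * (h // patch) * (w // patch)

vol = (128, 128, 128)
for patch in (1, 4, 16):
    n = seq_len(vol, patch)
    # The pairwise attention matrix has n^2 entries.
    print(f"patch={patch:2d}  tokens={n:>8d}  attention entries={n * n:>16d}")
```

Per-voxel tokens (patch=1) give over two million tokens and a ~4×10¹² entry attention matrix, while 16³ patches give only 512 tokens, which is why these models attend only at coarse resolution.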
“…Image segmentation is an important part of medical image analysis. In particular, accurate and robust medical image segmentation can play a cornerstone role in computer-aided diagnosis and image-guided clinical surgery [1,2].…”
Section: Introduction
confidence: 99%