2021
DOI: 10.48550/arxiv.2102.04306
Preprint
TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

Abstract: Medical image segmentation is an essential prerequisite for developing healthcare systems, especially for disease diagnosis and treatment planning. On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard and achieved tremendous success. However, due to the intrinsic locality of convolution operations, U-Net generally demonstrates limitations in explicitly modeling long-range dependency. Transformers, designed for sequence-to-sequence predicti…

Cited by 1,034 publications (1,389 citation statements)
References 20 publications
“…The most recently proposed TransUNet [49] for abdominal organ segmentation achieved a comparable overall average Dice value of 81.9%, but with significant performance drops compared with the proposed network in ASSD (3.03 mm vs. 1.03 mm) and 95HD (13.25 mm vs. 5.40 mm). The TransUNet employs a 2D U-Net as backbone and incorporates vision transformers (ViT) [52] into the encoder.…”
Section: A. Comparison With Existing Models
confidence: 94%
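The Dice value quoted in this statement measures volumetric overlap between a predicted and a ground-truth segmentation mask. A minimal NumPy sketch (array names and the toy masks are illustrative, not data from the cited study):

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|).

    Returns 1.0 for perfect overlap, 0.0 for none."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return 2.0 * intersection / denom if denom else 1.0

# Toy example: masks sharing one of three foreground pixels
a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
print(dice_score(a, b))  # 2*1 / (2+1) ≈ 0.667
```

ASSD and 95HD, by contrast, are boundary-distance metrics (average and 95th-percentile surface distance), which is why a model can match on Dice yet differ sharply on them.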
“…We compared the segmentation performance of the proposed network to those of seven deep-learning-based models, including two baseline models [11], [12], two well-established models for medical image segmentation [13], [14], and three state-of-the-art abdominal organ segmentation models [25], [49], [20].…”
Section: Comparison With Existing Models
confidence: 99%
“…For extracting global contextual information, the transformer encodes the input image as a sequence of image patches. Then, the decoder performs the upsampling needed to retrieve precise localization [17]. The transformer architecture dispenses with convolution operators and relies on a multi-head self-attention mechanism instead.…”
Section: A. Swtr-Unet
confidence: 99%
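The patch-sequence encoding this statement describes can be sketched in plain NumPy. Dimensions, weight initialization, and function names below are illustrative assumptions, not the actual TransUNet configuration:

```python
import numpy as np

def image_to_patches(img, patch):
    """Split an HxW image into a sequence of flattened patch vectors."""
    h, w = img.shape
    patches = [
        img[i:i + patch, j:j + patch].ravel()
        for i in range(0, h, patch)
        for j in range(0, w, patch)
    ]
    return np.stack(patches)  # shape: (num_patches, patch*patch)

def self_attention(x):
    """Single-head self-attention over the patch sequence: every patch
    attends to every other patch, giving the global receptive field
    that local convolutions lack."""
    d = x.shape[-1]
    rng = np.random.default_rng(0)
    # Random projection weights stand in for learned parameters.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)                 # pairwise patch affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over patches
    return weights @ v

img = np.arange(64.0).reshape(8, 8)
seq = image_to_patches(img, patch=4)   # 4 patches, each a length-16 vector
out = self_attention(seq)
print(seq.shape, out.shape)  # (4, 16) (4, 16)
```

The decoder side mentioned in the quote would then upsample these globally-contextualized patch features back to pixel resolution; that stage is omitted here.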