2021
DOI: 10.48550/arxiv.2106.10637
Preprint

More than Encoder: Introducing Transformer Decoder to Upsample

Abstract: General segmentation models downsample images and then upsample to restore resolution for pixel level prediction. In such schema, upsample technique is vital in maintaining information for better performance. In this paper, we present a new upsample approach, Attention Upsample (AU), that could serve as general upsample method and be incorporated into any segmentation model that possesses lateral connections. AU leverages pixel-level attention to model long range dependency and global information for better re…
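The abstract's idea of upsampling via pixel-level attention over lateral connections can be sketched roughly as follows. This is a hypothetical, single-head NumPy illustration (no learned projections, no positional encoding), not the paper's actual AU module: high-resolution lateral features act as queries, and each output pixel aggregates global context from the coarse decoder map.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_upsample(low, lateral):
    """Hypothetical sketch of attention-based upsampling.

    low:     (h*w, c) coarse decoder features, flattened spatially
    lateral: (H*W, c) high-resolution lateral (skip) features

    Queries come from the high-resolution lateral positions; keys and
    values come from the coarse map, so every output pixel is a convex
    combination of all low-resolution features (global context).
    """
    q, k, v = lateral, low, low
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (H*W, h*w) pixel-level attention
    return softmax(scores, axis=-1) @ v      # (H*W, c) upsampled features

rng = np.random.default_rng(0)
low = rng.normal(size=(16, 8))   # 4x4 coarse map, 8 channels
lat = rng.normal(size=(64, 8))   # 8x8 lateral map, 8 channels
out = attention_upsample(low, lat)
print(out.shape)  # (64, 8): coarse features lifted to lateral resolution
```

Because the attention weights are non-negative and sum to one per output pixel, each upsampled value stays within the range of the coarse features; a learned variant would add query/key/value projections and multiple heads.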

Cited by 6 publications
(6 citation statements)
References 56 publications
“…Xu et al [38] extensively studied the trade-off between transformers and convnets and proposed a more efficient encoder named LeViT-UNet. Li et al [17] presented a new up-sampling approach and incorporated it into the decoder of UNet to model long-term dependencies and global information for better reconstruction results. TransClaw U-Net [4] utilized transformers in UNet with more convolutional feature pyramids.…”
Section: Related Work
confidence: 99%
“…The medical imaging field started adopting transformers right away, implementing them in different applications for diagnosis and prognosis; He et al (2022) 9 present a comprehensive review of many applications using transformer-based architectures in the medical imaging field. For segmentation, transformer-based architectures were used, for example, in cardiac segmentation, 10,11,12,13,14 achieving state-of-the-art results on different related datasets, while in multi-organ segmentation 10,11,13,14,15,16,17 they also achieved state-of-the-art results.…”
Section: Vision Transformers
confidence: 99%
“…Using well-known data sets related to different use cases and working on top of [52], Roy et al [114] verified that the addition of a spatial-channel squeeze and excitation block works as an attention mechanism in FCNs and improves the quality of the segmentation maps. Interestingly, in most recent papers, the authors approach the problem of medical image segmentation using encoder-decoder structures that generally comprise a CNN and a Transformer to extract volumetric spatial feature maps, perform global feature modeling and predict refined segmentation maps [115][116][117][118], although there are already some works that replace the CNN-based modules at the encoder or decoder levels and integrate other attention mechanisms to extract features and model long-range dependencies [119]. More recent methodologies on medical image segmentation are taking advantage of a hybrid use of the Vision Transformer and the U-Net with improved results regarding the quality of segmentation maps [120,121].…”
Section: B Medical Image Segmentation
confidence: 99%