2022
DOI: 10.1109/tgrs.2022.3144165

Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation

Cited by 289 publications (151 citation statements)
References: 63 publications

“…The structure of the ViT is completely different from the CNN, which treats the 2D image as the 1D ordered sequence and applies the self-attention mechanism for global dependency modelling, demonstrating stronger global feature extraction. Driven by this, many researchers in the field of remote sensing introduced ViTs for segmentation-related tasks, such as land cover classification [63][64][65][66][67][68], urban scene parsing [69][70][71][72][73][74], change detection [75,76], road extraction [77] and especially building extraction [78]. For example, Chen et al. [79] proposed a sparse token Transformer to learn the global dependency of tokens in both spatial and channel dimensions, achieving state-of-the-art accuracy on benchmark building extraction datasets.…”
Section: B. ViT-based Building Extraction Methods (mentioning; confidence: 99%)
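The statement above summarizes how a ViT differs from a CNN: the image is cut into patches that become an ordered sequence of tokens, and self-attention lets every token interact with every other token. The following is a minimal pure-Python sketch of that idea under stated assumptions, not the implementation of any cited work. The helper names (`patchify`, `self_attention`, `rand_matrix`), the random projection weights and the tiny 8 x 8 x 3 input are illustrative only, and the learned position embeddings and class token of a real ViT are omitted.

```python
import math
import random


def patchify(image, patch):
    """Split an H x W x C image (nested lists) into non-overlapping patch x patch
    tiles and flatten each tile into one token, in raster (row-major) order."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch"
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([value
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)
                           for value in image[r][c]])
    return tokens


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention. Every token attends to every
    token, so each output row mixes information from the whole image."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]  # N x N attention matrix
    return matmul(weights, v), weights


def rand_matrix(rows, cols, rng):
    """Hypothetical randomly initialized weights standing in for learned ones."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    H, W, C, P, D = 8, 8, 3, 4, 16
    image = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
    tokens = patchify(image, P)                          # 4 tokens of length P*P*C = 48
    x = matmul(tokens, rand_matrix(P * P * C, D, rng))   # linear patch embedding: 4 x 16
    wq, wk, wv = (rand_matrix(D, D, rng) for _ in range(3))
    out, attn = self_attention(x, wq, wk, wv)
    print(len(tokens), len(tokens[0]), len(out), len(out[0]))  # 4 48 4 16
```

Because the attention matrix is N x N in the number of tokens, this global modelling is also what makes plain ViT attention costly on large remote sensing images.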
“…In addition, transformer-based methods were also used in remote sensing images. He et al. [55] embedded the Swin transformer into the U-Net for remote sensing semantic segmentation. As a result, the network can obtain global context relationships and improve feature discrimination.…”
Section: Transformer-based Methods in CV (mentioning; confidence: 99%)
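He et al. are described as embedding the Swin transformer in a U-Net. A defining mechanism of the Swin transformer in general (not a detail taken from this statement or from the cited paper's specific architecture) is to compute self-attention inside non-overlapping local windows and to alternate with shifted windows so that information can cross window borders. The sketch below is a hedged, pure-Python illustration of that partitioning only. The function names are hypothetical, and real Swin blocks also apply an attention mask to the wrapped-around regions after the cyclic shift, which is omitted here. The 56 x 56 token map with 7 x 7 windows mirrors the first stage of the original Swin-T configuration for a 224 x 224 input.

```python
def window_partition(grid, m):
    """Split an H x W grid of tokens into non-overlapping m x m windows,
    returned in row-major window order; each window is a flat token list."""
    h, w = len(grid), len(grid[0])
    assert h % m == 0 and w % m == 0, "grid size must be divisible by window size"
    return [[grid[r][c] for r in range(top, top + m) for c in range(left, left + m)]
            for top in range(0, h, m)
            for left in range(0, w, m)]


def cyclic_shift(grid, s):
    """Roll the grid up and to the left by s positions. Partitioning the shifted
    grid yields the 'shifted windows' that connect neighbouring windows."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + s) % h][(c + s) % w] for c in range(w)] for r in range(h)]


def attention_pairs(h, w, m=None):
    """Count query-key pairs: global self-attention over all h*w tokens when
    m is None, otherwise attention restricted to m x m windows."""
    n = h * w
    if m is None:
        return n * n
    return (n // (m * m)) * (m * m) ** 2


if __name__ == "__main__":
    grid = [[r * 8 + c for c in range(8)] for r in range(8)]  # token ids on an 8 x 8 map
    windows = window_partition(grid, 4)
    print(len(windows), windows[0][:6])                # 4 [0, 1, 2, 3, 8, 9]
    shifted = window_partition(cyclic_shift(grid, 2), 4)
    print(shifted[0][:6])                              # [18, 19, 20, 21, 26, 27]
    print(attention_pairs(56, 56), attention_pairs(56, 56, 7))  # 9834496 153664
```

For a fixed window size, the number of query-key pairs grows linearly with the number of tokens rather than quadratically, which is why window attention suits high-resolution remote sensing imagery.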
“…Secondly, the transformer structure also benefits a large number of downstream tasks, e.g., semantic segmentation [33], remote sensing image classification [34, 35, 36] and behavior analysis [37, 38, 39]. However, in tasks such as semantic segmentation and remote sensing image classification, the contribution of a transformer structure is still limited to its advantage in visual feature extraction.…”
Section: Related Work (mentioning; confidence: 99%)