2023
DOI: 10.1109/jstars.2022.3225150

Improved Swin Transformer-Based Semantic Segmentation of Postearthquake Dense Buildings in Urban Areas Using Remote Sensing Images

Abstract: Timely acquiring accurate information on the earthquake-induced damage of buildings is crucial for emergency assessment and post-disaster rescue. Optical remote sensing photography has been a typical method for obtaining seismic data in the early stage after an earthquake due to its wide coverage and fast response speed. Currently, convolutional neural networks (CNNs) are widely applied for remote sensing image recognition. However, insufficient extraction and expression ability of global correlations between …

Cited by 43 publications (13 citation statements)
References 34 publications
“…Vision transformers are gaining popularity in semantic segmentation for the advantage gained from their special network architecture. These models learn pixel-level feature representation using a transformer-based encoder, an attention mechanism, and a bottle-neck layer [19, 20]. SegViT [21] stands apart from conventional ViTs because it uses an Attention-to-Mask (ATM) module.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Furthermore, researchers [8], [16] have also tried to use Swin Transformer as a backbone network in combination with CNNs for RS imaging segmentation tasks. In this regard, related research works [60], [61], [62], [63] have indicated that Swin Transformer as an encoder can be combined with different decoder architectures (e.g., Unet [13], PSP [20], and FPN [64]) for diverse tasks to achieve the optimal segmentation outcomes. Besides, UnetFormer [65], the current state-of-the-art (SoTA) network for semantic segmentation in RS, proposes a hybrid Transformer and CNN lightweight network for real-time urban scene segmentation, which looks similar to ours but is quite different.…”
Section: Semantic Segmentation Based On Transformer
Citation type: mentioning
confidence: 99%
“…The multi-scale features of densely distributed buildings are extracted and fused, and three statistics (e.g., the angular second moment, dissimilarity, and inverse difference moment) are further discovered based on the gray-level cooccurrence matrix as the texture features to distinguish damage intensities of buildings. Furthermore, a novel improved Swin Transformer is proposed to segment dense urban buildings at pixel level by remote sensing images with complex backgrounds, as shown in Figure 3 [13]. The original Swin Transformer is utilized as the backbone of the encoder, and a convolutional block attention module is inserted at patch embedding and patch merging stages.…”
Section: Large-scale Coarse Assessment By Remote Sensing Satellite Im...
Citation type: mentioning
confidence: 99%
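
The last citation statement names three gray-level co-occurrence matrix (GLCM) statistics used as texture features: angular second moment, dissimilarity, and inverse difference moment. A minimal sketch of how such statistics are commonly computed follows, using the standard textbook definitions. The function names `glcm` and `texture_stats`, the single horizontal offset, the symmetric counting, and the toy 2-level image are all assumptions made for illustration; the statement does not give the cited work's offsets, quantization, or implementation.

```python
from typing import Dict, List


def glcm(image: List[List[int]], levels: int, dx: int = 1, dy: int = 0,
         symmetric: bool = True) -> List[List[float]]:
    """Normalized gray-level co-occurrence matrix for one pixel offset.

    `image` holds integer gray levels in [0, levels). The offset (dx, dy)
    and the symmetric counting are illustrative assumptions.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset produces no pixel pairs for this image")
    # Divide by the number of counted pairs so the entries sum to 1.
    return [[v / total for v in row] for row in counts]


def texture_stats(p: List[List[float]]) -> Dict[str, float]:
    """Angular second moment, dissimilarity, and inverse difference moment."""
    asm = dis = idm = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            asm += v * v                      # sum of P(i, j)^2
            dis += v * abs(i - j)             # sum of P(i, j) * |i - j|
            idm += v / (1.0 + (i - j) ** 2)   # sum of P(i, j) / (1 + (i - j)^2)
    return {"asm": asm, "dissimilarity": dis, "idm": idm}


if __name__ == "__main__":
    # Toy 2-level checkerboard patch (hypothetical data).
    patch = [[0, 1],
             [1, 0]]
    print(texture_stats(glcm(patch, levels=2)))
```

A perfectly uniform patch yields an angular second moment and inverse difference moment of 1 and a dissimilarity of 0, while strongly alternating textures push dissimilarity up and the other two down, which is why such statistics can help separate texture classes.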