Multi-Category Segmentation of Sentinel-2 Images Based on the Swin UNet Method

Yao, Junyuan; Jin, Shuanggen

doi:10.3390/rs14143382

Cited by 19 publications

(9 citation statements)

References 40 publications


“…Since transformer-based models can acquire long-range dependencies and convolutional neural networks can capture fine-grained local features, existing literature tends to construct U-shaped architecture based on transformer blocks and convolutional neural networks, which exhibit promising results on remote sensing datasets [29, 30, 31, 32]. Inspired by these, we propose a novel dual encoder of two branches: Swin Transformer blocks and reslayers in reverse order, along with a decoder of only Swin Transformer blocks.…”

Section: Methods (mentioning)

confidence: 99%

Transformer-Based Model with Dynamic Attention Pyramid Head for Semantic Segmentation of VHR Remote Sensing Imagery

Zhou, Huang. 2022. Entropy


Convolutional neural networks have long dominated semantic segmentation of very-high-resolution (VHR) remote sensing (RS) images. However, restricted by the fixed receptive field of the convolution operation, convolution-based models cannot directly obtain contextual information. Meanwhile, Swin Transformer possesses great potential in modeling long-range dependencies. Nevertheless, Swin Transformer breaks images into patches that are single-dimension sequences without considering the position loss problem inside patches. Therefore, inspired by Swin Transformer and Unet, we propose SUD-Net (Swin transformer-based Unet-like with Dynamic attention pyramid head Network), a new U-shaped architecture composed of Swin Transformer blocks and convolution layers simultaneously through a dual encoder and an upsampling decoder with a Dynamic Attention Pyramid Head (DAPH) attached to the backbone. First, we propose a dual encoder structure combining Swin Transformer blocks and reslayers in reverse order to complement global semantics with detailed representations. Second, aiming at the spatial loss problem inside each patch, we design a Multi-Path Fusion Model (MPFM) with specially devised Patch Attention (PA) to encode position information of patches and adaptively fuse features of different scales through attention mechanisms. Third, a Dynamic Attention Pyramid Head is constructed with deformable convolution to dynamically aggregate effective and important semantic information. SUD-Net achieves exceptional results on the ISPRS Potsdam and Vaihingen datasets, with 92.51% mF1, 86.4% mIoU, and 92.98% OA on Potsdam and 89.49% mF1, 81.26% mIoU, and 90.95% OA on Vaihingen.


“…End-to-end semantic segmentation networks are predominantly employed in deep learning-based remote sensing image classification to accomplish pixel-level classification [38]. However, for complex feature objects, these semantic segmentation methods exhibit a "pretzel effect," as accurately determining the appropriate class for each pixel can be quite difficult [20, 21].…”

Section: Image Classification Techniques (mentioning)

confidence: 99%

Joint superpixel and Transformer for high resolution remote sensing image classification

Dang, Mao, Zhang et al. 2023. Preprint


Deep neural networks combined with superpixel segmentation have proven superior for high-resolution remote sensing image (HRI) classification. Currently, most HRI classification methods that combine deep learning and superpixel segmentation use stacking on multiple scales to extract contextual information from segmented objects. However, this approach does not take into account the contextual dependencies between each segmented object. To solve this problem, a joint superpixel and Transformer (JST) framework is proposed for HRI classification. In JST, HRI is first segmented into superpixel objects as input, and Transformer is used to model the long-range dependencies. The contextual relationship between each input superpixel object is obtained and the class of analyzed objects is output by designing an encoding and decoding Transformer. Additionally, we explore the effect of semantic range on classification accuracy. JST is also tested by using two HRI datasets with overall classification accuracy, average accuracy and Kappa coefficients of 0.79, 0.70, 0.78 and 0.91, 0.85, 0.89, respectively. The effectiveness of the proposed method is compared qualitatively and quantitatively, and the results are competitive with and consistently better than the benchmark comparison method.


“…This technique is extremely useful for classification tasks in EO due to images rarely containing only one class, and where context within an image is important [12]. Figure 2 presents an example of multiclass image segmentation in the context of land with several land cover classification [27]. The convolutional layer performs the majority of the computation.…”

Section: Image Segmentation and CNNs (mentioning)

confidence: 99%

Deep Learning Based Burnt Area Mapping Using Sentinel 1 for the Santa Cruz Mountains Lightning Complex (CZU) and Creek Fires 2020

Luft, Schillaci, Ceccherini et al. 2022. Fire


The study presented here builds on previous synthetic aperture radar (SAR) burnt area estimation models and presents the first U-Net (a convolutional network architecture for fast and precise segmentation of images) combined with ResNet50 (Residual Networks used as a backbone for many computer vision tasks) encoder architecture used with SAR, Digital Elevation Model, and land cover data for burnt area mapping in near-real time. The Santa Cruz Mountains Lightning Complex (CZU) was one of the most destructive fires in state history. The results showed a maximum burnt area segmentation F1-Score of 0.671 in the CZU, which outperforms current models in the literature that estimate burnt area with SAR data for the specific event studied (F1-Score of 0.667). The framework presented here has the potential to be applied on a near real-time basis, which could allow land monitoring as the frequency of data capture improves.
