STransFuse: Fusing Swin Transformer and Convolutional Neural Network for Remote Sensing Image Semantic Segmentation

Gao, Liang; Liu, Hui; Yang, Minhang; Chen, Long; Wan, Yaling; Xiao, Zhengqing; Qian, Yurong

doi:10.1109/jstars.2021.3119654

Cited by 151 publications

(70 citation statements)

References 38 publications

“…The models based on Swin Transformer architecture have demonstrated superior performance in computer vision fields such as image classification, target detection, and semantic segmentation [39]. In this paper, we proposed the LEG Transformer method to classify different fault states.…”

Section: LEG Transformer Methods (mentioning, confidence: 99%)

Intelligent Bearing Fault Diagnosis Based on Multivariate Symmetrized Dot Pattern and LEG Transformer

et al. 2022

Deep learning based on image representations of vibration signals has proven effective for the intelligent fault diagnosis of bearings. However, previous studies have focused primarily on single-channel vibration signal processing, which cannot guarantee the integrity of fault feature information. To obtain richer fault feature information, this paper proposes a multivariate vibration data image representation method, named the multivariate symmetrized dot pattern (M-SDP), which combines multivariate variational mode decomposition (MVMD) with the symmetrized dot pattern (SDP). In M-SDP, the vibration signals of multiple sensors are simultaneously decomposed by MVMD to obtain dominant subcomponents with physical meaning. These dominant subcomponents are then mapped to different angles of the SDP image to generate the M-SDP image. Finally, the parameters of M-SDP are determined automatically from the normalized cross-correlation coefficient (NCC) so as to maximize the difference between bearing states. Moreover, to improve diagnosis accuracy and model generalization, this paper introduces a local-to-global (LG) attention block and a locally enhanced positional encoding (LePE) mechanism into the Swin Transformer, yielding the LEG Transformer. A novel intelligent bearing fault diagnosis method based on M-SDP and the LEG Transformer is then developed. The proposed method is validated on two experimental datasets and compared with other methods. The experimental results indicate that M-SDP improves diagnostic accuracy and stability over the original SDP, and that the LEG Transformer outperforms the standard Swin Transformer in recognition rate and convergence speed.
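
The M-SDP method in the abstract above builds on the classic single-channel symmetrized dot pattern. In that pattern, each sample x[i] sets a normalized radius, the lagged sample x[i+L] sets an angular offset scaled by a gain, and the dots are mirrored around several evenly spaced symmetry axes. The Python sketch below illustrates only that basic mapping. It is not the authors' M-SDP code, and it omits the MVMD decomposition and the NCC-based parameter search. The function name sdp_points and its defaults (lag=1, a 30-degree gain, six arms) are illustrative assumptions.

```python
import numpy as np

def sdp_points(x, lag=1, gain_deg=30.0, n_arms=6):
    """Map a 1-D signal to symmetrized dot pattern (SDP) polar coordinates.

    Hypothetical helper; lag, gain_deg and n_arms are illustrative defaults.
    Returns (r, theta_deg): for each mirror arm, every pair (x[i], x[i+lag])
    yields two dots at +/- the angular offset around that arm's axis.
    """
    x = np.asarray(x, dtype=float)
    if lag < 1 or lag >= len(x):
        raise ValueError("lag must be in [1, len(x) - 1]")
    span = x.max() - x.min()
    if span == 0:
        raise ValueError("a constant signal cannot be min-max normalized")
    norm = (x - x.min()) / span          # min-max normalize to [0, 1]
    r = norm[:-lag]                      # radius comes from x[i]
    offset = norm[lag:] * gain_deg       # angular offset comes from x[i + lag]
    radii, angles = [], []
    for m in range(n_arms):
        axis = 360.0 * m / n_arms        # angle of the m-th mirror axis
        radii += [r, r]
        angles += [axis + offset, axis - offset]  # the two mirrored dots
    return np.concatenate(radii), np.concatenate(angles) % 360.0
```

Plotting r against np.deg2rad(theta_deg) on a polar axis gives the usual snowflake-like SDP image. Per the abstract, M-SDP instead assigns different dominant subcomponents to different angles rather than letting one signal fill every arm.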

“…A self-supervised multitask representation learning method was designed to capture effective visual representations of remote sensing images in [22] for semantic segmentation. In [23], authors introduced the STransFuse model as a new semantic segmentation method for remote sensing images. Gao et al [24] proposed a novel unsupervised domain adaptive semantic segmentation method by selecting some classes from a source domain image and softly pasting the corresponding image patch on both source and target training images with a fusion weight.…”

Section: Related Work (mentioning, confidence: 99%)

FMWDCT: Foreground Mixup Into Weighted Dual-Network Cross Training for Semisupervised Remote Sensing Road Extraction

You, Wang, Chen, et al. 2022

IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing

With the development of deep learning, automatic road extraction has achieved great success. However, the main challenges are how to make full use of a large number of unlabeled images to improve segmentation models and how to alleviate sample imbalance in road extraction tasks. In this paper, we propose a novel semi-supervised remote sensing road extraction approach, referred to as Foreground Mixup into Weighted Dual-Network Cross Training (FMWDCT), which combines labeled and unlabeled images to extract roads from remote sensing images. FMWDCT is composed of Dual-Network Cross Training (DCT) and Foreground Pasting (FP). DCT is a new semi-supervised training method, and FP is an effective data perturbation method for road extraction. We first paste the foreground pixels obtained from labeled images into unlabeled images to produce mixed input images. The mixed pseudo labels are then generated by combining high-confidence predictions from the augmented network with the labeled masks. Finally, the mixed pseudo labels are used to guide another adversarial basic network for cross training, and this basic network is used to smoothly update its corresponding augmented network. The proposed FMWDCT effectively addresses overfitting and the imbalance between positive and negative samples when only a few labeled training samples are available. We demonstrate the effectiveness of our method on three road extraction datasets and achieve better performance with few labeled data. Extensive experiments show that the proposed semi-supervised method can learn latent information from unlabeled data to improve performance.
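
The foreground-pasting and mixed-pseudo-label steps described in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions and not the FMWDCT implementation. The helper names, the 0.9 confidence threshold, the ignore value 255, and the assumption that the predicted probabilities refer to the mixed image are all hypothetical. The weighting and dual-network cross-training parts are not shown.

```python
import numpy as np

def foreground_paste(lab_img, lab_mask, unl_img):
    """Hypothetical helper: paste labeled road pixels onto an unlabeled image.

    lab_img, unl_img: (H, W, C) arrays; lab_mask: (H, W) binary road mask.
    """
    fg = np.asarray(lab_mask).astype(bool)[..., None]  # broadcast over channels
    return np.where(fg, lab_img, unl_img)

def mixed_pseudo_label(lab_mask, mixed_prob, thresh=0.9, ignore=255):
    """Hypothetical helper: merge pasted ground truth with confident predictions.

    mixed_prob: (H, W) road probability predicted for the mixed image
    (assumed to come from the augmented network). Pasted pixels are known
    road (1); elsewhere confident predictions give 1 or 0, and uncertain
    pixels get the assumed `ignore` value so a loss can skip them.
    """
    mixed_prob = np.asarray(mixed_prob, dtype=float)
    label = np.full(mixed_prob.shape, ignore, dtype=np.uint8)
    label[mixed_prob >= thresh] = 1
    label[mixed_prob <= 1 - thresh] = 0
    label[np.asarray(lab_mask).astype(bool)] = 1  # ground truth overrides predictions
    return label
```

Letting the labeled mask override the network's prediction reflects the abstract's point that pasted foreground pixels come with known labels. Only the remaining pixels depend on the network's confidence.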

“…However, most of this research has been conducted on high-resolution remote sensing images because of their high spatial information and the appropriate feature scale of the target. On the contrary, the scale of features contained in the medium-resolution remote sensing images varies greatly [33]. The poor performance of medium-resolution remote sensing image segmentation was due to its insufficient spatial feature information on the one hand [34], and many large scale features cannot be extracted in medium resolution due to the perceptual field limitation of CNNs.…”

Section: Introduction (mentioning, confidence: 99%)

Multi-Category Segmentation of Sentinel-2 Images Based on the Swin UNet Method

Yao, Jin 2022

Remote Sensing

Medium-resolution remote sensing satellites have provided a large amount of long-time-series, full-coverage data for Earth surface monitoring. However, different objects may have similar spectral values and the same object may have different spectral values, which makes it difficult to improve classification accuracy. Deep learning methods have greatly facilitated the semantic segmentation of remote sensing images. For medium-resolution remote sensing images, however, convolutional neural network-based models do not achieve good results because of their limited receptive field. The fast-emerging vision transformer, which captures global features well through self-attention, provides a new solution for medium-resolution remote sensing image segmentation. In this paper, a new multi-class segmentation method is proposed for medium-resolution remote sensing images based on an improved Swin UNet, a pure transformer model, together with a new pre-processing step: an image enhancement method and a spectral selection module are designed to achieve better accuracy. Finally, 10-category segmentation is conducted on 10-m resolution Sentinel-2 MSI (MultiSpectral Instrument) images and compared, on the same sample data, with traditional convolutional neural network-based models (DeepLabV3+ and U-Net with different backbone networks, including VGG, ResNet50, MobileNet, and Xception). The results show a higher mean intersection over union (MIoU) of 72.06% and a better accuracy of 89.77%. The vision transformer method has great potential for medium-resolution remote sensing image segmentation tasks.
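
The MIoU and accuracy figures quoted above are usually computed from a per-pixel confusion matrix. For each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), and MIoU is the mean of IoU_c over classes. The sketch below makes two assumptions that the abstract does not specify. First, it treats "accuracy" as overall pixel accuracy. Second, it skips classes that appear in neither the ground truth nor the prediction. The function names are illustrative.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    t = np.asarray(y_true, dtype=np.int64).ravel()
    p = np.asarray(y_pred, dtype=np.int64).ravel()
    counts = np.bincount(n_classes * t + p, minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes)

def miou_and_accuracy(y_true, y_pred, n_classes):
    cm = confusion_matrix(y_true, y_pred, n_classes)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as c but belongs to another class
    fn = cm.sum(axis=1) - tp          # belongs to c but predicted as another class
    union = tp + fp + fn
    present = union > 0               # assumption: skip classes absent from both maps
    miou = (tp[present] / union[present]).mean()
    pixel_acc = tp.sum() / cm.sum()   # assumption: "accuracy" = overall pixel accuracy
    return miou, pixel_acc
```

For example, ground truth [[0, 0], [1, 1]] against prediction [[0, 1], [1, 1]] gives per-class IoUs of 1/2 and 2/3. MIoU is therefore about 0.583, and pixel accuracy is 0.75.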
