Compared with traditional methods based on hand-crafted features, deep neural networks have achieved a degree of success in semantic segmentation of remote sensing (RS) images. However, the segmentation results still suffer from serious holes, rough edges, and false or even missed detections caused by lighting and shadows. To address these problems, this article proposes SCG-TransNet, an RS semantic segmentation model that hybridizes the Swin transformer and Deeplabv3+ and includes a Swin-Conv-Dspp (SCD) module and a global local transformer block (GLTB). First, the SCD module, which efficiently extracts feature information from objects at different scales, is used to mitigate the hole phenomenon and reduce the loss of detailed information. Second, we construct a GLTB with a spatial pyramid pooling shuffle module to extract critical detail information from the limited visible pixels of occluded objects, which effectively alleviates the difficulty of recognizing objects under occlusion. Finally, experimental results show that SCG-TransNet achieves a mean intersection over union of 70.29% on the Vaihingen dataset, which is 3% higher than the baseline model. It also achieves good results on the Potsdam dataset. These results demonstrate the effectiveness, robustness, and superiority of the proposed method compared with existing state-of-the-art methods.
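For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection over union is commonly computed from per-pixel class labels. It is not the authors' evaluation code: the function name `mean_iou`, the flat label lists, and the choice to skip classes that appear in neither truth nor prediction are assumptions made for illustration.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection over union for two flat sequences of class labels."""
    # confusion[t][p] counts pixels whose true class is t and predicted class is p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        fn = sum(confusion[c]) - tp
        union = tp + fp + fn
        if union > 0:  # assumption: ignore classes absent from both inputs
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels (hypothetical data).
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(truth, pred, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. Benchmark protocols may differ in details such as which classes are excluded from the average.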
The main task of remote sensing change detection (CD) is to identify object differences in bitemporal remote sensing images. In recent years, methods based on deep convolutional neural networks (CNNs) have made great progress in remote sensing CD. However, because of illumination and seasonal changes between images acquired by the same sensor, the problem of "pseudo changes" in the change map remains difficult to solve. In this article, to reduce "pseudo changes", we propose a multi-scale difference feature enhancement network (MDFENet) that extracts the most discriminative features from bitemporal remote sensing images. MDFENet comprises three stages. First, multi-scale bitemporal features are generated by a weight-sharing Siamese encoder. Then, the features at each scale are fed into a difference enhancement module to generate refined difference features. Finally, the refined difference features are combined and reconstructed by a decoder to generate the change map. The difference enhancement module includes multiple layers of a difference enhancement (DE) encoder and a transformer decoder. These are applied to features at different scales to establish long-range relationships among pixel-level semantic changes, while high-level difference features participate in the generation of low-level difference features to enhance information transmission across scales, thereby reducing "pseudo changes". Compared with state-of-the-art methods, the proposed method achieves the best performance on two datasets, with an F1 score of 81.15% on the SYSU-CD dataset and 90.85% on the LEVIR-CD dataset.
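As an illustration of the F1 score reported above, the following minimal sketch computes precision, recall, and F1 for a binary change map, treating "changed" (label 1) as the positive class. It is an assumption-laden example, not the authors' evaluation code: the function name `change_f1` and the flat 0/1 label lists are hypothetical.

```python
def change_f1(truth, pred):
    """Precision, recall, and F1 for binary change maps given as flat 0/1 sequences."""
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # changed, detected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # "pseudo change"
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed change

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: 8 pixels, 3 truly changed (hypothetical data).
truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = change_f1(truth, pred)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Here one false alarm and one missed change give precision and recall of 2/3 each, so F1 is also 2/3. False alarms of this kind are what the abstract calls "pseudo changes"; reducing them raises precision and, in turn, F1.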