Improving Classification of Remotely Sensed Images with the Swin Transformer

Jannat, Fatema-E; Willis, Andrew

doi:10.1109/southeastcon48659.2022.9764016

Cited by 20 publications

(10 citation statements)

References 14 publications


Section: Methods (mentioning)

confidence: 99%

“…The standard transformer performs global self-attention in images, whose computation is quadratic in complexity to the input vector and is not suitable for high-resolution images, especially for large computational tasks, such as remote sensing data processing. [27][28][29][30][31][32][33] Swin Transformer proposes to perform self-attention in nonoverlapping windows. Computational complexity of a global MSA module and a window based one W-MSA is…”

Section: Swin Transformer Block (mentioning)

confidence: 99%
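
The citation statement above breaks off before its formula. As a hedged illustration only, the sketch below uses the complexity expressions given in the original Swin Transformer paper (Liu et al., 2021), which may differ in notation from the citing paper: Ω(MSA) = 4hwC² + 2(hw)²C and Ω(W-MSA) = 4hwC² + 2M²hwC, for a feature map of h × w patches, C channels, and window size M. The function names and the example sizes are assumptions chosen for illustration.

```python
def msa_flops(h: int, w: int, c: int) -> int:
    """Global multi-head self-attention: 4*h*w*C^2 + 2*(h*w)^2*C.

    The second term is quadratic in the number of patches h*w.
    """
    n = h * w
    return 4 * n * c**2 + 2 * n**2 * c


def wmsa_flops(h: int, w: int, c: int, m: int) -> int:
    """Window-based self-attention: 4*h*w*C^2 + 2*M^2*h*w*C.

    With a fixed window size M, the cost is linear in h*w.
    """
    n = h * w
    return 4 * n * c**2 + 2 * m**2 * n * c


if __name__ == "__main__":
    # Illustrative sizes (assumed): 56 x 56 patches, 96 channels, 7 x 7 windows.
    h = w = 56
    c, m = 96, 7
    global_cost = msa_flops(h, w, c)
    window_cost = wmsa_flops(h, w, c, m)
    print(f"MSA:   {global_cost:,}")
    print(f"W-MSA: {window_cost:,}")
    print(f"ratio: {global_cost / window_cost:.1f}x")
```

The two expressions coincide when a single window covers the whole feature map (M² = hw), which is a quick sanity check on the formulas.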


MST-UNet: a modified Swin Transformer for water bodies’ mapping using Sentinel-2 images

Xie

2023

J. Appl. Rem. Sens.


Deep learning is widely used for feature recognition in remote sensing. Symmetric encoder-decoder networks such as UNet are among the most commonly used image segmentation networks, but their accuracy is often low because of their simple structure. We combine a convolutional neural network (CNN) and the Swin Transformer in a model we call the modified Swin Transformer using UNet structure (MST-UNet) to achieve accurate segmentation of water bodies from remote sensing data, with Xiamen City as the study area. MST-UNet is based on a symmetric encoder-decoder network. We use CNN and Swin Transformer blocks to extract features from input images and to capture the interdependence among different pixels, respectively, so that more attention is paid to the global information of images. Predictions are obtained by fourfold upsampling, and the results show that MST-UNet is more accurate than UNet and its improved variants. The Intersection over Union (IoU), mean IoU, and Dice score on the test set reach 87.80%, 92.93%, and 93.08%, respectively, which verifies the feasibility of MST-UNet. This experiment has reference value for related studies.
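
The abstract reports IoU, mean IoU, and Dice score without defining them. The sketch below shows the standard definitions for binary masks, not the authors' evaluation code. The function names are hypothetical, masks are assumed to be flat lists of 0 (background) and 1 (water), and mean IoU is assumed to average the water and background classes, which the abstract does not confirm.

```python
from typing import Sequence


def iou(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Intersection over Union of the foreground (label 1) pixels."""
    inter = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, truth) if p == 1 or t == 1)
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice score: 2 * |A and B| / (|A| + |B|) on foreground pixels."""
    inter = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    total = sum(pred) + sum(truth)
    return 2 * inter / total if total else 1.0


def mean_iou(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Average of foreground IoU and background IoU (assumed two classes)."""
    inv_pred = [1 - p for p in pred]
    inv_truth = [1 - t for t in truth]
    return (iou(pred, truth) + iou(inv_pred, inv_truth)) / 2


if __name__ == "__main__":
    pred = [1, 1, 0, 0]
    truth = [1, 0, 1, 0]
    print(iou(pred, truth), dice(pred, truth), mean_iou(pred, truth))
```

For a single class the two scores are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU on the same masks.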


“…With the advancement of Vision Transformers (ViT), many applications are adopting it for image classification tasks [23], including EuroSAT [24,25]. It is suggested that further scaling can enhance performance [26], but this model has yet to be integrated with Geospatial data.…”

Section: Introduction (mentioning)

confidence: 99%

Mapping of Land Use and Land Cover (LULC) Using EuroSAT and Transfer Learning

Kunwar, Ferdush

2024

RIG


As the global population continues to expand, the demand for natural resources increases. Unfortunately, human activities account for 23% of greenhouse gas emissions. On a positive note, remote sensing technologies have emerged as a valuable tool in managing our environment. These technologies allow us to monitor land use, plan urban areas, and drive advancements in areas such as agriculture, climate change mitigation, disaster recovery, and environmental monitoring. Recent advances in Artificial Intelligence (AI), computer vision, and earth observation data have enabled unprecedented accuracy in land use mapping. By using transfer learning and fine-tuning with red-green-blue (RGB) bands, we achieved an impressive 99.19% accuracy in land use analysis. Such findings can be used to inform conservation and urban planning policies.


“…Among all types, Swin-transformer is a novel backbone network of hierarchical Vision Transformer, using a multi-head self-attention mechanism that can focus on a sequence of image patches to encode global, local, and contextual cues with certain flexibilities [30]. Swin-transformer has already shown its compelling records in various computer vision tasks, including region-level object detection [31], pixel-level semantic segmentation [32], and image-level classification [33]. Particularly, it exhibited strong robustness to severe occlusions from foreground objects, random patch locations, and non-salient background regions.…”

Section: Introduction (mentioning)

confidence: 99%

Swin-Transformer-YOLOv5 for Real-Time Wine Grape Bunch Detection

Liu et al.

2022

Remote Sensing


Precise canopy management is critical in vineyards for premium wine production because maximum crop load does not guarantee the best economic return for wine producers. The growers keep track of the number of grape bunches during the entire growing season for optimizing crop load per vine. Manual counting of grape bunches can be highly labor-intensive and error-prone. Thus, an integrated, novel detection model, Swin-transformer-YOLOv5, was proposed for real-time wine grape bunch detection. The research was conducted on two varieties, Chardonnay and Merlot, from July to September 2019. The performance of Swin-T-YOLOv5 was compared against commonly used detectors. All models were comprehensively tested under different conditions, including two weather conditions, two berry maturity stages, and three sunlight intensities. The proposed Swin-T-YOLOv5 outperformed others for grape bunch detection, with mean average precision (mAP) of up to 97% and F1-score of 0.89 on cloudy days. This mAP was ~44%, 18%, 14%, and 4% greater than Faster R-CNN, YOLOv3, YOLOv4, and YOLOv5, respectively. Swin-T-YOLOv5 achieved an R² of 0.91 and RMSE of 2.4 (number of grape bunches) compared with the ground truth on Chardonnay. Swin-T-YOLOv5 can serve as a reliable digital tool to help growers perform precision canopy management in vineyards.
