SoutheastCon 2022
DOI: 10.1109/southeastcon48659.2022.9764016

Improving Classification of Remotely Sensed Images with the Swin Transformer

Cited by 20 publications (10 citation statements)
References 14 publications
“…The standard transformer performs global self-attention over the whole image, whose computation is quadratic in the number of input tokens and is therefore unsuitable for high-resolution images, especially for large computational tasks such as remote sensing data processing [27-33]. Swin Transformer proposes to perform self-attention in nonoverlapping windows.…”
Section: Methods
confidence: 99%
“…The standard transformer performs global self-attention in images, whose computation is quadratic in complexity with respect to the number of input tokens and is not suitable for high-resolution images, especially for large computational tasks such as remote sensing data processing [27-33]. Swin Transformer proposes to perform self-attention in nonoverlapping windows. For a feature map of h × w patches with C channels and window size M, the computational complexity of a global MSA module and of a window-based one (W-MSA) is
Ω(MSA) = 4hwC² + 2(hw)²C,
Ω(W-MSA) = 4hwC² + 2M²hwC.…”
Section: Swin Transformer Block
confidence: 99%
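As an illustrative sketch (not part of the cited paper), the two complexity terms quoted above can be compared numerically. The feature-map size, channel count, and window size below are hypothetical values chosen only for illustration:

```python
def msa_flops(h, w, C):
    # Global multi-head self-attention: 4*h*w*C^2 + 2*(h*w)^2*C
    # The second term is quadratic in the number of tokens h*w.
    return 4 * h * w * C**2 + 2 * (h * w)**2 * C

def wmsa_flops(h, w, C, M):
    # Window-based self-attention with M x M windows:
    # 4*h*w*C^2 + 2*M^2*h*w*C -- linear in h*w for a fixed window size M.
    return 4 * h * w * C**2 + 2 * M**2 * h * w * C

# Hypothetical example: 56x56 tokens, 96 channels, 7x7 windows.
print(msa_flops(56, 56, 96))      # global attention cost
print(wmsa_flops(56, 56, 96, 7))  # windowed attention cost (much smaller)
```

For these values the windowed variant is over an order of magnitude cheaper, which is the motivation the excerpt attributes to Swin Transformer.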
“…With the advancement of Vision Transformers (ViT), many applications are adopting them for image classification tasks [23], including EuroSAT [24,25]. It has been suggested that further scaling can enhance performance [26], but this model has yet to be integrated with geospatial data.…”
Section: Introduction
confidence: 99%
“…Among all types, the Swin Transformer is a novel hierarchical Vision Transformer backbone that uses a multi-head self-attention mechanism over sequences of image patches to encode global, local, and contextual cues with some flexibility [30]. The Swin Transformer has already achieved compelling results in various computer vision tasks, including region-level object detection [31], pixel-level semantic segmentation [32], and image-level classification [33]. In particular, it has exhibited strong robustness to severe occlusions from foreground objects, random patch locations, and non-salient background regions.…”
Section: Introduction
confidence: 99%
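The nonoverlapping-window partition that the excerpts describe can be sketched in pure Python. This is a minimal illustration, not code from the paper; the function name and the list-of-lists representation are assumptions made for the example:

```python
def window_partition(feat, M):
    """Split an H x W grid of tokens into non-overlapping M x M windows.

    feat: nested list feat[i][j] of tokens; H and W must be divisible by M
    (Swin-style models pad the feature map so this holds).
    Returns a list of windows, each a flat list of M*M tokens, so that
    self-attention can then run inside each window independently.
    """
    H, W = len(feat), len(feat[0])
    assert H % M == 0 and W % M == 0, "pad H and W to multiples of M first"
    windows = []
    for wi in range(0, H, M):          # top-left row of each window
        for wj in range(0, W, M):      # top-left column of each window
            windows.append([feat[wi + i][wj + j]
                            for i in range(M) for j in range(M)])
    return windows

# A 4x4 grid of token ids split into 2x2 windows yields 4 windows of 4 tokens.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
print(window_partition(grid, 2))
```

Restricting attention to each returned window (rather than all H*W tokens at once) is what replaces the quadratic term in the global-attention cost with the M²-bounded term quoted above.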