2022
DOI: 10.3390/rs14122861

Swin-Transformer-Enabled YOLOv5 with Attention Mechanism for Small Object Detection on Satellite Images

Abstract: Object detection has made tremendous progress in natural images over the last decade. However, the results are hardly satisfactory when the natural image object detection algorithm is directly applied to satellite images. This is due to the intrinsic differences in the scale and orientation of objects generated by the bird’s-eye perspective of satellite photographs. Moreover, the background of satellite images is complex and the object area is small; as a result, small objects tend to be missing due to the cha…

Cited by 138 publications (68 citation statements)
References 36 publications
“…Figure 9(e) is a picture taken at a high altitude. The vehicle on the road is very small, but it can still improve the detection effect, indicating that the model can detect small objects [35].…”
Section: G Algorithm Validity Analysis
Citation type: mentioning
confidence: 99%
“…These methods have achieved good results in natural image datasets such as MS COCO, PASCAL VOC, etc. [5]. However, when these methods are used in remote sensing images, the results are always hardly satisfactory.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The transformer model was first proposed in 2017 [49] in the field of natural language processing (NLP), and was extended to deal with a computer vision task in 2020 [50]. The vision transformer (ViT) models have also been introduced into the field of remote sensing image processing, for applications such as semantic segmentation [51] and object detection [52], and have achieved competitive results, compared with CNN models. Some researchers have also studied the performance of ViT models in remote sensing image landslide detection [53].…”
Section: Introduction
Citation type: mentioning
confidence: 99%