2023
DOI: 10.1109/access.2023.3267435

Remote Sensing Object Detection Based on Convolution and Swin Transformer

Abstract: Remote sensing object detection is an essential task for surveying the earth. It is challenging for the target detection algorithm in natural scenes to obtain satisfactory detection results in remote sensing images. In this paper, the RAST-YOLO (You only look once with Region Attention and Swin Transformer) algorithm is proposed to address the problems of remote sensing object detection, such as significant differences in target scales, complex backgrounds, and tightly arranged small-size targets. To increase t…

Cited by 13 publications (4 citation statements)
References 68 publications
“…Some improvements have been made in this regard, including partial modifications to the YOLO-V5 network structure and the integration of coordinate attention mechanisms in the YOLO-extract algorithm [24]. Another approach mentioned earlier is the incorporation of Transformers into the feature extraction layer, such as in RAST-YOLO [25]. This method proposes using the Swin Transformer as the backbone and leveraging the region attention mechanism as the feature extractor and utilizing the C3D module to fuse deep and shallow semantic information to optimize the multi-scale problem in remote sensing target detection.…”
Section: Object Detection (mentioning)
confidence: 99%
“…Ref. [36] proposed a fusion of convolutional neural networks and a Transformer in the backbone feature extraction network. By parallel use of region attention mechanism modules with the Swin Transformer, they extended information interaction within the window globally.…”
Section: Literature Review (mentioning)
confidence: 99%
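The statement above refers to information interaction "within the window". As a rough illustration of what that locality means, the sketch below splits a small feature map into the non-overlapping windows over which a Swin Transformer layer computes self-attention. It is a minimal pure-Python sketch, not the RAST-YOLO implementation: the function name `window_partition`, the list-of-rows representation, and the 4 x 4 example grid are all assumptions made for illustration.

```python
def window_partition(feature_map, window_size):
    """Split an H x W grid (a list of rows) into non-overlapping square windows.

    Hypothetical helper for illustration only. Returns the windows in
    row-major order; each window is a flat list of the
    window_size * window_size entries it covers.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    if height % window_size or width % window_size:
        raise ValueError("height and width must be multiples of window_size")
    windows = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            windows.append([
                feature_map[r][c]
                for r in range(top, top + window_size)
                for c in range(left, left + window_size)
            ])
    return windows


# A 4 x 4 map whose entries are their own flat indices 0..15.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = window_partition(grid, 2)
for w in windows:
    print(w)
```

With a window size of 2, positions 0 and 2 of the example grid land in different windows, so a window-restricted attention layer would not relate them directly. This is the limitation that, according to the citing statement, the parallel region attention modules are meant to address.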
“…With the rapid development of remote sensing technology, object detection in remote sensing images has emerged as a burgeoning research area in computer vision. Various studies have focused on utilizing deep-learning-based object detection methods in the domain of remote sensing [1][2][3][4][5][6]. However, detecting targets in these images has shown itself to be challenging due to the objects' varying scales and resolutions.…”
Section: Introduction (mentioning)
confidence: 99%