Remote Sensing Object Detection Based on Convolution and Swin Transformer

Jiang, Xuzhao; Wu, Yonghong

doi:10.1109/access.2023.3267435

Cited by 13 publications

(4 citation statements)

References 68 publications


“…Some improvements have been made in this regard, including partial modifications to the YOLO-V5 network structure and the integration of coordinate attention mechanisms in the YOLO-extract algorithm [24]. Another approach mentioned earlier is the incorporation of Transformers into the feature extraction layer, such as in RAST-YOLO [25]. This method proposes using the Swin Transformer as the backbone and leveraging the region attention mechanism as the feature extractor and utilizing the C3D module to fuse deep and shallow semantic information to optimize the multi-scale problem in remote sensing target detection.…”

Section: Object Detection (mentioning)

confidence: 99%

SCCMDet: Adaptive Sparse Convolutional Networks Based on Class Maps for Real-Time Onboard Detection in Unmanned Aerial Vehicle Remote Sensing Images

Tan, Yang, Qiu et al. 2024

Remote Sensing

Onboard, real-time object detection in unmanned aerial vehicle remote sensing (UAV-RS) has always been a prominent challenge due to the higher image resolution required and the limited computing resources available. Due to the trade-off between accuracy and efficiency, the advantages of UAV-RS are difficult to fully exploit. Current sparse-convolution-based detectors only convolve some of the meaningful features in order to accelerate the inference speed. However, the best approach to the selection of meaningful features, which ultimately determines the performance, is an open question. This study proposes the use of adaptive sparse convolutional networks based on class maps for real-time onboard detection in UAV-RS images (SCCMDet) to solve this problem. For data pre-processing, SCCMDet obtains the real class maps as labels from the ground truth to supervise the feature selection process. In addition, a generate class map network (GCMN), equipped with a newly designed loss function, identifies the importance of features to generate a binary class map which filters the image for its more meaningful sparse features. Comparative experiments were conducted on the VisDrone dataset, and the experimental results show that our method accelerates YOLOv8 by up to 41.94% and increases the performance by 2.52%. Moreover, ablation experiments demonstrate the effectiveness of the proposed model.
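The abstract above describes filtering features with a binary class map so that only the meaningful positions are convolved. A minimal sketch of that general idea follows, in plain Python. It is not the SCCMDet implementation: the function names, the fixed threshold, and the 3x3 kernel are all assumptions made for illustration, and the importance map stands in for whatever the GCMN would produce.

```python
def binary_class_map(importance, threshold=0.5):
    """Threshold a 2D importance map into a 0/1 mask.

    Hypothetical stand-in for a learned class-map generator; the
    threshold value is an assumption, not taken from the paper.
    """
    return [[1 if v >= threshold else 0 for v in row] for row in importance]


def masked_conv3x3(feature, mask, kernel):
    """Apply a 3x3 cross-correlation (zero padding) only where mask == 1.

    Positions with mask == 0 are skipped entirely and stay 0.0 in the
    output, which is where the compute saving comes from. Returns the
    output map and the number of positions actually computed.
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    computed = 0
    for i in range(h):
        for j in range(w):
            if not mask[i][j]:
                continue  # skip positions judged unimportant
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += feature[y][x] * kernel[di + 1][dj + 1]
            out[i][j] = acc
            computed += 1
    return out, computed
```

With a mask that keeps only a small fraction of positions, `computed` drops proportionally, which illustrates why the choice of mask governs both speed and accuracy.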


Section: Object Detection (mentioning)

confidence: 99%

SCCMDet: Adaptive Sparse Convolutional Networks Based on Class Maps for Real-Time Onboard Detection in Unmanned Aerial Vehicle Remote Sensing Images

Tan, Yang, Qiu et al. 2024

Remote Sensing

“…Ref. [36] proposed a fusion of convolutional neural networks and a Transformer in the backbone feature extraction network. By parallel use of region attention mechanism modules with the Swin Transformer, they extended information interaction within the window globally.…”

Section: Literature Review (mentioning)

confidence: 99%

SRE-YOLOv8: An Improved UAV Object Detection Model Utilizing Swin Transformer and RE-FPN

Li, Zhang, Shao et al. 2024

Sensors

To tackle the intricate challenges associated with the low detection accuracy of images taken by unmanned aerial vehicles (UAVs), arising from the diverse sizes and types of objects coupled with limited feature information, we present the SRE-YOLOv8 as an advanced method. Our method enhances the YOLOv8 object detection algorithm by leveraging the Swin Transformer and a lightweight residual feature pyramid network (RE-FPN) structure. Firstly, we introduce an optimized Swin Transformer module into the backbone network to preserve ample global contextual information during feature extraction and to extract a broader spectrum of features using self-attention mechanisms. Subsequently, we integrate a Residual Feature Augmentation (RFA) module and a lightweight attention mechanism named ECA, thereby transforming the original FPN structure to RE-FPN, intensifying the network’s emphasis on critical features. Additionally, an SOD (small object detection) layer is incorporated to enhance the network’s ability to recognize the spatial information of the model, thus augmenting accuracy in detecting small objects. Finally, we employ a Dynamic Head equipped with multiple attention mechanisms in the object detection head to enhance its performance in identifying low-resolution targets amidst complex backgrounds. Experimental evaluation conducted on the VisDrone2021 dataset reveals a significant advancement, showcasing an impressive 9.2% enhancement over the original YOLOv8 algorithm.
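The abstract above names ECA as a lightweight attention mechanism. As a rough illustration of how efficient channel attention is commonly described (global average pooling per channel, a small 1D convolution across the channel axis, a sigmoid, then channel-wise rescaling), here is a plain-Python sketch. It is not the SRE-YOLOv8 code: the function names are hypothetical and the kernel weights are fixed illustrative values, whereas in a real network they would be learned.

```python
import math


def eca_weights(channel_means, kernel):
    """Turn per-channel means into attention weights in (0, 1).

    Runs a 1D cross-correlation over the channel axis (zero padding,
    odd-length kernel assumed), then applies a sigmoid.
    """
    c = len(channel_means)
    pad = len(kernel) // 2
    weights = []
    for i in range(c):
        acc = 0.0
        for t, k in enumerate(kernel):
            j = i + t - pad
            if 0 <= j < c:
                acc += k * channel_means[j]
        weights.append(1.0 / (1.0 + math.exp(-acc)))
    return weights


def eca(features, kernel=(0.25, 0.5, 0.25)):
    """Rescale each channel of `features` by its attention weight.

    `features` is a list of channels, each a 2D list of floats.
    The default kernel is an illustrative assumption, not a learned one.
    """
    means = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
             for ch in features]
    w = eca_weights(means, kernel)
    return [[[v * w[i] for v in row] for row in ch]
            for i, ch in enumerate(features)]
```

Because the only parameters are the few kernel taps, the module adds very little cost, which is consistent with the abstract calling it lightweight.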


“…With the rapid development of remote sensing technology, object detection in remote sensing images has emerged as a burgeoning research area in computer vision. Various studies have focused on utilizing deep-learning-based object detection methods in the domain of remote sensing [1][2][3][4][5][6]. However, detecting targets in these images has shown itself to be challenging due to the objects' varying scales and resolutions.…”

Section: Introduction (mentioning)

confidence: 99%

Object Detection in Remote Sensing Images Based on Adaptive Multi-Scale Feature Fusion Method

Liu, Zhang et al. 2024

Remote Sensing

Multi-scale object detection is critical for analyzing remote sensing images. Traditional feature pyramid networks, which are aimed at accommodating objects of varying sizes through multi-level feature extraction, face significant challenges due to the diverse scale variations present in remote sensing images. This situation often forces single-level features to span a broad spectrum of object sizes, complicating accurate localization and classification. To tackle these challenges, this paper proposes an innovative algorithm that incorporates an adaptive multi-scale feature enhancement and fusion module (ASEM), which enhances remote sensing image object detection through sophisticated multi-scale feature fusion. Our method begins by employing a feature pyramid to gather coarse multi-scale features. Subsequently, it integrates a fine-grained feature extraction module at each level, utilizing atrous convolutions with varied dilation rates to refine multi-scale features, which markedly improves the information capture from widely varied object scales. Furthermore, an adaptive enhancement module is applied to the features of each level by employing an attention mechanism for feature fusion. This strategy concentrates on the features of critical scale, which significantly enhances the effectiveness of capturing essential feature information. Compared with the baseline method, namely, Rotated Faster R-CNN, our method achieved an mAP of 74.21% (+0.81%) on the DOTA-v1.0 dataset and an mAP of 84.90% (+9.2%) on the HRSC2016 dataset. These results validated the effectiveness and practicality of our method and demonstrated its significant application value in multi-scale remote sensing object detection tasks.
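The abstract above relies on atrous (dilated) convolutions with varied dilation rates to cover different object scales. The sketch below illustrates why varying the dilation changes the receptive field: a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions. The helper names are hypothetical, and the 1D case is used only to keep the example short; it is not the ASEM module from the paper.

```python
def effective_kernel_size(k, dilation):
    """Span of input positions covered by a size-k kernel at a given dilation."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """1D cross-correlation with dilation and no padding ("valid" mode).

    Kernel taps are spaced `dilation` samples apart, so the same number
    of weights covers a wider stretch of the input.
    """
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[t] * signal[i + t * dilation] for t in range(len(kernel)))
        for i in range(len(signal) - span)
    ]
```

For a 3-tap kernel, dilations of 1, 2, and 3 cover 3, 5, and 7 input positions respectively, so stacking or paralleling branches with different rates samples several scales without adding weights.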

