A supramolecular assembly constructed from sulfatocyclodextrin and choline-modified chlorambucil exhibits excellent enzyme-responsive activity and controlled drug release.
Object detection in remote sensing imagery has become increasingly popular in recent years. Unlike natural images taken by people or surveillance cameras, remote sensing images are very large, so training and inference must be performed on cropped image tiles. Moreover, objects in remote sensing imagery are often sparsely distributed, and labels are imbalanced across classes, which makes training and inference unstable. In this paper, we analyze the training characteristics of remote sensing images and propose an aggregated-mosaic training method that fuses assigned-stitch augmentation with auto-target-duplication. Based on the ground truth and the mosaic image size, assigned-stitch augmentation supplies each training sample with an appropriate number of objects, which smooths the training procedure. Auto-target-duplication randomly selects and duplicates objects that are hard to detect or that belong to classes with few samples, mitigating class imbalance and sample scarcity so that training can focus on weak classes. We verify the proposed method on VEDAI and NWPU VHR-10, two remote sensing datasets with sparse objects. Because YOLOv5, one of the state-of-the-art detectors, adopts Mosaic as its augmentation method, we choose Mosaic (YOLOv5) as the baseline. Results demonstrate that our method outperforms Mosaic (YOLOv5) by 2.72% and 5.44% on 512 × 512 and 1024 × 1024 resolution imagery, respectively, and by 5.48% on the NWPU VHR-10 dataset.
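The abstract names the two augmentations but gives no implementation details. The following is a minimal Python sketch, assuming a YOLOv5-style 2 × 2 mosaic, of what assigned-stitch (resampling tiles until the mosaic contains enough objects) and auto-target-duplication (copy-pasting crops of rare-class objects) could look like. The function names, the min_objects threshold, the retry budget, and the pasting policy are illustrative assumptions, not the authors' code.

import random
import numpy as np

def assigned_stitch(samples, mosaic_size=1024, min_objects=8):
    """Build a 2x2 mosaic, resampling tiles until the stitched image
    holds at least `min_objects` ground-truth boxes (assumed policy)."""
    half = mosaic_size // 2
    for _ in range(20):                        # retry budget (assumption)
        tiles = random.sample(samples, 4)      # each: (image, boxes, labels)
        if sum(len(b) for _, b, _ in tiles) >= min_objects:
            break
    canvas = np.zeros((mosaic_size, mosaic_size, 3), dtype=np.uint8)
    all_boxes, all_labels = [], []
    for i, (img, boxes, labels) in enumerate(tiles):
        r, c = divmod(i, 2)                    # tile position in the 2x2 grid
        crop = img[:half, :half]
        canvas[r*half:r*half+crop.shape[0], c*half:c*half+crop.shape[1]] = crop
        for (x1, y1, x2, y2), lab in zip(boxes, labels):
            if x1 >= half or y1 >= half:       # box lies outside the kept crop
                continue
            # shift the box into mosaic coordinates, clip to the tile
            all_boxes.append([x1 + c*half, y1 + r*half,
                              min(x2, half) + c*half, min(y2, half) + r*half])
            all_labels.append(lab)
    return canvas, np.array(all_boxes), np.array(all_labels)

def auto_target_duplicate(canvas, boxes, labels, rare_classes, n_copies=2):
    """Copy-paste crops of rare-class objects to random locations
    (a simplified stand-in for the paper's duplication policy)."""
    h, w = canvas.shape[:2]
    boxes, labels = boxes.tolist(), labels.tolist()
    for box, lab in list(zip(boxes, labels)):
        if lab not in rare_classes:
            continue
        x1, y1, x2, y2 = map(int, box)
        crop = canvas[y1:y2, x1:x2].copy()
        ch, cw = crop.shape[:2]
        if ch == 0 or cw == 0:
            continue
        for _ in range(n_copies):
            nx = random.randint(0, w - cw)     # random free position
            ny = random.randint(0, h - ch)
            canvas[ny:ny+ch, nx:nx+cw] = crop
            boxes.append([nx, ny, nx + cw, ny + ch])
            labels.append(lab)
    return canvas, np.array(boxes), np.array(labels)

A real pipeline would also handle overlap between pasted crops and existing objects; the sketch keeps only the core idea of biasing each mosaic toward enough targets and toward under-represented classes.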
Target detection is a critical task in interpreting aerial images, and small targets such as vehicles are particularly challenging. Lighting conditions strongly affect detection accuracy: under low illumination, vehicles are difficult to distinguish from the background in RGB images, whereas under high illumination, the color and texture of vehicles differ little in thermal infrared (TIR) images. To improve vehicle detection accuracy under various illumination conditions, we propose an adaptive multi-modal feature fusion and cross-modal vehicle index (AFFCM) model. Built on a single-stage object detection model, AFFCM uses red, green, blue, and thermal infrared (RGB-T) images and comprises three parts: 1) a softpooling channel attention (SCA) mechanism that calculates cross-modal feature weights for the RGB and TIR features through a fully connected layer during global weighted pooling; 2) a multi-modal adaptive feature fusion (MAFF) module, built on the cross-modal feature weights derived from the SCA mechanism, that selects features with high weights, compresses redundant features with low weights, and performs adaptive fusion over a multi-scale feature pyramid; and 3) a cross-modal vehicle index that extracts the target area, suppresses complex background information, and minimizes false alarms in vehicle detection. On the Drone Vehicle dataset, the mean average precision (mAP) is 14.44% and 5.02% higher than that obtained using only RGB or only TIR images, respectively, and 2.63% higher than that of state-of-the-art (SOTA) methods that utilize both RGB and TIR images.
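As a rough illustration of the cross-modal weighting in part 1) and the adaptive fusion in part 2), here is a minimal PyTorch sketch: SoftPool-style global pooling per modality, a shared fully connected layer that predicts per-channel weights, and a weighted sum of RGB and TIR features. The layer sizes, the sigmoid normalization, and the single-scale fusion are assumptions for illustration; the paper applies the fusion across a multi-scale feature pyramid.

import torch
import torch.nn as nn

def softpool2d(x):
    """Global SoftPool: exponentially weighted spatial average.
    Subtracting the per-map max keeps exp() numerically stable
    without changing the weighted average."""
    w = torch.exp(x - x.amax(dim=(2, 3), keepdim=True))        # (B, C, H, W)
    return (x * w).flatten(2).sum(-1) / w.flatten(2).sum(-1)   # (B, C)

class SCAFusion(nn.Module):
    """Softpooling channel attention + adaptive fusion of RGB/TIR
    features (illustrative sketch, not the authors' exact module)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * channels),
        )

    def forward(self, rgb, tir):
        # Pool each modality to a channel descriptor, then let the FC
        # layer predict cross-modal channel weights (sigmoid-normalized).
        desc = torch.cat([softpool2d(rgb), softpool2d(tir)], dim=1)
        w = torch.sigmoid(self.fc(desc))                       # (B, 2C)
        w_rgb, w_tir = w.chunk(2, dim=1)
        w_rgb = w_rgb.unsqueeze(-1).unsqueeze(-1)              # (B, C, 1, 1)
        w_tir = w_tir.unsqueeze(-1).unsqueeze(-1)
        # Reweight and fuse: high-weight channels dominate, low-weight
        # (redundant) channels are suppressed.
        return w_rgb * rgb + w_tir * tir

# Usage on one pyramid level:
fuse = SCAFusion(channels=256)
fused = fuse(torch.randn(2, 256, 64, 64), torch.randn(2, 256, 64, 64))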