For remote sensing object detection, automatically fusing the most informative features and adapting to objects of widely varying scales remain significant challenges for existing convolutional neural networks. To address this, we develop a convolutional network model with an adaptive attention fusion mechanism (AAFM), built on the EfficientDet backbone. First, guided by the distribution of object sizes in the datasets, a stitcher is applied so that a single training image contains objects of various scales; this effectively balances the proportion of multi-scale objects and mitigates scale sensitivity. In addition, inspired by channel attention, a spatial attention model is introduced in the construction of the adaptive attention fusion mechanism. In this mechanism, semantic information from the different feature maps is extracted via convolution and different pooling operations, and the parallel spatial and channel attention outputs are then fused in optimal proportions by learnable fusion factors to obtain more representative features. Finally, the Complete Intersection over Union (CIoU) loss is adopted so that predicted bounding boxes cover the ground truth more tightly. Experimental results on the optical image dataset DIOR demonstrate that, compared with state-of-the-art detectors such as the Single Shot multibox Detector (SSD), You Only Look Once (YOLO) v4, and EfficientDet, the proposed model improves accuracy and is more robust.
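The fusion step described above can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: the channel and spatial branches here use simple average/max pooling with a sigmoid gate (standing in for the convolutional attention modules), and `alpha`/`beta` play the role of the learnable fusion factors, normalized so the branch weights sum to one.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    # x: (C, H, W). Squeeze spatial dims with average and max pooling,
    # then gate each channel (a stand-in for the usual pooling + MLP module).
    avg = x.mean(axis=(1, 2))            # (C,)
    mx = x.max(axis=(1, 2))              # (C,)
    w = sigmoid(avg + mx)                # (C,) per-channel weights
    return x * w[:, None, None]

def spatial_attention(x):
    # Pool across channels to per-pixel statistics, then gate each location.
    avg = x.mean(axis=0)                 # (H, W)
    mx = x.max(axis=0)                   # (H, W)
    w = sigmoid(avg + mx)                # (H, W) per-pixel weights
    return x * w[None, :, :]

def adaptive_attention_fusion(x, alpha, beta):
    # Fuse the two parallel branches with fusion factors (learnable in
    # training), softmax-normalized so the weights sum to one.
    a, b = np.exp(alpha), np.exp(beta)
    a, b = a / (a + b), b / (a + b)
    return a * channel_attention(x) + b * spatial_attention(x)
```

With `alpha == beta` the two branches contribute equally; during training the factors would shift toward whichever attention branch is more informative for the current feature level.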
Abstract. Accurate matching of multimodal remote sensing (RS) images (e.g., optical, infrared, LiDAR, SAR, and rasterized maps) remains an ongoing challenge because of nonlinear radiometric differences (NRD) between such images. Considering that structural properties are preserved across modalities, this paper proposes a robust matching method based on multi-directional and multi-scale structural features, consisting of two critical steps. First, a novel structural descriptor named the Steerable Filters of first- and second-Order Channels (SFOC) is constructed to address severe NRD; it combines first- and second-order gradient information by using steerable filters to depict multi-directional and multi-scale structural features of images. SFOC is further enhanced by applying dilated Gaussian convolutions with different dilation rates, which capture multi-level contextual structural features and improve robustness to noise. Then, a fast similarity measure, called Fast Normalized Cross-Correlation (Fast-NCCSFOC), is established to detect correspondences with a template matching scheme, employing the Fast Fourier Transform (FFT) and the integral image to improve matching efficiency. The performance of the proposed SFOC has been evaluated on many different kinds of multimodal RS images, and experimental results show its superior matching performance compared with state-of-the-art methods.
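The efficiency trick behind the similarity measure can be illustrated on plain intensity images. The sketch below, a simplified stand-in for Fast-NCCSFOC (which operates on SFOC feature channels rather than raw pixels), computes normalized cross-correlation for every template placement: the numerator comes from an FFT-based convolution with the flipped, zero-mean template, and the per-window image statistics in the denominator come from integral images (summed-area tables), so no window is ever rescanned.

```python
import numpy as np
from scipy.signal import fftconvolve

def _window_sums(img, h, w):
    # Integral image (summed-area table): the sum of any h x w window
    # is obtained from four table lookups instead of a nested loop.
    ii = np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    return ii[h:, w:] - ii[:-h, w:] - ii[h:, :-w] + ii[:-h, :-w]

def fast_ncc(image, template):
    # Normalized cross-correlation for all 'valid' template placements.
    h, w = template.shape
    n = h * w
    t0 = template - template.mean()
    # Numerator: FFT convolution with the flipped template == correlation.
    num = fftconvolve(image, t0[::-1, ::-1], mode="valid")
    # Denominator: per-window variance of the image via integral images.
    s1 = _window_sums(image, h, w)
    s2 = _window_sums(image ** 2, h, w)
    var = np.maximum(s2 - s1 ** 2 / n, 0.0)
    den = np.sqrt(var * (t0 ** 2).sum())
    # Flat windows (zero variance) carry no signal; score them 0.
    return np.where(den > 1e-12, num / np.maximum(den, 1e-12), 0.0)
```

The correspondence is then the location of the correlation peak (e.g. `np.unravel_index(np.argmax(fast_ncc(image, template)), ...)`); an exact match scores 1.0. Both the FFT numerator and the integral-image denominator cost roughly O(N log N) in the image size, independent of the template size, which is what makes the template matching scheme fast.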