Military vehicle object detection in complex environments underpins reconnaissance and tracking of weapons and equipment, and is of great significance for informatized and intelligent combat. To address the poor performance of traditional detection algorithms on military vehicles, we propose a military vehicle object detection method based on hierarchical feature representation and reinforcement-learning-based refined localization, referred to as MVODM. First, we construct a reliable dataset, MVD, for the military vehicle detection task. Second, we design two strategies to improve the detector: hierarchical feature representation and reinforcement-learning-based refined localization. The hierarchical feature representation strategy helps the detector select the feature layer best suited to each object's scale, while the reinforcement-learning-based refinement strategy improves the accuracy of the object localization boxes. Combined, the two strategies effectively improve detector performance. Finally, experimental results on our self-built dataset show that the proposed MVODM achieves excellent detection performance and handles the military vehicle detection task well.
INDEX TERMS: Military vehicle objects, object detection, reinforcement learning, hierarchical feature representation.
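The abstract only summarizes the reinforcement-learning refinement step. As a rough illustration of the general idea, the sketch below shows a common formulation of RL-based box refinement: an agent iteratively applies discrete shift/scale actions to a coarse box until no action improves it. The action set, the greedy selection rule, and the use of oracle IoU in place of a learned Q-network are our own illustrative assumptions, not the paper's implementation.

```python
# Candidate refinement actions: (dx, dy, dscale) applied to a box (cx, cy, w, h).
ACTIONS = [(-0.1, 0.0, 1.0), (0.1, 0.0, 1.0),   # shift left / right
           (0.0, -0.1, 1.0), (0.0, 0.1, 1.0),   # shift up / down
           (0.0, 0.0, 0.9), (0.0, 0.0, 1.1)]    # shrink / grow

def apply_action(box, action):
    cx, cy, w, h = box
    dx, dy, ds = action
    return (cx + dx * w, cy + dy * h, w * ds, h * ds)

def iou(a, b):
    # IoU for boxes in (cx, cy, w, h) format.
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def refine(box, gt, max_steps=10):
    # Greedily apply the action with the largest IoU gain. The oracle IoU
    # score here stands in for a learned Q-network, which at test time
    # would score actions without access to the ground-truth box gt.
    for _ in range(max_steps):
        current = iou(box, gt)
        scored = [(iou(apply_action(box, a), gt), a) for a in ACTIONS]
        best_iou, best_action = max(scored)
        if best_iou <= current:          # no action improves the box: stop
            break
        box = apply_action(box, best_action)
    return box

# Example: refine a coarse detection toward a ground-truth box.
print(refine((0.40, 0.50, 0.30, 0.30), (0.50, 0.50, 0.25, 0.35)))
```

In trained systems of this kind, the per-action reward is typically the IoU gain after the action, so the greedy loop above mirrors what the learned policy is optimized to do.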
Remote sensing images of large scenes have complex backgrounds, and targets vary in type, size, and posture, which makes object detection in such images difficult. To address this problem, this paper proposes an end-to-end multi-size object detection method based on a dual attention mechanism. First, a MobileNets backbone extracts multi-layer features from the remote sensing image as input to MFCA, a multi-size feature concentration attention module. Through multi-layer convolution operations, MFCA employs an attention mechanism to suppress noise, enhance the reuse of effective features, and improve the network's adaptability to multi-size target features. Then, TSDFF (two-stage deep feature fusion module) deeply fuses the feature maps output by MFCA to maximize the correlation between feature sets and, in particular, to improve the feature representation of small targets. Next, GLCNet (global-local context network) and SSA (significant simple attention module) discriminate among the fused features and screen out the useful channel information, making the detected features more representative. Finally, the loss function is improved to better reflect the difference between candidate boxes and ground-truth boxes, enhancing the network's ability to predict hard samples. The proposed method is compared with other state-of-the-art algorithms on the public NWPU VHR-10, DOTA, and RSOD datasets. Experimental results show that it achieves the best AP (average precision) and mAP (mean average precision), indicating that the method can accurately detect multi-type, multi-size, and multi-posture targets with high adaptability.
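The MFCA and SSA modules are described above only in terms of their roles (noise suppression, channel selection). As a point of reference for how attention reweights channel information, the minimal PyTorch sketch below implements a generic squeeze-and-excitation channel-attention block; the layer sizes and reduction ratio are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Generic squeeze-and-excitation channel attention (illustrative only)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # squeeze: global context per channel
        self.fc = nn.Sequential(                      # excitation: learn per-channel gates
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                  # reweight channels, damping noisy ones

# Example: reweight a backbone feature map of shape (N, 256, 32, 32).
feat = torch.randn(2, 256, 32, 32)
attn = ChannelAttention(256)
out = attn(feat)   # same shape; channels rescaled by learned gates
```

Applying such a gate to each pyramid level is one simple way to let the network emphasize the channels that matter for a given target size, which is the behavior the abstract attributes to MFCA.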
Most contemporary pedestrian detection algorithms operate on visible-light images. However, in scenes with dim lighting, small targets, occlusion, and cluttered backgrounds, single-modality visible-light images, which rely on color, texture, and similar features, cannot adequately represent target information; as a result, many targets are missed and performance suffers. To address this problem, we propose a dual-modal multi-scale feature fusion network (DMFFNet). First, a MobileNet v3 backbone extracts features from the dual-modal images as input to the multi-scale fusion attention (MFA) module, which combines multi-scale feature fusion with an attention mechanism. Second, the double deep feature fusion (DDFF) module deeply fuses the multi-scale features output by the MFA module to enhance the semantic and geometric information of the target. Finally, we optimize the loss function to reflect the distance between the predicted box and the ground-truth box more realistically and to strengthen the network's ability to predict hard samples. We evaluated the method on the KAIST dual-light pedestrian dataset and on our laboratory's visible-thermal infrared pedestrian dataset (VTI) through comparative and ablation experiments. The overall MR⁻² on the KAIST dual-light pedestrian dataset is 9.26%, with MR⁻² of 5.17%, 23.35%, and 47.31% under dim light, partial occlusion, and severe occlusion, respectively. The overall MR⁻² on the VTI dual-light pedestrian dataset is 9.26%, with MR⁻² of 5.17%, 23.35%, and 47.31% under the same three conditions. The results show that the algorithm performs well in pedestrian detection, especially in dim light and when the target is occluded.
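The abstract does not specify how the loss was improved; one widely used regression loss with exactly the stated property (reflecting the distance between the predicted and ground-truth boxes) is Distance-IoU (DIoU), which adds a normalized center-distance penalty to the IoU term. The PyTorch sketch below is a generic DIoU loss for corner-format boxes, offered as an assumption about the kind of change described, not the paper's exact formulation.

```python
import torch

def diou_loss(pred, target, eps=1e-7):
    """DIoU loss for boxes in (x1, y1, x2, y2) format, shape (N, 4)."""
    # Intersection and IoU.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distance between box centers.
    cxp, cyp = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cxt, cyt = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    center_dist = (cxp - cxt) ** 2 + (cyp - cyt) ** 2

    # Squared diagonal of the smallest enclosing box (normalizer).
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    return (1 - iou + center_dist / diag).mean()
```

Unlike plain IoU loss, this term still yields a useful gradient when the boxes do not overlap, which is one reason distance-aware losses help on hard samples such as heavily occluded pedestrians.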