With the rapid development of unmanned aerial vehicles (UAVs), aerial target detection has attracted extensive attention from researchers. Detection in aerial images is difficult because ground targets occupy only a small proportion of each image and target sizes vary widely. After repeated downsampling, the features of small targets almost disappear from the feature maps. To address these problems, a densely connected Inception ResNet (RIDNet) is proposed. RIDNet is a lightweight multi-scale fusion detection network built from two residual inception (RI) units: the RI-Dense module and the RI-Deconv module. The RI-Dense module consists of densely connected layers and shortcut connections: each convolutional layer in RI-Dense is connected to all subsequent layers and passes on the information that needs to be preserved. The RI-Deconv module fuses global features in a residual and hierarchical way: it repeatedly deconvolves the output of RI-Dense and concatenates the result with the original output to obtain fused layers. The fused layers absorb semantic information from deep layers and detailed information from shallow layers. Extensive experiments show the effectiveness of the proposed RIDNet, and ablation experiments demonstrate that the RI-Dense and RI-Deconv modules improve the mAP by 7.8% and 6.8%, respectively.
INDEX TERMS Object detection, convolutional network, unmanned aerial vehicle.
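The two RI units described in the abstract can be illustrated at the level of feature-map shapes. The Python sketch below is not the authors' implementation and computes no actual convolutions: it only tracks (channels, height, width). It assumes a growth rate of 32, a hypothetical 1x1 projection back to the input channel count so that the shortcut can be added element-wise, and a transposed convolution with kernel 2 and stride 2 for upsampling; it also reads "repeatedly deconvolves" as a top-down chain in which each fused map is upsampled again. All function names and parameter values are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """Feature-map shape as (channels, height, width); batch dimension omitted."""
    channels: int
    height: int
    width: int


def concat(*shapes: Shape) -> Shape:
    """Channel-wise concatenation: spatial sizes must match, channels add up."""
    h, w = shapes[0].height, shapes[0].width
    if any((s.height, s.width) != (h, w) for s in shapes):
        raise ValueError("spatial sizes must match for concatenation")
    return Shape(sum(s.channels for s in shapes), h, w)


def conv_same(x: Shape, out_channels: int) -> Shape:
    """Stride-1 convolution with 'same' padding: only the channel count changes."""
    return Shape(out_channels, x.height, x.width)


def deconv(x: Shape, out_channels: int, kernel: int = 2, stride: int = 2,
           padding: int = 0) -> Shape:
    """Transposed-convolution output size: (H - 1) * stride - 2 * padding + kernel."""
    h = (x.height - 1) * stride - 2 * padding + kernel
    w = (x.width - 1) * stride - 2 * padding + kernel
    return Shape(out_channels, h, w)


def dense_block_with_shortcut(x: Shape, num_layers: int,
                              growth: int) -> tuple[Shape, Shape]:
    """Dense connectivity: each layer receives the concatenation of the block
    input and all earlier layer outputs. Hypothetical: a 1x1 projection maps the
    stacked features back to x.channels so the shortcut can be added to x."""
    features = [x]
    for _ in range(num_layers):
        features.append(conv_same(concat(*features), growth))
    stacked = concat(*features)                 # input + every layer's output
    projected = conv_same(stacked, x.channels)  # same shape as x, so x + projected is valid
    return stacked, projected


def top_down_fusion(pyramid: list[Shape], deconv_channels: int) -> list[Shape]:
    """Start from the deepest map, upsample it with a transposed convolution,
    concatenate with the next shallower map, and repeat with the fused result."""
    current = pyramid[-1]
    fused = [current]
    for shallow in reversed(pyramid[:-1]):
        current = concat(shallow, deconv(current, deconv_channels))
        fused.append(current)
    return fused[::-1]  # shallow -> deep, matching the input order


stacked, out = dense_block_with_shortcut(Shape(64, 40, 40), num_layers=4, growth=32)
print(stacked, out)
# Shape(channels=192, height=40, width=40) Shape(channels=64, height=40, width=40)
print(top_down_fusion([Shape(64, 40, 40), Shape(128, 20, 20), Shape(256, 10, 10)], 64))
# [Shape(channels=128, height=40, width=40), Shape(channels=192, height=20, width=20), Shape(channels=256, height=10, width=10)]
```

The shape bookkeeping makes the design trade-off visible: dense connectivity grows the channel count linearly with depth (64 + 4 × 32 = 192 here), which is why some projection is needed before a residual addition, while each fusion step doubles the spatial resolution so that deep semantic features can be concatenated with shallow, higher-resolution ones.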