To address the lack of feature complementarity between the feature layers of the Single Shot MultiBox Detector (SSD) and its weak detection of small objects, we propose an improved SSD object detection algorithm based on the Dense Convolutional Network (DenseNet) and feature fusion, called DF-SSD. Building on SSD, we design the feature extraction network DenseNet-S-32-1, inspired by the dense connections of DenseNet, and replace SSD's original VGG-16 backbone with it to strengthen the model's feature extraction. For multi-scale detection, we introduce a fusion mechanism across feature layers that combines low-level visual features and high-level semantic features within the network structure. Finally, a residual block is added before object prediction to further improve model performance. We train the DF-SSD model from scratch. Experimental results show that DF-SSD with 300 × 300 input achieves 81.4% mAP, 79.0% mAP, and 29.5% mAP on the PASCAL VOC 2007, VOC 2012, and MS COCO datasets, respectively. Compared with SSD, DF-SSD improves detection accuracy on VOC 2007 by 3.1% mAP. DF-SSD requires only half the parameters of SSD and one-ninth the parameters of Faster R-CNN. The richer semantic information injected into DF-SSD gives it strong detection performance on small objects and on objects with specific relationships.
INDEX TERMS DenseNet, feature fusion, multi-scale object detection, SSD.
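The feature-fusion mechanism and the residual block before prediction can be illustrated with a short sketch. The PyTorch snippet below fuses a high-resolution low-level feature map with an upsampled high-level map by concatenation and refines the result with a residual block; the channel counts, spatial sizes, fusion operator, and block layout are illustrative assumptions, not the authors' exact DF-SSD configuration.

    # Hedged sketch of the multi-scale feature-fusion idea in the DF-SSD abstract.
    # All layer sizes and the fusion scheme are assumptions for illustration.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualBlock(nn.Module):
        """Simple 3x3 residual block placed before the prediction head (assumed layout)."""
        def __init__(self, channels: int):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return F.relu(out + x)

    class FeatureFusion(nn.Module):
        """Fuse a low-level and a high-level feature map (illustrative)."""
        def __init__(self, low_ch: int, high_ch: int, out_ch: int):
            super().__init__()
            self.reduce = nn.Conv2d(low_ch + high_ch, out_ch, 1)
            self.refine = ResidualBlock(out_ch)

        def forward(self, low_feat, high_feat):
            # Upsample the semantically rich (low-resolution) map to the size
            # of the detail-rich (high-resolution) map, then concatenate.
            high_up = F.interpolate(high_feat, size=low_feat.shape[-2:],
                                    mode="bilinear", align_corners=False)
            fused = torch.cat([low_feat, high_up], dim=1)
            return self.refine(self.reduce(fused))

    if __name__ == "__main__":
        low = torch.randn(1, 256, 38, 38)   # assumed 38x38 low-level map
        high = torch.randn(1, 512, 19, 19)  # assumed 19x19 high-level map
        fused = FeatureFusion(256, 512, 256)(low, high)
        print(fused.shape)  # torch.Size([1, 256, 38, 38])

The fused map keeps the low-level map's resolution while carrying upsampled semantic context, which is one plausible way to realize the "organic combination" of visual and semantic features described above.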
To address the multi-scale and occlusion problems of pedestrian detection in natural scenes, we propose an improved Faster R-CNN pedestrian detection algorithm based on feature fusion and context analysis (FCF R-CNN). We design a progressive-cascade feature fusion method on the VGG16 network and add Local Response Normalization (LRN) to speed up convergence. The improved feature extraction network enables our model to generate high-resolution feature maps containing rich detail and semantic information. We also adjust the RPN parameters to improve proposal efficiency. In addition, we add a multi-layer iterative LSTM module to the detection model, which uses the LSTM's memory to extract global context information for the candidate boxes. The module needs only the image's feature map as input; it highlights useful context and enables the model to generate more accurate candidate boxes around potential pedestrians. Our method outperforms existing methods in detecting small and occluded pedestrians and is robust in challenging scenes. On the Caltech pedestrian dataset, it achieves competitive accuracy and speed, with a log-average miss rate (LAMR) of 36.75% and a runtime of 0.20 seconds per image, validating the effectiveness of the algorithm.
INDEX TERMS context analysis, Faster R-CNN, feature fusion, pedestrian detection.
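As a rough illustration of the LSTM-based context idea, the PyTorch sketch below treats the backbone feature map as a spatial sequence, runs a multi-layer LSTM over it to gather global context, and merges that context back into the map. The sequence ordering, hidden size, number of layers, and merge strategy are assumptions, not the paper's exact module design.

    # Hedged sketch of an LSTM global-context module over a feature map.
    # The FCF R-CNN abstract states the module takes only the image's
    # feature map as input; the concrete layout here is assumed.
    import torch
    import torch.nn as nn

    class GlobalContextLSTM(nn.Module):
        """Run a multi-layer LSTM over flattened spatial locations (assumed scheme)."""
        def __init__(self, in_ch: int, hidden: int = 256, layers: int = 2):
            super().__init__()
            self.lstm = nn.LSTM(in_ch, hidden, num_layers=layers, batch_first=True)
            self.project = nn.Conv2d(hidden, in_ch, 1)  # map context back to feature channels

        def forward(self, feat):
            b, c, h, w = feat.shape
            # Flatten spatial locations into a sequence of length H*W.
            seq = feat.flatten(2).permute(0, 2, 1)           # (B, H*W, C)
            ctx, _ = self.lstm(seq)                          # (B, H*W, hidden)
            ctx = ctx.permute(0, 2, 1).reshape(b, -1, h, w)  # back to (B, hidden, H, W)
            # Add the projected context to the original map (one plausible merge).
            return feat + self.project(ctx)

    if __name__ == "__main__":
        fmap = torch.randn(1, 512, 38, 50)  # assumed VGG16 conv feature map size
        out = GlobalContextLSTM(512)(fmap)
        print(out.shape)  # torch.Size([1, 512, 38, 50])

The context-enriched map could then feed the RPN and detection head so that proposals are scored with global scene information, which is the role the abstract attributes to the LSTM module.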