2018
DOI: 10.1007/978-3-030-01240-3_21

DetNet: Design Backbone for Object Detection

Abstract: Recent CNN-based object detectors, whether one-stage methods like YOLO [1,2], SSD [3], and RetinaNet [4] or two-stage detectors like Faster R-CNN [5], R-FCN [6], and FPN [7], usually fine-tune directly from ImageNet pre-trained models designed for image classification. There has been little work on a backbone feature extractor specifically designed for object detection. More importantly, there are several differences between the tasks of image classification and object detection.…


Cited by 362 publications (242 citation statements) · References 44 publications
“…DetNet [36] uses 1×1 convolution projection instead of identity mapping although stages 4, 5, and 6 have the same spatial resolution. Our results ( Figure 5 Right) imply that the design keeps stages 4 and 5 away from the output layer, and avoids too sparse representation.…”
Section: Understanding Prior Work With Our Results
confidence: 99%
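The quoted design choice can be sketched in a few lines. Below is my own minimal illustration (not the authors' code) of a residual block whose shortcut is a 1×1 convolution projection rather than an identity mapping, even though input and output share the same spatial resolution — the situation the citing work describes for DetNet's stages 4–6. A 1×1 convolution is just a per-pixel linear map over channels, so it can be written as a matrix product:

```python
import numpy as np

def conv1x1(x, w):
    # x: (C_in, H, W), w: (C_out, C_in).
    # A 1x1 convolution is a per-pixel linear map over channels.
    c_in, h, wd = x.shape
    return (w @ x.reshape(c_in, -1)).reshape(w.shape[0], h, wd)

def bottleneck_projection(x, w_body, w_proj):
    # Residual block with a 1x1 projection shortcut instead of an
    # identity mapping. The "body" here is a stand-in for the real
    # (dilated 3x3) bottleneck path; spatial size is unchanged.
    body = np.maximum(conv1x1(x, w_body), 0.0)
    shortcut = conv1x1(x, w_proj)  # projection, same H x W as input
    return np.maximum(body + shortcut, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 14, 14))
y = bottleneck_projection(x,
                          rng.standard_normal((256, 256)) * 0.01,
                          rng.standard_normal((256, 256)) * 0.01)
print(y.shape)  # (256, 14, 14): spatial resolution is preserved
```

The point of the quote is that this projection is used even where an identity shortcut would type-check, which (per the citing authors' analysis) keeps stages 4 and 5 away from the output layer.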
“…(iii) The strides of conv5 x are too coarse to localize objects. DetNet [36] and ScratchDet [91] also discuss this problem and change the strides for object detection. Unlike these works, our finding is that SGD (with other regularization methods) automatically limits the intrinsic dimensionalities of standard ResNet without changing the strides.…”
Section: Eigenspectrum Dynamics
confidence: 99%
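The stride issue the citing authors raise is easy to make concrete. A rough sketch (illustrative stage strides of my own choosing, following the DetNet paper's stated design of stopping downsampling after the stride-16 stage) of cumulative stride per backbone stage:

```python
# Cumulative stride per backbone stage. A standard ResNet halves the
# feature map at every stage, ending at stride 32 in conv5 -- too
# coarse to localize objects, per the quote. DetNet instead stops
# downsampling at stride 16, so its later stages keep that resolution.
def cumulative_strides(per_stage_strides):
    out, s = [], 1
    for st in per_stage_strides:
        s *= st
        out.append(s)
    return out

resnet_like = cumulative_strides([2, 2, 2, 2, 2])      # conv1..conv5
detnet_like = cumulative_strides([2, 2, 2, 2, 1, 1])   # extra stage, no stride
print(resnet_like)  # [2, 4, 8, 16, 32]
print(detnet_like)  # [2, 4, 8, 16, 16, 16]
```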
“…In our study, we claim that the detection of small and occluded objects depends not only on detail features but also on semantic features and the contextual information [17]. Deep features have better expression towards the main characteristics of objects and more accurate semantic description of the objects in the scenes [13,15]. MDFN can effectively learn the deep features and yield compelling results on popular benchmark datasets.…”
Section: Introduction
confidence: 88%
“…According to [28], equation (2) performs well relying on the strong assumption that each feature map being fed into the final layer has to be sufficiently sophisticated to be helpful for detection and accurate localization of the objects. This is based on the following assumptions: 1) These feature maps should be able to provide the fine details especially for those from the earlier layers; 2) the function that transforms feature maps should be extended to the layers that are deep enough so that the high-level abstract information of the objects can be built into the feature maps; and 3) the feature maps should contain appropriate contextual information such that the occluded objects, small objects, blurred or overlapping ones can be inferred exactly and localized robustly [28,33,13].…”
Section: Deep Feature Extraction and Analysis
confidence: 99%
“…Furthermore, YOLOv2 [26] employs a fully convolutional network that results in m × n grids (m, n are the width and height of the output feature) and uses predefined anchors to better predict the bounding boxes of the objects. In [16], Li et al propose a backbone network that improves accuracy by maintaining high resolution for feature maps and reduces computation complexity by decreasing the width of the upper layers.…”
Section: Face Detection
confidence: 99%
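The trade-off described in the last quote (higher-resolution feature maps paid for by narrower upper layers) follows from the cost of a convolution being proportional to both spatial area and the product of channel widths. A back-of-the-envelope sketch with illustrative numbers of my own (2048 channels for a ResNet-style conv5 at stride 32, 256 channels for a narrower high-resolution stage at stride 16 — the latter figure matching the reduced width the DetNet paper describes):

```python
# Rough multiply-accumulate count for one 3x3 convolution layer:
# MACs = C_in * C_out * k * k * H * W.
def conv3x3_macs(c_in, c_out, h, w):
    return c_in * c_out * 3 * 3 * h * w

img = 224
# Stride 32 vs. stride 16: keeping resolution quadruples H*W, so the
# channel width of the upper stages is cut to keep computation in check.
wide_low_res   = conv3x3_macs(2048, 2048, img // 32, img // 32)
narrow_high_res = conv3x3_macs(256, 256, img // 16, img // 16)
print(narrow_high_res / wide_low_res)  # 0.0625: 4x the area, 1/64 the channel product
```

With these numbers the narrow high-resolution layer is 16× cheaper despite covering four times the spatial area, which is the sense in which decreasing upper-layer width offsets the cost of maintaining resolution.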