Because DenseNet preserves intermediate features with diverse receptive fields by aggregating them through dense connections, it performs well on the object detection task. Although feature reuse lets DenseNet produce strong features with a small number of model parameters and FLOPs, detectors with a DenseNet backbone are rather slow and energy-inefficient. We find that the linearly increasing number of input channels caused by dense connections leads to heavy memory access cost, which in turn causes computation overhead and higher energy consumption. To address this inefficiency, we propose an energy- and computation-efficient architecture called VoVNet, built from One-Shot Aggregation (OSA) modules. OSA keeps the strength of DenseNet, which represents diversified features with multiple receptive fields, while overcoming the inefficiency of dense connections by aggregating all features only once, in the last feature map. To validate the effectiveness of VoVNet as a backbone network, we design both lightweight and large-scale VoVNets and apply them to one-stage and two-stage object detectors. Our VoVNet-based detectors outperform DenseNet-based ones while running 2× faster and reducing energy consumption by 1.6×-4.1×. In addition to DenseNet, VoVNet also outperforms the widely used ResNet backbone, with faster speed and better energy efficiency. In particular, small object detection performance is significantly improved over both DenseNet and ResNet.
Inception-V4 [24], ResNet [7], and DenseNet [9], it has become mainstream in object detectors to adopt modern state-of-the-art CNN models as the feature extractor. As DenseNet was recently reported to achieve state-of-the-art performance on the classification task, it is natural to attempt to extend its use to detection tasks.
In our experiment (Table 4), we find that DenseNet-based detectors, with fewer parameters and FLOPs, outperform detectors based on ResNet, the most widely used backbone for object detection. The main difference between ResNet and DenseNet is the way they aggregate features: ResNet aggregates features from shallower layers by summation, while DenseNet does so by concatenation. As mentioned by Zhu et al. [32],
arXiv:1904.09730v1 [cs.CV]
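The channel-growth argument in the VoVNet abstract can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the example sizes (64 input channels, 32 output channels per layer, 5 layers) are assumptions chosen for the example and are not taken from the paper. It also assumes that the block input is included in the single final concatenation.

```python
def dense_block_input_channels(c_in, growth, num_layers):
    """Input channels seen by each layer of a dense block.

    Every layer concatenates the block input with the outputs of all
    earlier layers, so the input width grows linearly with depth.
    """
    return [c_in + i * growth for i in range(num_layers)]


def osa_block_input_channels(c_in, width, num_layers):
    """Input channels for a one-shot-aggregation style block.

    Each layer reads only its predecessor's output; all features are
    concatenated once at the end of the block.
    """
    per_layer = [c_in] + [width] * (num_layers - 1)
    # Assumption: the block input is concatenated along with every layer output.
    aggregated = c_in + num_layers * width
    return per_layer, aggregated


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    print(dense_block_input_channels(64, 32, 5))
    print(osa_block_input_channels(64, 32, 5))
```

In the dense case the input width of every successive layer increases, whereas in the one-shot case it stays constant after the first layer and the wide tensor appears only once, at the aggregation step.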
Since many safety-critical systems, such as surgical robots and autonomous cars, operate in unstable environments with sensor noise or incomplete data, it is desirable for object detectors to take the confidence of the localization prediction into account. Recent attempts to estimate localization uncertainty for object detection focus only on anchor-based methods, which capture the uncertainty of quantities with different characteristics, such as location (center point) and scale (width, height). Anchor-based methods also require tuning of sensitive anchor-box settings. We therefore propose a new object detector called Gaussian-FCOS that estimates localization uncertainty on top of an anchor-free detector, which captures the uncertainty of quantities sharing a similar property, the four directional box offsets (left, right, top, bottom), and avoids anchor tuning. For this purpose, we design a new loss function, the uncertainty loss, which measures how uncertain the estimated object location is by modeling the uncertainty as a Gaussian distribution. The detection score is then calibrated using the estimated uncertainty. Experiments on the challenging COCO dataset demonstrate that the proposed loss function not only enables the network to estimate uncertainty but also produces a synergy effect with the regression loss. In addition, Gaussian-FCOS reduces false positives using the estimated localization uncertainty and finds more missed objects, boosting both Average Precision (AP) and Average Recall (AR). We hope Gaussian-FCOS serves as a baseline for tasks that require reliability. Preprint. Under review.
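As a rough illustration of modeling a box offset as a Gaussian, the sketch below computes a per-offset Gaussian negative log-likelihood and averages it over the four offsets. This is a generic formulation and not necessarily the exact uncertainty loss of Gaussian-FCOS. The `calibrate_score` rule is purely hypothetical and assumes each predicted standard deviation has been squashed into [0, 1]. All function names are made up for this example.

```python
import math


def gaussian_nll(target, mean, sigma):
    """Negative log-likelihood of `target` under N(mean, sigma^2)."""
    var = sigma * sigma
    return 0.5 * math.log(2.0 * math.pi * var) + (target - mean) ** 2 / (2.0 * var)


def box_uncertainty_loss(targets, means, sigmas):
    """Average Gaussian NLL over the box offsets (left, right, top, bottom)."""
    terms = [gaussian_nll(t, m, s) for t, m, s in zip(targets, means, sigmas)]
    return sum(terms) / len(terms)


def calibrate_score(score, sigmas):
    """Hypothetical calibration: scale the score down by the mean uncertainty.

    Assumes every sigma lies in [0, 1]; this rule is an illustration only.
    """
    mean_sigma = sum(sigmas) / len(sigmas)
    return score * (1.0 - mean_sigma)


if __name__ == "__main__":
    # A confident but wrong prediction is penalized more than an uncertain one.
    print(gaussian_nll(1.0, 0.0, 0.1))
    print(gaussian_nll(1.0, 0.0, 1.0))
    print(calibrate_score(0.8, [0.5, 0.5, 0.5, 0.5]))
```

Under this formulation the network can lower its loss on hard examples by predicting a larger sigma, at the cost of the log-variance penalty, which is what makes the predicted sigma usable as an uncertainty estimate.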
TVA metadata allows consumers to find, navigate, and manage content through a variety of terminal devices, including PDR, DTV, and IPTV. In general, a TVA metadata description is delivered from a content provider to a terminal device over a transport link. Because such a metadata description can become very large, it is essential to encode each fragment of the description in a compressed format before delivery and then decode the encoded fragments at the terminal device. This paper proposes a new encoding procedure for TVA metadata based on EXI and presents performance comparison results between the proposed procedure and the existing TVA encoding procedure.