A 58.6mW real-time programmable object detector with multi-scale multi-object support using deformable parts model on 1920×1080 video at 30fps

Suleiman, Amr; Zhang, Zhengdong; Sze, Vivienne

doi:10.1109/vlsic.2016.7573528

Cited by 13 publications

(19 citation statements)

References 5 publications

“…For example, the CONV layer weights from AlexNet can either be hard-wired in the multipliers, or stored in on-chip SRAM or ROM. This is not feasible if the available hardware resources are constrained to the level of the HOG design [7], i.e., 1000 kgates with 150 kB SRAM. Assuming each input and weight value take 1 byte, only 10k multipliers with fixed…”

Section: Closing the Energy Gap (mentioning)

confidence: 99%

“…In this paper, we will provide an in-depth analysis on the causes for the energy gap between hand-crafted and learned features. We use results from two actual chip designs: [7] implements the hand-crafted feature using HOG, and [8] implements the learned feature using CNN. Both chips use 65nm CMOS technology and have similar hardware resource utilization in terms of logic gate count and memory capacity.…”

Section: Introduction (mentioning)

confidence: 99%

Towards closing the energy gap between HOG and CNN features for embedded vision

Suleiman, Chen, Emer, et al. 2017

2017 IEEE International Symposium on Circuits and Systems (ISCAS)

Self Cite

Computer vision enables a wide range of applications in robotics/drones, self-driving cars, smart Internet of Things, and portable/wearable electronics. For many of these applications, local embedded processing is preferred due to privacy and/or latency concerns. Accordingly, energy-efficient embedded vision hardware delivering real-time and robust performance is crucial. While deep learning is gaining popularity in several computer vision algorithms, a significant energy consumption difference exists compared to traditional hand-crafted approaches. In this paper, we provide an in-depth analysis of the computation, energy and accuracy trade-offs between learned features such as deep Convolutional Neural Networks (CNN) and hand-crafted features such as Histogram of Oriented Gradients (HOG). This analysis is supported by measurements from two chips that implement these algorithms. Our goal is to understand the source of the energy discrepancy between the two approaches and to provide insight about the potential areas where CNNs can be improved and eventually approach the energy-efficiency of HOG while maintaining its outstanding performance accuracy.

“…However, this comes at the cost of 35× more computation compared to rigid object detection [10]. This overhead comes from four main factors: 3× larger model size with the parts filters, 4× larger image pyramid size to support parts classification at twice the image resolution relative to the root classification, 1.5× increase due to the deformation computation, and finally 2× increase due to the fact that two DPM models are used (original and flipped version).…”

Section: A DPM Complexity (mentioning)

confidence: 99%
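
The four factors in the statement above can be checked against the quoted overall figure. The sketch below assumes the factors combine multiplicatively; the dictionary keys and the function name are illustrative and do not come from the cited paper.

```python
# Overhead factors of DPM relative to rigid object detection,
# taken from the citation statement above.
factors = {
    "model_size_with_parts_filters": 3.0,
    "image_pyramid_size": 4.0,
    "deformation_computation": 1.5,
    "two_models_original_and_flipped": 2.0,
}


def total_overhead(factor_map):
    """Multiply the individual factors (assumed to be independent)."""
    product = 1.0
    for value in factor_map.values():
        product *= value
    return product


overhead = total_overhead(factors)
print(f"combined overhead: {overhead:.0f}x")
```

The product comes to 36×, close to the quoted 35×; the small gap is presumably due to rounding of the individual factors in the statement.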

“…For multi-scale detection, the window scans an image pyramid (multiple downscaled versions of the image). Multi-scale detection increases the required computation as the image pyramid leads to data expansion, which can be a 100× increase in the number of pixels for a full HD image [10]. In classification, a pre-trained model that captures the characteristics of the target object is used at each sliding window position to label it as a true or a false object.…”

Section: Introduction (mentioning)

confidence: 99%

“…The deformable parts models (DPM) algorithm [20], which is based on HOG features, has demonstrated high detection accuracy. DPM doubles the detection accuracy compared to rigid object detection because of the relatively larger and more flexible models [10]. Lately, convolutional neural networks (CNN) have been widely used in large scale classification systems.…”

Section: Introduction (mentioning)

confidence: 99%

A 58.6 mW 30 Frames/s Real-Time Programmable Multiobject Detection Accelerator With Deformable Parts Models on Full HD 1920×1080 Videos

Suleiman, Zhang, Sze 2017

IEEE J. Solid-State Circuits

Self Cite

Abstract: This paper presents a programmable, energy-efficient and real-time object detection hardware accelerator for low power and high throughput applications using deformable parts models, with 2× higher detection accuracy than traditional rigid body models. Three methods are used to address the high computational complexity of 8 deformable parts detection: classification pruning for 33× fewer part classification, vector quantization for 15× memory size reduction, and feature basis projection for 2× reduction in the cost of each classification. The chip was fabricated in a 65nm CMOS technology, and can process full high definition 1920×1080 videos at 60fps without any off-chip storage. The chip has two programmable classification engines for multi-object detection. At 30fps, the chip consumes only 58.6mW (0.94 nJ/pixel, 1168 GOPS/W). At a higher throughput of 60fps, the classification engines can be time multiplexed to detect even more than two object classes. This proposed accelerator enables object detection to be as energy-efficient as video compression, which is found in most cameras today.
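
The energy-per-pixel figure in the abstract follows from the reported power and the pixel rate. The short check below assumes that every pixel of each 1920×1080 frame is counted at 30 fps; the variable names are illustrative.

```python
# Figures taken from the abstract above.
POWER_W = 58.6e-3          # chip power at 30 fps, in watts
WIDTH, HEIGHT = 1920, 1080  # full HD frame size, in pixels
FPS = 30                    # frames per second

# Pixels processed per second (assumes all pixels of every frame are counted).
pixels_per_second = WIDTH * HEIGHT * FPS

# Energy per pixel = power / pixel rate, converted from joules to nanojoules.
energy_per_pixel_nj = POWER_W / pixels_per_second * 1e9
print(f"{energy_per_pixel_nj:.2f} nJ/pixel")
```

Under that assumption the result rounds to 0.94 nJ/pixel, matching the value stated in the abstract.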

Pigeon cleaning behavior detection algorithm based on light-weight network

Guo, Deng, et al. 2022

Computers and Electronics in Agriculture
