This paper presents a programmable, energy-efficient, real-time object detection accelerator based on deformable parts models (DPM), which provide 2x higher detection accuracy than traditional rigid-body models. With an 8-part deformable model, three methods are used to address the high computational complexity: classification pruning for 33x fewer part classifications, vector quantization for 15x memory size reduction, and feature basis projection for a 2x reduction in the cost of each classification. The chip is implemented in 65nm CMOS technology and processes HD (1920x1080) images at 30fps without any off-chip storage while consuming only 58.6mW (0.94nJ/pixel, 1168 GOPS/W). The chip has two classification engines that simultaneously detect two different object classes. With a measured peak throughput of 60fps, the classification engines can be time-multiplexed to detect more than two object classes. The design is energy scalable by changing the pruning factor or by disabling parts classification.

Keywords: DPM, object detection, basis projection, pruning.

Introduction

Object detection is critical to many embedded applications that require low power and real-time processing. For example, low latency and HD images are important for autonomous control, which must react quickly to fast-approaching objects, while low energy consumption is essential due to battery and heat limitations. Object detection involves not only classification/recognition but also localization, which is achieved by sliding a window of a pretrained model over the image. For multi-scale detection, the window slides over an image pyramid (multiple downscaled copies of the image); a software sketch of this flow is given below. Multi-scale detection is very challenging because the image pyramid results in a data expansion, which can exceed 100x for HD images. The high computational complexity of object detection therefore necessitates fast hardware implementations [1] to enable real-time processing.

This paper presents a complete object detection accelerator using DPM [2] with a root and an 8-part model, as shown in Fig. 1. DPM doubles the detection accuracy compared to rigid-template (root-only) detection. The 8 parts account for deformation, so that a single model can detect objects at different poses (Fig. 6) and increases detection confidence. However, this accuracy comes with a classification overhead of 35x more multiplications (DPM classification consumes 80% of a single detector's power), making multi-object detection a challenge. A software-based DPM object detector is described in [3]; it enables detection on 500x500 images at 30fps but requires a fully loaded 6-core Xeon processor and 32GB of memory. In this work, the classification overhead is significantly reduced by two main techniques: (1) classification pruning with vector quantization (VQ) for selective part processing, and (2) feature basis projection for sparse multiplications. Software sketches illustrating both techniques follow below.

Architecture Overview

Fig. 2 shows the block diagram of our detector architecture, including histogram of oriented gradients (HOG) feature pyramid generation ...
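As a point of reference for the data expansion discussed in the Introduction, the following is a minimal NumPy sketch of multi-scale sliding-window detection over an image pyramid. The scale factor, window size, stride, and `score_fn` are illustrative assumptions only and do not describe the chip's datapath.

```python
import numpy as np

def image_pyramid(image, scale=1.2, min_size=64):
    """Yield progressively downscaled copies of the image (nearest-neighbor for brevity)."""
    level = image
    while min(level.shape[:2]) >= min_size:
        yield level
        h, w = level.shape[:2]
        nh, nw = int(h / scale), int(w / scale)
        # Nearest-neighbor downscale; a real pipeline would low-pass filter first.
        rows = (np.arange(nh) * scale).astype(int)
        cols = (np.arange(nw) * scale).astype(int)
        level = level[rows][:, cols]

def sliding_window_detect(image, score_fn, win=(128, 64), stride=8, thresh=0.0):
    """Slide a fixed-size window over every pyramid level and keep high-scoring positions."""
    detections = []
    for level in image_pyramid(image):
        H, W = level.shape[:2]
        for y in range(0, H - win[0] + 1, stride):
            for x in range(0, W - win[1] + 1, stride):
                s = score_fn(level[y:y + win[0], x:x + win[1]])
                if s > thresh:
                    detections.append((y, x, level.shape[:2], s))
    return detections
```

Because every pyramid level is scanned densely, the total number of windows (and hence classifications) grows with the pyramid size, which is the source of the >100x data expansion for HD inputs.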
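Classification pruning can be pictured as follows: a cheap root-filter score is computed at every window, and the eight expensive part classifications are run only where that score survives a threshold. This is a simplified software reading of the technique; in the accelerator, pruning is coupled with VQ for selective part processing and memory reduction, so treat `prune_thresh` and the zero deformation costs below as placeholders.

```python
def dpm_score(root_score, part_scores, deformation_costs):
    """Combine root and part responses: parts add evidence minus a deformation penalty."""
    return root_score + sum(ps - dc for ps, dc in zip(part_scores, deformation_costs))

def detect_with_pruning(windows, root_fn, part_fns, prune_thresh):
    """Evaluate the cheap root filter everywhere; run the expensive part filters
    only at windows whose root score survives the pruning threshold."""
    results = []
    for w in windows:
        r = root_fn(w)              # cheap rigid-template (root-only) score
        if r < prune_thresh:        # pruned: the 8 part filters are never evaluated here
            continue
        part_scores = [f(w) for f in part_fns]
        deformation = [0.0] * len(part_fns)  # placeholder deformation costs
        results.append((w, dpm_score(r, part_scores, deformation)))
    return results
```

Raising `prune_thresh` trades detection coverage for fewer part classifications, which is the knob behind the energy scalability mentioned in the abstract.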
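Feature basis projection reduces the arithmetic per classification by expressing the classifier weights in a shared basis, so the projection of the feature vector onto that basis is computed once and each classifier applies only a sparse set of coefficients. The sketch below assumes a generic basis `B` and a simple magnitude threshold for sparsifying the coefficients; the actual basis and sparsity used on the chip are not specified here.

```python
import numpy as np

def project_weights(W, B, sparsity_thresh=1e-2):
    """Represent classifier weights W (n_classifiers x d) in a basis B (k x d): W ~= C @ B."""
    C = W @ np.linalg.pinv(B)                 # coefficients of each classifier in the basis
    C[np.abs(C) < sparsity_thresh] = 0.0      # drop small coefficients -> sparse multiplies
    return C

def score_with_basis(x, C, B):
    """Score = W @ x computed as C @ (B @ x); B @ x is shared across all classifiers/parts."""
    return C @ (B @ x)
```

Sharing the `B @ x` projection across the root and all part classifiers is what converts the per-classification cost into a small number of sparse multiplications.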