Abstract: This paper presents a programmable, energy-efficient, real-time object detection hardware accelerator for low-power, high-throughput applications. It uses deformable parts models, which achieve 2× higher detection accuracy than traditional rigid body models. Three methods address the high computational complexity of detecting 8 deformable parts: classification pruning, for 33× fewer part classifications; vector quantization, for a 15× reduction in memory size; and feature basis projection, for a 2× reduction in the cost of each classification. The chip was fabricated in a 65 nm CMOS technology and can process full high-definition 1920×1080 video at 60 fps without any off-chip storage. The chip has two programmable classification engines for multi-object detection. At 30 fps, the chip consumes only 58.6 mW (0.94 nJ/pixel, 1168 GOPS/W). At the higher throughput of 60 fps, the classification engines can be time-multiplexed to detect more than two object classes. The proposed accelerator enables object detection to be as energy-efficient as video compression, which is found in most cameras today.
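The energy figures quoted above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes that the per-pixel energy is normalized to input pixels at the native 1920×1080 resolution (not to the pixels of any additional image scales), and the function and variable names are hypothetical. The implied throughput is a derived value, not a figure stated in the text.

```python
# Sanity check of the energy figures quoted in the abstract.
# Assumption: "per pixel" refers to input pixels at 1920x1080, 30 fps.

WIDTH, HEIGHT = 1920, 1080      # full HD frame size
FPS = 30                        # frame rate at which power is reported
POWER_W = 58.6e-3               # reported power: 58.6 mW
EFFICIENCY_GOPS_PER_W = 1168    # reported efficiency


def energy_per_pixel_nj(power_w, width, height, fps):
    """Energy per input pixel in nanojoules."""
    pixels_per_second = width * height * fps
    return power_w / pixels_per_second * 1e9


def implied_throughput_gops(power_w, gops_per_w):
    """Throughput implied by the power and efficiency figures (derived)."""
    return power_w * gops_per_w


energy_nj = energy_per_pixel_nj(POWER_W, WIDTH, HEIGHT, FPS)
throughput_gops = implied_throughput_gops(POWER_W, EFFICIENCY_GOPS_PER_W)

print(f"{energy_nj:.2f} nJ/pixel")      # 0.94 nJ/pixel
print(f"{throughput_gops:.1f} GOPS")    # 68.4 GOPS
```

Under this assumption, 58.6 mW spread over 1920 × 1080 × 30 ≈ 62.2 million pixels per second reproduces the quoted 0.94 nJ/pixel.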