Abstract-A heterogeneous many-core object recognition processor is proposed to realize robust and efficient object recognition on real-time video of cluttered scenes. Unlike previous approaches that simply aimed for high GOPS/W, we aim to achieve high Effective GOPS/W, or EGOPS/W, which only counts operations carried out on meaningful regions of an input image. This is achieved by the Unified Visual Attention Model (UVAM) which confines complex Scale Invariant Feature Transform (SIFT) feature extraction to meaningful object regions while rejecting meaningless background regions. The Intelligent Inference Engine (IIE), a mixed-mode neuro-fuzzy inference system, performs the top-down familiarity attention of the UVAM which guides attention toward pre-learned objects. Weight perturbation-based learning of the IIE ensures high attention precision through online adaptation. The SIFT recognition is accelerated by an optimized array of 4 20-way SIMD Vector Processing Elements, 32 MIMD Scalar Processing Elements, and 1 Feature Matching Processor. When processing 30 fps 640 480 video, the 50 mm 2 object recognition processor implemented in a 0.13 m process achieves 246 EGOPS/W, which is 46% higher than the previous work. The average power consumption is only 345 mW.