In this paper, a very-large-scale integration (VLSI) architecture for global feature extraction in real-time object recognition is presented. To minimize the latency between image capture and final recognition, the image sensor and the feature extraction circuits are integrated on the same chip. The digital pixel sensor (DPS) configuration is used because of its intrinsic compatibility with digital processing circuits. A block-readout architecture developed for DPS is adopted for the massively parallel processing of image data. As a result, directional edge filtering at each pixel site, which performs the local feature extraction, is carried out in a line-parallel manner. To eliminate trivial local features, the rank-order filter algorithm has been adapted and implemented in an efficient global feature extraction circuit that retains any given number of the relatively more significant features in an image as its essential features. This selection is accomplished in only 11 cycles. A proof-of-concept chip was designed in a 0.18 µm five-metal complementary metal-oxide-semiconductor (CMOS) technology, and the function of the VLSI processor was verified by circuit simulation.
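The rank-order selection described above can be illustrated in software. The following is a minimal sketch, not the circuit presented in this paper: it assumes that local features are unsigned 8-bit edge strengths and finds, by a bit-by-bit search, the largest threshold that still leaves a requested number of features above it. The function names, the 8-bit width, and the sample values are hypothetical, and the loop count of this sketch is not meant to model the cycle count of the chip.

```python
def rank_order_threshold(values, keep, bits=8):
    """Return the largest threshold T such that at least `keep` values are >= T.

    The threshold is built from the most significant bit downward: a bit is
    kept only if enough values still reach the resulting candidate threshold.
    (Assumption: values are unsigned integers that fit in `bits` bits.)
    """
    threshold = 0
    for bit in reversed(range(bits)):
        candidate = threshold | (1 << bit)
        survivors = sum(1 for v in values if v >= candidate)
        if survivors >= keep:
            threshold = candidate
    return threshold


def select_significant(values, keep, bits=8):
    """Flag (1) the values that reach the rank-order threshold, else 0."""
    threshold = rank_order_threshold(values, keep, bits)
    return [1 if v >= threshold else 0 for v in values]


# Hypothetical edge strengths for eight pixel sites.
edge_strengths = [12, 200, 45, 45, 7, 130, 90, 3]
flags = select_significant(edge_strengths, keep=3)
print(flags)
```

In this sketch, equal values at the threshold are all retained, so slightly more than `keep` features can be flagged when ties occur.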