In this work, an energy-quality (EQ) scalable and memory-frugal architecture for video feature extraction is introduced to reduce circuit complexity, power, and silicon area. Leveraging the inherent resilience of vision against noise and inaccuracies, the proposed approach introduces properly selected EQ tuning knobs that reduce the energy of feature extraction with graceful quality degradation. As opposed to prior art, the proposed architecture enables the adjustment of such knobs and adapts its cycle-level timing to reduce the amount of computation per frame at lower quality targets. As a further benefit, the approach adds opportunities for energy reduction via aggressive voltage scaling. The proposed architecture mitigates the traditionally dominant area/energy cost of the on-chip memory by reducing the number of pixels stored on chip, and by introducing memory access reuse and on-the-fly computation. At the same time, EQ tuning preserves the ability to operate conventionally at maximum quality, when required by the task or the visual context. A 0.55-mm² testchip in 40 nm exhibits power down to 82 µW at a 5-fps frame rate (i.e., 33× lower than prior art), while assuring successful object detection at VGA resolution. To the best of the authors' knowledge, this is the first feature extractor with sub-mW operation and sub-mm² area, making the proposed approach well suited for tightly power-constrained and low-cost distributed vision systems (e.g., video sensor nodes).