Due to the increasing speed and capabilities of production machines, extremely fast and robust observation, classification, and error handling are vital to industrial image processing. We present an emergent algorithmic computing scheme and a corresponding embedded massively parallel hardware architecture for these problems. They offer the potential to turn CMOS camera chips into intelligent vision devices that carry out tasks without the help of a central processor, based only on the local interaction of agents crawling on a large field of processing elements. This also constitutes a breakthrough in understanding sensor devices as a decentralized concept, resulting in much faster computation by evading the communication bottlenecks of classic approaches, which are an ever-growing impediment to scalability. Here, in contrast, the number of agents and the field size, and thus the computable image resolution, are highly scalable, promising even greater benefit with future hardware development. The results are based on novel algorithmic solutions that allow processing elements to compute center points, moments, and orientations of multiple image objects in parallel, which is of central importance to, e.g., robotics. Finally, we present the algorithm's capabilities when realized in state-of-the-art FPGAs and ASICs.
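To make concrete which quantities the architecture computes, the following is a minimal sequential reference sketch, not the parallel agent-based algorithm itself: it derives the center point (centroid) and principal-axis orientation of a single binary image object from its raw and central image moments, the standard formulation these features are based on. The function name `object_moments` and the row-list image layout are illustrative assumptions.

```python
import math

def object_moments(image):
    """Reference (sequential) computation of centroid and orientation
    from image moments of a binary object; image is a list of rows
    of 0/1 values. Illustrative only -- the paper's scheme computes
    these quantities in parallel via local agent interactions."""
    # Raw moments m00, m10, m01 give the centroid.
    m00 = m10 = m01 = 0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            if v:
                m00 += 1
                m10 += x
                m01 += y
    xc, yc = m10 / m00, m01 / m00  # center point

    # Central second moments give the orientation of the principal axis.
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            if v:
                mu20 += (x - xc) ** 2
                mu02 += (y - yc) ** 2
                mu11 += (x - xc) * (y - yc)
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return (xc, yc), theta

# Example: a small diagonally tilted bar-shaped object.
img = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
(cx, cy), theta = object_moments(img)
print(f"centroid=({cx:.2f}, {cy:.2f}), orientation={math.degrees(theta):.1f} deg")
```

In the proposed architecture, the same sums are accumulated in a decentralized fashion by agents moving over the field of processing elements, so multiple objects can be handled simultaneously without routing all pixel data through a central processor.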