In this paper, we present a class of emergent algorithms called Marching Pixels and a corresponding programmable parallel chip architecture. Marching Pixels can be used for real-time image processing in smart camera chips. They are based on hardware agents that virtually crawl through a pixel grid image to find attributes such as centroid, rotation, and size of an arbitrary number of objects given in an image. Because of the distributed and local processing scheme of Marching Pixels, reply times in the millisecond range can be achieved. This means that, within this time, it is determined where previously known objects are located and how they are oriented relative to the main axes of the image. We present an example Marching Pixels algorithm and corresponding application-specific and programmable parallel architectures. The latter contains a specific instruction set that allows the execution not only of Marching Pixels algorithms but also of arbitrary Cellular Automata algorithms as an embedded parallel processor. The strengths and weaknesses of this architecture with respect to its realization as field-programmable gate arrays and application-specific integrated circuits are discussed by means of hardware synthesis results. These results are compared with the solution achievable on real hardware such as the Atom processor.

[...] the camera in a way that work can be done in the requested time. The problem is that image processing, if performed serially using classic algorithms, will be too slow even on fast single processors. An example clarifies this. Let the image have the classic VGA resolution of 640 × 480 pixels. This means a serial processor has to process 307,200 pixels within the required reply time of 10 ms. Hence, only 32.6 ns remain for the computation of each pixel.
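The per-pixel budget in this example can be reproduced with a few lines of Python. This is only an illustrative sketch of the arithmetic: the function names and the listed clock frequencies are hypothetical, and the cycle estimate assumes, as a simplification, one operation per clock cycle.

```python
def per_pixel_budget_ns(width: int, height: int, reply_time_ms: float) -> float:
    """Time available per pixel, in nanoseconds, for strictly serial processing."""
    pixels = width * height
    return reply_time_ms * 1e6 / pixels  # 1 ms = 1e6 ns


def cycles_per_pixel(budget_ns: float, clock_ghz: float) -> float:
    """Clock cycles that fit into the per-pixel budget (1 GHz = 1 cycle per ns)."""
    return budget_ns * clock_ghz


# Values taken from the text: VGA resolution and a 10 ms reply time.
pixels = 640 * 480
budget = per_pixel_budget_ns(640, 480, 10.0)
print(f"{pixels} pixels, about {budget:.1f} ns per pixel")

# Hypothetical clock frequencies, chosen only to illustrate the trade-off.
for clock_ghz in (0.1, 1.0, 3.0):
    cycles = cycles_per_pixel(budget, clock_ghz)
    print(f"{clock_ghz:>4} GHz -> about {cycles:.0f} cycles per pixel")
```

Under these assumptions, a serial processor clocked at 100 MHz would have only about three cycles per pixel, which illustrates why a serial solution needs clock rates in the GHz range.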
To carry out a reasonable number of operations within this time would require a clock frequency of several GHz, which one would like to avoid in embedded systems because of the high energy dissipation at such frequencies.

Our answer, in order both to meet the strict real-time requirements and to scale with increasing pixel resolutions, is a set of appropriate parallel low-level, that is, hardware-oriented, algorithms that are based mainly on local operators. These algorithms must work not only in parallel to be fast but also in a distributed way to be robust.

The scalability of our algorithms is achieved by a kind of autonomous agents that are tasked with traveling virtually within a pixel grid, which corresponds to the image, in order to find the centroid coordinates of the objects given in this image. After the agents have found them, the corresponding coordinates are passed to a robot control to enable the robot to grasp the objects. Because these agents march around the pixels, they were given the name Marching Pixels (MPs). Their goal is to visit certain pixels and to gather data about the objects to which the pixels belong. The MP swarm shall further compress the gathered data in order to retrieve desired object information like size, ...