Complementary metal-oxide-semiconductor (CMOS) image sensors are a visual outpost of many machines that interact with the world. While they presently separate image capture in front-end silicon photodiode arrays from image processing in digital back-ends, efforts to process images within the photodiode array itself are rapidly emerging, in hopes of minimizing the data transfer between sensing and computing, and the associated overhead in energy and bandwidth. Electrical modulation, or programming, of photocurrents is requisite for such in-sensor computing, which was indeed demonstrated with electrostatically doped, but non-silicon, photodiodes. CMOS image sensors are currently incapable of in-sensor computing, as their chemically doped photodiodes cannot produce electrically tunable photocurrents. Here we report in-sensor computing with an array of electrostatically doped silicon p-i-n photodiodes, which is amenable to seamless integration with the rest of the CMOS image sensor electronics. This silicon-based approach could more rapidly bring in-sensor computing to the real world due to its compatibility with the mainstream CMOS electronics industry. Our wafer-scale production of thousands of silicon photodiodes using standard fabrication emphasizes this compatibility. We then demonstrate in-sensor processing of optical images using a variety of convolutional filters electrically programmed into a 3 × 3 network of these photodiodes.
Main Text

Complementary metal-oxide-semiconductor (CMOS) image sensors have become an indispensable part of our data-driven world, where visual information prevails1,2. The front-end silicon photodiode array in a CMOS image sensor converts light into electrical currents. These electrical data undergo analog-to-digital conversion and are then shuttled to a digital back-end for image processing. While this standard sequence of front-end image capture and back-end processing restricts the role of the photodiode array to sensing, emerging machine vision applications would benefit from data processing within the photodiode array itself3,4. For example, in object tracking for self-driving vehicles, drones, or robots, where only the edges of objects are relevant5-8, edge extraction in the front-end photodiode array would be much more economical in energy expenditure, processing latency, required bandwidth, and memory usage, as compared to transferring the whole image data containing superfluous information to the back-end digital processor, only to extract the edges there9. Such in-sensor computing would require an electrical modulation, or programming, of photocurrents. In fact, in-sensor computing has been recently demonstrated with electrostatically doped photodiodes whose photocurrents can be modulated with gate biasing10,11. These pioneering works have realized electrostatically doped photodiodes by gating two-dimensional (2D) transition metal dichalcogenide (TMD) layers or their van der Waals (vdW) stacks12-14. In contrast, such in-sensor computing is not possible with ...
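To make the operating principle concrete, the following minimal sketch (our illustration, not the authors' code) simulates in-sensor convolution by a 3 × 3 network of gate-tunable photodiodes. It assumes the standard picture behind such demonstrations: each diode's photocurrent is its local optical power scaled by a gate-programmed responsivity, which plays the role of a convolution weight (negative weights correspond to reversed photocurrent polarity), and the nine currents sum on a shared node by Kirchhoff's current law to produce one filter output. The function and variable names are illustrative only.

```python
import numpy as np

def in_sensor_convolution(optical_image, responsivities):
    """Slide a 3x3 photodiode kernel over an optical power map.

    optical_image  : 2D array of local optical powers P (arbitrary units)
    responsivities : 3x3 array of gate-programmed responsivities R;
                     negative values model reversed photocurrent polarity
    Each output element is the summed photocurrent of the 3x3 network,
    I_out = sum_ij R_ij * P_ij, i.e. one output of the convolution.
    """
    h, w = optical_image.shape
    out = np.zeros((h - 2, w - 2))
    for r in range(h - 2):
        for c in range(w - 2):
            patch = optical_image[r:r + 3, c:c + 3]
            out[r, c] = np.sum(responsivities * patch)  # Kirchhoff summation
    return out

# Example: a Laplacian-like edge-extraction kernel "programmed" as responsivities
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]], dtype=float)

image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0                  # bright square on a dark background
edges = in_sensor_convolution(image, edge_kernel)
print(np.round(edges, 1))              # nonzero only along the square's edges
```

In this picture, reprogramming the filter amounts to changing the gate biases, and hence the responsivity matrix, rather than moving any image data off the array; only the already-filtered output leaves the sensor.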