Abstract. Hierarchical convolutional neural networks are a well-known robust image-recognition model. In order to apply this model to robot vision or various intelligent vision systems, its VLSI implementation with high performance and low power consumption is required. This paper proposes a convolutional network VLSI architecture using a hybrid approach composed of pulse-width modulation (PWM) and digital circuits. We call this approach merged/mixed analog-digital architecture. The VLSI includes PWM neuron circuits, PWM/digital converters, digital adder-subtracters, and digital memory. We have designed and fabricated a VLSI chip by using a 0.35 m CMOS process. The VLSI chip can perform 6-bit precision convolution calculations for an image of 100¢100 pixels with a receptive field area of up to 20¢20 pixels within 5 ms, which means a performance of 2 GOPS. Power consumption of PWM neuron circuits is estimated to be 20 mW. We have verified successful operations using a fabricated VLSI chip.