Template matching based on the zero-mean normalized cross-correlation (ZNCC) measure is widely used in a broad range of image processing applications. To meet the requirements of automatic target recognition systems for high processing speed, small size, and variable image size, this paper presents a novel field-programmable gate array (FPGA)-based parallel architecture for ZNCC computation. The proposed architecture employs two groups of RAM blocks: one is used for the multiply-accumulate operations on the real and reference images, and the other for rearranging the data of the reference image. The functions of the two groups are swapped through 2-input multiplexers when the search moves to the next row. Moreover, the sum of the pixels in the search area of the real image is computed by serially accumulating the differences between the new column of the current search area and the old column of the previous search area, using one dual-port RAM. The sum of the squares of the pixels is calculated simultaneously in the same way. With the Altera Stratix II FPGA chip (EP2S90F780I4) as the target device, compilation results from Quartus II show that, compared with the traditional architecture, the synthesis logic utilization decreases from 63% to 35% and the usage of DSP blocks decreases from 59% to 39%, while the memory bits increase by only 8% and the usage of other resources is nearly the same. Simulation and practical experimental results show that the proposed architecture can effectively improve the performance of a practical automatic target recognition system.

INDEX TERMS FPGA, normalized cross-correlation measure, parallel architecture, template matching.
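
The incremental column update described in the abstract can be illustrated in software. The following is a minimal Python sketch, not the FPGA architecture itself: the function names `column_sums` and `zncc_row` are hypothetical, the per-column cache is only a loose stand-in for the dual-port RAM, and the cross term is computed by brute force rather than by parallel multiply-accumulate units. It assumes the standard ZNCC definition, (N·Σfg − Σf·Σg) / sqrt((N·Σf² − (Σf)²)(N·Σg² − (Σg)²)), over a window of N pixels.

```python
import math


def column_sums(image, top, height, col):
    """Return (sum, sum of squares) of one column segment of the image."""
    s = sq = 0
    for r in range(top, top + height):
        p = image[r][col]
        s += p
        sq += p * p
    return s, sq


def zncc_row(image, template, top):
    """ZNCC score for every horizontal window position in one search row."""
    th, tw = len(template), len(template[0])
    width = len(image[0])
    n = th * tw

    # Template statistics are fixed for the whole search.
    t_sum = sum(p for row in template for p in row)
    t_sq = sum(p * p for row in template for p in row)
    t_var = n * t_sq - t_sum * t_sum

    # Per-column sums for this row band (assumed stand-in for the RAM).
    cols = [column_sums(image, top, th, c) for c in range(width)]

    # Sums for the first window position.
    w_sum = sum(cols[c][0] for c in range(tw))
    w_sq = sum(cols[c][1] for c in range(tw))

    scores = []
    for left in range(width - tw + 1):
        if left > 0:
            # Accumulate (new column - old column) instead of re-summing.
            new_s, new_sq = cols[left + tw - 1]
            old_s, old_sq = cols[left - 1]
            w_sum += new_s - old_s
            w_sq += new_sq - old_sq

        # Cross term between the window and the template (brute force here).
        cross = sum(
            image[top + r][left + c] * template[r][c]
            for r in range(th)
            for c in range(tw)
        )

        w_var = n * w_sq - w_sum * w_sum
        if w_var > 0 and t_var > 0:
            scores.append((n * cross - w_sum * t_sum) / math.sqrt(w_var * t_var))
        else:
            scores.append(0.0)  # flat window or flat template: score undefined
    return scores
```

In this sketch each window shift costs two additions and two subtractions for the pixel sum and the sum of squares, regardless of the template width, which is the property the column-difference scheme relies on.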