This paper proposes a computation-array-centered dataflow that maps convolutions with different kernel sizes onto a unified computing scheme and reduces the dimension of the computation array from 2D to 1D, so as to maximize the utilization of the computation elements offered by the accelerator. Furthermore, a single unit multiple data (SUMD) strategy is proposed to alleviate the mismatch between quantized data and the fixed-bit-width hardware resources on an FPGA. As a case study, an 8-bit MobileNetV2 model is implemented on the low-cost ZYNQ XC7Z020 FPGA, where FPS/DSP and GOPS/DSP reach up to 0.55 and 0.35, respectively.
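
To illustrate the kind of mismatch that SUMD targets, the following is a minimal sketch of a generic operand-packing trick, in which one wide multiplier produces two 8-bit products that share an operand. It is not taken from the paper and may differ from the actual SUMD design; the function name `packed_multiply` and the 18-bit field width `SHIFT` are assumptions chosen for illustration only.

```python
# Hypothetical sketch: two signed 8-bit products from one wide multiplication.
# SHIFT is an assumed field width, large enough to hold one signed 8x8 product.
SHIFT = 18


def packed_multiply(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (a*c, b*c) using a single multiplication of a packed operand.

    a, b, c are assumed to be signed 8-bit integers in [-128, 127].
    """
    # Pack a into the upper field and b into the lower field.
    packed = (a << SHIFT) + b
    # One wide multiplication computes (a*c << SHIFT) + b*c.
    product = packed * c

    # Recover b*c: take the low SHIFT bits and sign-extend them.
    low = product & ((1 << SHIFT) - 1)
    if low >= 1 << (SHIFT - 1):
        low -= 1 << SHIFT

    # Remove the low field before shifting so its sign does not corrupt a*c.
    high = (product - low) >> SHIFT
    return high, low


if __name__ == "__main__":
    print(packed_multiply(-7, 5, 3))
```

The subtraction before the shift matters: when `b*c` is negative it borrows from the upper field, so shifting the raw product would make `a*c` off by one.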