Convolutional neural networks (CNNs) are widely used in image recognition. To meet their massive computational requirements, GPUs or other dedicated computing hardware are typically employed for data processing. FPGAs support parallel computing and are characterized by programmability, high performance, low energy consumption, and strong stability. In this paper, we improved and optimized the YOLOv2-Tiny algorithm for hardware implementation, tailoring it to the FPGA's architecture. We partitioned the neural network computations and preprocessed the data with a 16-bit fixed-point representation to reduce hardware resource consumption (a quantization sketch follows below). By accelerating the YOLOv2-Tiny CNN on the PYNQ-z2 development platform, we achieved target object detection and recognition. Compared with a CPU (i7-10710U), our design delivered 2.94 times the processing capacity at 3.1% of the power consumption.
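
The following is a minimal sketch of the kind of 16-bit fixed-point preprocessing described above, assuming a signed Qm.n format; the fractional bit width `FRAC_BITS` and the helper names are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

# Hypothetical 16-bit fixed-point (Qm.n) quantization used before offloading
# data to the FPGA accelerator. FRAC_BITS is an assumed split, not the
# paper's stated configuration.
FRAC_BITS = 8           # assumed number of fractional bits
SCALE = 1 << FRAC_BITS  # 2**FRAC_BITS

def to_fixed16(x: np.ndarray) -> np.ndarray:
    """Quantize a float array to signed 16-bit fixed point."""
    q = np.round(x * SCALE)
    # Saturate to the int16 range to avoid wrap-around overflow.
    q = np.clip(q, -32768, 32767)
    return q.astype(np.int16)

def to_float32(q: np.ndarray) -> np.ndarray:
    """Dequantize back to floating point for verification."""
    return q.astype(np.float32) / SCALE

if __name__ == "__main__":
    weights = np.random.uniform(-1.0, 1.0, size=(3, 3)).astype(np.float32)
    q = to_fixed16(weights)
    print("max quantization error:", np.abs(weights - to_float32(q)).max())
```

In such a scheme, halving the word length from 32-bit floating point to 16-bit fixed point roughly halves on-chip storage and DSP usage per operand, which is the resource saving the abstract refers to.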