In recent years, convolutional neural networks (CNNs) have gained widespread adoption in remote sensing image processing. Deploying CNN-based algorithms on satellite edge devices can alleviate the strain on data downlinks. However, CNN algorithms present challenges due to their large parameter count and high computational requirements, which conflict with the satellite platforms’ low power consumption and high real-time requirements. Moreover, remote sensing image processing tasks are diverse, requiring the platform to accommodate various network structures. To address these issues, this paper proposes an algorithm–hardware co-optimization and deployment method for FPGA-based CNN remote sensing image processing. Firstly, a series of hardware-centric model optimization techniques are proposed, including operator fusion and depth-first mapping technology, to minimize the resource overhead of CNN models. Furthermore, a versatile hardware accelerator is proposed to accelerate a wide range of commonly used CNN models after optimization. The accelerator architecture mainly consists of a parallel configurable network processing unit and a multi-level storage structure, enabling the processing of optimized networks with high throughput and low power consumption. To verify the superiority of our method, the introduced accelerator was deployed on an AMD-Xilinx VC709 evaluation board, on which the improved YOLOv2, VGG-16, and ResNet-34 networks were deployed. Experiments show that the power consumption of the accelerator is 14.97 W, and the throughput of the three networks reaches 386.74 giga operations per second (GOPS), 344.44 GOPS, and 182.34 GOPS, respectively. Comparison with related work demonstrates that the co-optimization and deployment method can accelerate remote sensing image processing CNN models and is suitable for applications in satellite edge devices.