In recent years, convolutional neural network (CNN)-based methods have been widely used for object detection in optical remote sensing images and have shown excellent performance. Aerospace systems such as satellites and aircraft need to run these methods on board to observe objects on the ground. Given the tight constraints on logic resources and power consumption in such systems, embedded devices are a natural platform for implementing CNN-based methods; however, striking a balance between detection performance and power consumption remains challenging. In this paper, we propose an efficient hardware-implementation method for optical remote sensing object detection. First, we optimize the CNN-based model for hardware implementation, which lays the foundation for efficiently mapping the network onto a field-programmable gate array (FPGA). We then propose a hardware architecture for the CNN-based remote sensing object detection model, in which a general processing engine (PE) implements the multiple types of convolution in the network with a single uniform module, and an efficient data storage and access scheme achieves low-latency computation and high memory-bandwidth utilization. Finally, we deploy the improved YOLOv2 network on a Xilinx Zynq XC7Z035 FPGA to evaluate the performance of our design. The experimental results show that the mean average precision (mAP) of our FPGA implementation is only 0.18% lower than that of its graphics processing unit (GPU) counterpart. At a working frequency of 200 MHz, our design achieves a throughput of 111.5 giga-operations per second (GOP/s) with an on-chip power consumption of 5.96 W. Comparison with related works demonstrates that the proposed design offers a clear advantage in energy efficiency and is well suited to deployment on embedded devices.
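The energy-efficiency advantage cited above can be checked directly from the reported figures, assuming the conventional definition of energy efficiency as throughput per watt:

\[
\text{Energy efficiency} = \frac{111.5\ \text{GOP/s}}{5.96\ \text{W}} \approx 18.7\ \text{GOP/s/W}.
\]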