To improve the visual reconnaissance capability of unmanned aerial vehicles (UAVs), a method combining three cameras, namely wide-field, long-focus, and infrared cameras, is proposed. The wide-field camera simulates the peripheral vision of the human eye, the long-focus camera simulates the gaze of the human eye, and the infrared camera ensures the robustness of the system in a degraded visual environment (DVE). First, the target's location is preliminarily determined from the visual saliency information of the infrared image; then, infrared/visible fused images of the target area are obtained with a multi-scale image fusion method; finally, YOLOv5 and an improved SORT algorithm are used to detect and track the target. To verify the effectiveness of the proposed method, images are rendered in the Unity3D engine and simulation experiments of UAV reconnaissance are carried out. The results show that the performance of the proposed image fusion algorithm is close to that of typical methods such as ADF and VSMWLS in terms of evaluation indicators such as CE, AG, SD, and FPS. However, the proposed method requires only about 1/5 of the computation time.
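As a rough illustration of the fusion and evaluation steps summarized above, the following Python sketch fuses an infrared and a visible image with a generic two-scale (base/detail) decomposition and computes two of the named indicators, SD and AG. It is not the algorithm proposed in this paper: the box filter, the averaging and max-absolute fusion rules, and all function names (`box_blur`, `fuse_two_scale`, `standard_deviation`, `average_gradient`) are assumptions chosen for brevity, and the SD and AG formulas follow common definitions that may differ in detail from those used in the experiments.

```python
import numpy as np


def box_blur(img: np.ndarray, radius: int = 2) -> np.ndarray:
    """Mean filter with edge padding (a stand-in for any low-pass filter)."""
    k = 2 * radius + 1
    padded = np.pad(img.astype(np.float64), radius, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    # Sum the k*k shifted copies of the padded image, then normalize.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def fuse_two_scale(ir: np.ndarray, vis: np.ndarray, radius: int = 2) -> np.ndarray:
    """Fuse two registered grayscale images of equal shape.

    Assumed rules (illustrative only): base layers are averaged, and for
    each pixel the detail value with the larger magnitude is kept.
    """
    ir = ir.astype(np.float64)
    vis = vis.astype(np.float64)
    base_ir, base_vis = box_blur(ir, radius), box_blur(vis, radius)
    det_ir, det_vis = ir - base_ir, vis - base_vis
    base = 0.5 * (base_ir + base_vis)
    detail = np.where(np.abs(det_ir) >= np.abs(det_vis), det_ir, det_vis)
    return base + detail


def standard_deviation(img: np.ndarray) -> float:
    """SD: standard deviation of the pixel intensities."""
    return float(np.std(img.astype(np.float64)))


def average_gradient(img: np.ndarray) -> float:
    """AG: mean of sqrt((gx^2 + gy^2) / 2) over forward differences."""
    img = img.astype(np.float64)
    gx = img[:-1, 1:] - img[:-1, :-1]   # horizontal forward difference
    gy = img[1:, :-1] - img[:-1, :-1]   # vertical forward difference
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))


if __name__ == "__main__":
    # Synthetic stand-ins for registered infrared and visible frames.
    rng = np.random.default_rng(0)
    ir = rng.random((64, 64))
    vis = rng.random((64, 64))
    fused = fuse_two_scale(ir, vis)
    print("SD:", standard_deviation(fused), "AG:", average_gradient(fused))
```

In this sketch, a larger SD indicates higher global contrast and a larger AG indicates sharper local detail, which is how such indicators are typically read when comparing fused images.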