An Enhanced Synthetic Vision System (ESVS) employs multi-modal sensor fusion to give pilots a view equivalent to the out-of-cabin scene. With such a system, pilots can identify the runway and obstacles during approach and landing even in low-visibility weather. In our previously developed ESVS, short-wave infrared video served as the real-time image sensor. However, compared with visible imagery, infrared imagery is harder to interpret, and its quality degrades considerably in hazy weather. To improve infrared image visibility in the ESVS, we propose a multi-modal sensor fusion strategy that combines real-time infrared video with visible video previously recorded in clear weather. During image fusion, color detail from the visible image is injected into the infrared image to improve its quality. The proposed fusion method offers two further advantages: first, important tagged information in the visible image, including the runway and obstacles, is transferred to the infrared video; second, information missing from the infrared view is complemented. To evaluate the proposed method, we collected flight-test data with a Y-12F aircraft equipped with both visible and infrared video cameras. After recording visible and infrared video in clear and haze conditions, together with the necessary navigation data, we carried out the final image fusion processing. Experimental results show that the fused video frames have enhanced image quality and improved readability for pilots, which significantly promotes pilots' situation awareness during approach and landing. The proposed multi-sensor image fusion approach therefore has great potential for application to other cockpit display systems.
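The abstract does not specify the fusion algorithm itself. One common way to inject visible-color detail into an infrared frame, consistent with the idea described above, is a luminance-swap in YCbCr space: keep the chrominance of the clear-weather visible frame and replace its luminance with the real-time infrared intensity. The sketch below is only an illustration of that general technique, not the authors' method; the function name `fuse_ir_visible` and the assumption that the two frames are already geometrically registered are ours.

```python
import numpy as np

def fuse_ir_visible(ir_gray: np.ndarray, vis_rgb: np.ndarray) -> np.ndarray:
    """Inject color detail from a registered visible frame into an
    infrared frame by swapping the luminance channel in YCbCr space.

    ir_gray : (H, W) float array in [0, 1], real-time infrared frame.
    vis_rgb : (H, W, 3) float array in [0, 1], clear-weather visible
              frame assumed to be already registered to the IR view.
    """
    r, g, b = vis_rgb[..., 0], vis_rgb[..., 1], vis_rgb[..., 2]
    # Chrominance of the visible frame (ITU-R BT.601 coefficients).
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b
    # Use the infrared intensity as the luminance channel.
    y = ir_gray
    # Inverse YCbCr -> RGB transform on the recombined channels.
    fused = np.stack([
        y + 1.402 * cr,
        y - 0.344136 * cb - 0.714136 * cr,
        y + 1.772 * cb,
    ], axis=-1)
    return np.clip(fused, 0.0, 1.0)
```

A practical pipeline would first register the recorded visible frame to the live infrared frame (e.g. using the navigation data mentioned in the abstract) before applying a per-pixel fusion of this kind.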