“…Current infrared and visible image fusion methods fall mainly into two groups: traditional methods and deep learning (DL)-based methods. Traditional fusion methods typically operate in the spatial or transform domain [5], and the fusion frameworks used mainly include multi-scale transform (MST)-based frameworks [6,7,8,9,10,11,12,13,14,15], sparse representation (SR)-based frameworks [16,17,18], subspace-based frameworks [19,20,21], saliency-based frameworks [22], and hybrid frameworks [23,24,25]. According to the adopted network architecture, DL-based image fusion methods can be categorized into three groups: autoencoder (AE)-based frameworks [26,27,28,29,30], convolutional neural network (CNN)-based frameworks [31,32,33,34,35,36,37], and generative adversarial network (GAN)-based frameworks [38,39,40,41,42,43].…”
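To make the MST-based category concrete, the following is a minimal, illustrative sketch of multi-scale-transform fusion: it builds an undecimated Laplacian-style pyramid for each source image, merges detail layers with a max-absolute-value rule and the base layer by averaging, then reconstructs. The binomial blur kernel, the three-level depth, and the fusion rules are all assumptions chosen for brevity; this does not reproduce any specific cited method.

```python
import numpy as np

def blur(img):
    """Separable 5-tap binomial blur [1,4,6,4,1]/16 with reflect padding."""
    k = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0
    h, w = img.shape
    pad = np.pad(img, 2, mode="reflect")
    tmp = sum(k[i] * pad[i:i + h, :] for i in range(5))   # vertical pass
    return sum(k[i] * tmp[:, i:i + w] for i in range(5))  # horizontal pass

def laplacian_pyramid(img, levels=3):
    """Undecimated pyramid: detail layers (image minus its blur) plus a base."""
    pyr, cur = [], img
    for _ in range(levels):
        low = blur(cur)
        pyr.append(cur - low)  # band-pass detail layer
        cur = low
    pyr.append(cur)            # low-pass base layer
    return pyr

def fuse_mst(a, b, levels=3):
    """Fuse two grayscale float images: max-abs on details, average on base."""
    pa, pb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    fused = [np.where(np.abs(la) >= np.abs(lb), la, lb)
             for la, lb in zip(pa[:-1], pb[:-1])]
    fused.append(0.5 * (pa[-1] + pb[-1]))
    return sum(fused)          # the pyramid telescopes back to image scale
```

Because the pyramid is undecimated, the layers sum back exactly, so `fuse_mst(a, a)` returns `a` unchanged; real MST frameworks differ mainly in the transform (wavelets, contourlets, etc.) and in more elaborate activity-level fusion rules.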