With the continuous development of imaging sensors, images carry ever more information: images produced by different types of sensors differ, and images obtained by the same type of sensor under different parameters or conditions also differ. Multisource image fusion combines images acquired by different sensor types, or by the same sensor type under different parameter settings, producing a result that is more informationally complete, compensates for the limitations of any single image type, and preserves the characteristic information of the original images. Two directions have been studied in detail: multimodal image fusion and multifocus image fusion. On the one hand, methods based on frequency-domain transforms are used for multiscale image decomposition; on the other hand, feature extraction with neural-network-based methods has been proposed. Convolutional neural networks (CNNs) can extract richer texture features; however, when this approach is used for fusion, it is difficult to obtain an accurate decision map, and artifacts appear at the fusion boundary. To address this, a multifocus fusion method based on a two-stage CNN is proposed: an advanced densely connected network is trained to classify input image patches as focused or defocused, and appropriate fusion rules are then applied to obtain an accurate decision map. In addition, several versions of the training dataset are constructed to improve network performance. Experimental results show that the first stage of the proposed algorithm yields an accurate decision map, and that the second stage eliminates artifacts at the fusion boundary.
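The abstract gives no implementation details of the patch classification or the fusion rules. As a rough, hypothetical sketch of the decision-map idea only, the snippet below substitutes a simple block-variance focus measure for the paper's trained CNN classifier: each block is assigned to whichever source image appears sharper there, and the resulting binary decision map selects the pixels of the fused image.

```python
def block_variance(img, r0, c0, size):
    """Variance of pixel intensities in one block -- a crude focus measure
    standing in for the paper's CNN focused/defocused classifier."""
    vals = [img[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def fuse(img_a, img_b, size=2):
    """Block-wise multifocus fusion of two equally sized grayscale images
    (2D lists of floats). For each size x size block, keep the pixels from
    the source whose block has the higher focus measure.
    Returns (fused_image, decision_map), where decision_map[i][j] is 1 if
    the block was taken from img_a and 0 if it was taken from img_b."""
    h, w = len(img_a), len(img_a[0])
    fused = [[0.0] * w for _ in range(h)]
    decision = [[0] * (w // size) for _ in range(h // size)]
    for br in range(h // size):
        for bc in range(w // size):
            r0, c0 = br * size, bc * size
            pick_a = block_variance(img_a, r0, c0, size) >= block_variance(img_b, r0, c0, size)
            decision[br][bc] = 1 if pick_a else 0
            src = img_a if pick_a else img_b
            for r in range(r0, r0 + size):
                for c in range(c0, c0 + size):
                    fused[r][c] = src[r][c]
    return fused, decision
```

A real system would replace `block_variance` with the learned classifier, and the artifact removal at block boundaries described as the second stage is not modeled here at all.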