To improve the detection of focused regions in multi-focus image fusion and to enable end-to-end optimization of the decision map, we design a deep-learning-based multi-focus image fusion network. The network is trained in an unsupervised manner and introduces a multi-scale hybrid attention residual network to extract features at different levels of the image. In the training stage, multi-scale features are extracted from two source images with different focal points using hybrid multi-scale residual blocks (MSRB), an up-down projection (UDP) module is introduced to obtain multi-scale edge information, and the extracted features are then processed further to obtain deeper image features. These blocks make effective use of multi-scale feature information without increasing the number of parameters. In the test stage, the deep features of the source images are extracted and their spatial frequency is computed to measure the activity level and obtain an initial decision map, after which post-processing techniques are applied to eliminate errors at the edges. Finally, the optimized decision map is combined with the source images to obtain the final fused image. Comparative experiments show that the proposed model achieves better fusion performance in subjective evaluation, producing fused images that are more robust and richer in detail, and that it also scores better on objective evaluation metrics, indicating higher fusion quality.
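The test-stage pipeline described above (activity level from spatial frequency, initial decision map, post-processing, weighted fusion) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `box_sum`, `spatial_frequency`, and `fuse` are hypothetical, the window radius is an arbitrary choice, and the majority-vote filter stands in for the post-processing step, which the text does not specify. In the usage lines the raw grayscale image is passed as a one-channel stand-in for the deep features that the trained network would supply.

```python
import numpy as np


def box_sum(x, radius):
    """Sum of x over a (2*radius+1)^2 window around each pixel (edge padding)."""
    k = 2 * radius + 1
    padded = np.pad(x, radius, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.pad(padded, ((1, 0), (1, 0)), mode="constant").cumsum(0).cumsum(1)
    return (integral[k:, k:] - integral[:-k, k:]
            - integral[k:, :-k] + integral[:-k, :-k])


def spatial_frequency(feat, radius=2):
    """Per-pixel activity level of a (C, H, W) feature stack.

    Squared horizontal and vertical first differences are summed over
    channels and over a local window, then square-rooted. This is an
    unnormalized variant of the usual spatial-frequency measure.
    """
    feat = np.asarray(feat, dtype=np.float64)
    dx = np.zeros_like(feat)
    dy = np.zeros_like(feat)
    dx[:, :, 1:] = feat[:, :, 1:] - feat[:, :, :-1]
    dy[:, 1:, :] = feat[:, 1:, :] - feat[:, :-1, :]
    energy = (dx ** 2 + dy ** 2).sum(axis=0)
    # Clamp at zero to guard against round-off in the integral image.
    return np.sqrt(np.maximum(box_sum(energy, radius), 0.0))


def fuse(img_a, img_b, feat_a, feat_b, radius=2):
    """Fuse two (H, W) images using a decision map built from their features."""
    sf_a = spatial_frequency(feat_a, radius)
    sf_b = spatial_frequency(feat_b, radius)
    # Initial decision map: 1 where source A is judged to be in focus.
    decision = (sf_a >= sf_b).astype(np.float64)
    # Assumed post-processing: majority vote in a local window, which
    # removes small isolated misclassifications.
    k = 2 * radius + 1
    decision = (box_sum(decision, radius) > (k * k) / 2).astype(np.float64)
    fused = decision * img_a + (1.0 - decision) * img_b
    return fused, decision


# Usage: two synthetic sources, each sharp in one half and flat in the other.
rows, cols = np.indices((32, 32))
sharp = ((rows + cols) % 2).astype(np.float64)
img_a = np.where(cols < 16, sharp, 0.5)
img_b = np.where(cols < 16, 0.5, sharp)
fused, decision = fuse(img_a, img_b, img_a[None], img_b[None])
```

Selecting each pixel from whichever source has the higher local activity is the simplest form of decision-map fusion. A learned feature extractor changes what `feat_a` and `feat_b` contain, not the structure of this final step.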