Improving the Performance of Infrared and Visible Image Fusion Based on Latent Low-Rank Representation Nested With Rolling Guided Image Filtering

Gao, Ce; Song, Chao; Zhang, Yanchao; Qi, Donghao; Yu, Yi

doi:10.1109/access.2021.3090436

Cited by 16 publications

(5 citation statements)

References 44 publications

“…Over the past decade, extensive research in image fusion has yielded numerous methods, broadly categorized into traditional and deep learning-based approaches. Traditional methods, like multi-scale transform (MST) [9][10][11], sparse representation (SR) [12,13], low-rank representation [14][15][16], and saliency-based approaches [17,18], employ various techniques for fusion. However, they suffer from drawbacks such as operator dependency and computational intensity.…”

Section: Introduction (mentioning)

confidence: 99%

HATF: Multi-Modal Feature Learning for Infrared and Visible Image Fusion via Hybrid Attention Transformer

Liu, Wang, Gao et al. 2024. Remote Sensing

Current CNN-based methods for infrared and visible image fusion are limited by the low discrimination of extracted structural features, the adoption of uniform loss functions, and the lack of inter-modal feature interaction, which make it difficult to obtain optimal fusion results. To alleviate the above problems, a framework for multimodal feature learning fusion using a cross-attention Transformer is proposed. To extract rich structural features at different scales, residual U-Nets with mixed receptive fields are adopted to capture salient object information at various granularities. Then, a hybrid attention fusion strategy is employed to integrate the complementing information from the input images. Finally, adaptive loss functions are designed to achieve optimal fusion results for different modal features. The fusion framework proposed in this study is thoroughly evaluated using the TNO, FLIR, and LLVIP datasets, encompassing diverse scenes and varying illumination conditions. In the comparative experiments, HATF achieved competitive results on three datasets, with EN, SD, MI, and SSIM metrics reaching the best performance on the TNO dataset, surpassing the second-best method by 2.3%, 18.8%, 4.2%, and 2.2%, respectively. These results validate the effectiveness of the proposed method in terms of both robustness and image fusion quality compared to several popular methods.

Section: Introduction (mentioning)

confidence: 99%

HATF: Multi-Modal Feature Learning for Infrared and Visible Image Fusion via Hybrid Attention Transformer

Liu, Wang, Gao et al. 2024. Remote Sensing

“…Finally, the image is reconstructed using the inverse transformation of feature extraction. In addition, the methods of non-deep learning also include sparse representation (SR) [12][13][14][15]-based methods, subspace [16,17]-based methods, and low-rank representation (LRR) [18][19][20][21]-based methods. Although the non-deep learning methods can synthesize satisfactory results, they still have some drawbacks: (1) manually designed fusion strategies cannot adapt to complex image fusion conditions and have poor generalization ability; (2) manual feature extraction has limitations in comprehensively capturing multi-modal images, which introduces noise and causes image distortion.…”

Section: Introduction (mentioning)

confidence: 99%

SFPFusion: An Improved Vision Transformer Combining Super Feature Attention and Wavelet-Guided Pooling for Infrared and Visible Images Fusion

Li, Xiao, Cheng et al. 2023. Sensors

The infrared and visible image fusion task aims to generate a single image that preserves complementary features and reduces redundant information from different modalities. Although convolutional neural networks (CNNs) can effectively extract local features and obtain better fusion performance, the size of the receptive field limits its feature extraction ability. Thus, the Transformer architecture has gradually become mainstream to extract global features. However, current Transformer-based fusion methods ignore the enhancement of details, which is important to image fusion tasks and other downstream vision tasks. To this end, a new super feature attention mechanism and the wavelet-guided pooling operation are applied to the fusion network to form a novel fusion network, termed SFPFusion. Specifically, super feature attention is able to establish long-range dependencies of images and to fully extract global features. The extracted global features are processed by wavelet-guided pooling to fully extract multi-scale base information and to enhance the detail features. With the powerful representation ability, only simple fusion strategies are utilized to achieve better fusion performance. The superiority of our method compared with other state-of-the-art methods is demonstrated in qualitative and quantitative experiments on multiple image fusion benchmarks.

“…In the past decades, traditional methods have been proposed for the fusion of pixel-level or fixed features. Traditional image fusion methods mainly include multi-scale transform (MST) [13,14], sparse representation (SR) [15,16], salience [17,18] and low rank representation (LRR) [19,20]. The MST methods design appropriate fusion strategies to fuse the sub-layers obtained by using some transform operators, and the result is achieved through the inverse transformation.…”

Section: Introduction (mentioning)

confidence: 99%

MFST: Multi-Modal Feature Self-Adaptive Transformer for Infrared and Visible Image Fusion

et al. 2022

Infrared and visible image fusion is to combine the information of thermal radiation and detailed texture from the two images into one informative fused image. Recently, deep learning methods have been widely applied in this task; however, those methods usually fuse multiple extracted features with the same fusion strategy, which ignores the differences in the representation of these features, resulting in the loss of information in the fusion process. To address this issue, we propose a novel method named multi-modal feature self-adaptive transformer (MFST) to preserve more significant information about the source images. Firstly, multi-modal features are extracted from the input images by a convolutional neural network (CNN). Then, these features are fused by the focal transformer blocks that can be trained through an adaptive fusion strategy according to the characteristics of different features. Finally, the fused features and saliency information of the infrared image are considered to obtain the fused image. The proposed fusion framework is evaluated on TNO, LLVIP, and FLIR datasets with various scenes. Experimental results demonstrate that our method outperforms several state-of-the-art methods in terms of subjective and objective evaluation.
