2022
DOI: 10.48550/arxiv.2204.11436
Preprint

SwinFuse: A Residual Swin Transformer Fusion Network for Infrared and Visible Images

Cited by 6 publications (16 citation statements)
References 41 publications
“…Fusion finds its utility in a myriad of applications including High Dynamic Range (HDR) imaging, color transfer, and infrared-visible fusion [34]. The literature delineates three primary approaches to image fusion [35]: Low Level Fusion (LLF) or early fusion [36,37], Mid-Level Fusion (MLF) [38,39], and High Level Fusion (HLF) or late fusion [40]. These approaches are distinguished by the stage at which fusion occurs: prior to input, during feature extraction, or post feature extraction.…”
Section: Image Fusion
Citation type: mentioning
confidence: 99%
“…Li et al [25] combined CNN with transformer to extract the local features by CNN and capture long-range dependencies by transformer. Moreover, Wang et al built a pure transformer network to extract the long-range dependency of images, and they designed an L1-norm based strategy to measure and preserve infrared saliency and visible texture information [27]. Ma et al [2] also proposed a pure transformer based fusion model (SwinFusion), which utilizes the cross-domain global learning to implement intra- and inter-domain fusion based on self-attention and cross-attention, and they introduced Swin transformer to extract long-range dependency of images.…”
Section: B. Transformer Based Fusion Methods
Citation type: mentioning
confidence: 99%
“…Li et al [25] and Vibashan et al [26] combined the transformer with CNNs to extract image's local features and long-range dependencies. In addition, Ma et al [2] and Li et al [27] introduced Swin-transformer to infrared and visible image fusion tasks.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…To overcome this shortcoming, the transformer is applied into IVF tasks. Since these transformer-based methods integrate the transformer and the CNN [29], [30], the methods can simultaneously extract both the local features and the long-range dependencies [31], [32].…”
Section: A. Vision-perception Oriented IVF
Citation type: mentioning
confidence: 99%
“…Specifically, we compare the proposed method with several high-level vision task-driven methods (i.e., the PSFusion [36], the SegMiF [37], the TarDAL [38]), which include either the semantic segmentation task-driven or the object detection task-driven methods. Finally, we compare the proposed method with several vision-perception oriented methods (i.e., the CBF [15], the DDcGAN [26], the MetaFusion [41], the SwinFuse [32]), which include the traditional methods, CNN-based methods, meta learning-based methods and Transformer-based methods.…”
Section: A. Experimental Configurations
Citation type: mentioning
confidence: 99%