2020
DOI: 10.3390/computers9040098
FuseVis: Interpreting Neural Networks for Image Fusion Using Per-Pixel Saliency Visualization

Abstract: Image fusion helps in merging two or more images to construct a more informative single fused image. Recently, unsupervised learning-based convolutional neural networks (CNN) have been used for different types of image-fusion tasks such as medical image fusion, infrared-visible image fusion for autonomous driving as well as multi-focus and multi-exposure image fusion for satellite imagery. However, it is challenging to analyze the reliability of these CNNs for the image-fusion tasks since no groundtruth is ava…
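The per-pixel saliency idea the abstract describes — measuring how strongly each fused pixel depends on the corresponding input pixels — can be illustrated with a minimal numerical sketch. The fusion rule and function names below are illustrative stand-ins, not the FuseVis network or its API:

```python
import numpy as np

def fuse(a, b, eps=1e-8):
    # Toy pixel-wise fusion: weight each image by its squared intensity.
    # A stand-in for a learned fusion network, NOT the FuseVis model.
    w = a**2 / (a**2 + b**2 + eps)
    return w * a + (1.0 - w) * b

def per_pixel_saliency(a, b, h=1e-4):
    # Per-pixel saliency of the fused image w.r.t. input `a`:
    # central finite difference of each fused pixel w.r.t. the
    # corresponding input pixel. Because this toy fusion is purely
    # pixel-wise, the Jacobian is diagonal and one perturbation
    # of the whole image recovers every per-pixel derivative.
    return (fuse(a + h, b) - fuse(a - h, b)) / (2.0 * h)

rng = np.random.default_rng(0)
a = rng.random((4, 4))
b = rng.random((4, 4))

sal = per_pixel_saliency(a, b)
print(sal.shape)  # (4, 4)
```

Where one input dominates a pixel (e.g. `a` much brighter than `b`), the fused pixel tracks that input and its saliency approaches 1; a real tool like FuseVis would obtain the same quantity by backpropagating through the trained network rather than by finite differences.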

Cited by 11 publications (7 citation statements)
References 72 publications
“…These global representations are selected from the local features obtained by the attention mechanism. Although existing audiovisual fusion models are capable of obtaining effective joint representations, they are more complex and difficult to explain [5]. As described in [4], selecting key local components from the global representation is beneficial for reducing the complexity of the model.…”
Section: Audiovisual Information Fusion
Mentioning confidence: 99%
“…Specifically, unimodal representations can only describe changes in emotion from a single perspective [4]. Therefore, compared with modeling unimodal information, previous research has focused on the use of specific deep neural networks (DNNs) to efficiently learn the joint representation of multiple modalities [5]. For instance, a large number of studies seek to tackle these challenges by building complex network structures [6] and fusing multimodal feature matrices [7], which can mine deep multimodal features and enhance interaction between audiovisual signals, respectively.…”
Section: Introduction
Mentioning confidence: 99%
“…Xiaoning Zhang's paper proposes a new attention-guided network model that selectively integrates multilevel contextual information in an incremental manner. In addition to simulating the human attention mechanism, there is some research work that analyzes the importance of the information around the face object in judging the position of the face [29][30][31][32][33].…”
Section: Related Work
Mentioning confidence: 99%
“…The saliency-based algorithm can make up for this deficiency well, and it can be combined with the above three categories of methods to improve the quality of the result images [42]. Ma et al. defined pixel-level saliency to fuse base layers [43]. Zhang et al. improved it by squaring the intensity difference to alleviate the problem of poor perception of lesions and edges.…”
Section: Introduction
Mentioning confidence: 99%
“…The reason is that the characteristics of the image content itself and the sensitivity of the human eye to information, such as high brightness and contrast changes, are not better utilized. The saliency-based algorithm can make up for this deficiency well, and it can be combined with the above three categories of methods to improve the quality of the result images [42]. Ma et al. defined pixel-level saliency to fuse base layers [43].…”
Section: Introduction
Mentioning confidence: 99%
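The pixel-level saliency fusion these statements attribute to Ma et al. (saliency-weighted base layers) and Zhang et al. (a squared-difference variant) can be sketched roughly as follows. The specific saliency definition used here — absolute deviation from a local box-filter mean — and all function names are illustrative assumptions, not the cited papers' exact formulations:

```python
import numpy as np

def local_mean(img, k=3):
    # k-by-k box-filter local mean with edge padding.
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def saliency_weighted_fusion(b1, b2, power=1):
    # Pixel-level saliency: deviation of each pixel from its local mean.
    # power=1 is a plain absolute difference; power=2 mimics the
    # squared-difference variant attributed to Zhang et al., which
    # emphasizes strong deviations such as lesions and edges.
    s1 = np.abs(b1 - local_mean(b1)) ** power
    s2 = np.abs(b2 - local_mean(b2)) ** power
    w = s1 / (s1 + s2 + 1e-8)  # per-pixel weight in [0, 1]
    return w * b1 + (1.0 - w) * b2
```

Because the weight map is a convex combination at every pixel, the fused base layer always stays within the range spanned by the two inputs, while locally salient structure from either image dominates the result.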