2021
DOI: 10.3390/rs13183724

MSNet: A Multi-Stream Fusion Network for Remote Sensing Spatiotemporal Fusion Based on Transformer and Convolution

Abstract: Remote sensing products with both high temporal and high spatial resolution can hardly be obtained under the constraints of existing technology and cost. Therefore, the spatiotemporal fusion of remote sensing images has attracted considerable attention. Spatiotemporal fusion algorithms based on deep learning have gradually developed, but they also face some problems. For example, the amount of available data limits the model’s ability to learn, and the robustness of the model is not high. The features extracted through the convol…

Cited by 33 publications (22 citation statements)
References: 30 publications
“…Additionally, actual ground data should be compared to the data in predicted and fused images. In this paper, five evaluation metrics, namely the spectral angle mapper (SAM) [41], PSNR [42], the spatial correlation coefficient (SCC) [43], SSIM [44], and the root mean square error (RMSE) [45], are used to objectively evaluate and analyze the spatiotemporal fusion results on the different datasets. The SAM technique [41] calculates the spectral distortion between the predicted fusion result and the original image; the PSNR [42] reflects the difference between the ground truth image and the predicted fused image based on the statistical mean of the greyscale differences between corresponding image elements; the SCC [43] assesses the similarity of the spatial details in the fused and reference images based on high-frequency information; the SSIM [44] measures the structural similarity between a predicted fused image and a ground truth image; and the RMSE [45] measures the deviation between the predicted and actual reflectance, providing a global description of the radiometric difference between a ground truth image and a predicted fused image.…”
Section: Discussion
confidence: 99%
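These five metrics are standard full-reference measures. Below is a minimal sketch of how they might be computed with NumPy, SciPy, and scikit-image, assuming reflectance scaled to [0, 1] and spectral bands on the last axis; the helper names are illustrative, not from the paper.

```python
import numpy as np
from scipy.ndimage import laplace
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def sam(pred, ref, eps=1e-8):
    # Spectral angle mapper: per-pixel angle (radians) between the
    # spectral vectors of the two images, averaged over all pixels.
    dot = np.sum(pred * ref, axis=-1)
    norms = np.linalg.norm(pred, axis=-1) * np.linalg.norm(ref, axis=-1)
    return float(np.mean(np.arccos(np.clip(dot / (norms + eps), -1.0, 1.0))))

def rmse(pred, ref):
    # Root mean square error over all pixels and bands.
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def scc(pred, ref):
    # Spatial correlation coefficient: correlation of high-frequency
    # (Laplacian-filtered) detail, averaged over the spectral bands.
    scores = []
    for b in range(pred.shape[-1]):
        hp_p, hp_r = laplace(pred[..., b]), laplace(ref[..., b])
        scores.append(np.corrcoef(hp_p.ravel(), hp_r.ravel())[0, 1])
    return float(np.mean(scores))

# Random stand-ins for a predicted and a reference image (H, W, bands).
pred = np.random.rand(128, 128, 6).astype(np.float32)
ref = np.random.rand(128, 128, 6).astype(np.float32)

print("SAM :", sam(pred, ref))
print("RMSE:", rmse(pred, ref))
print("SCC :", scc(pred, ref))
print("PSNR:", peak_signal_noise_ratio(ref, pred, data_range=1.0))
print("SSIM:", structural_similarity(ref, pred, channel_axis=-1, data_range=1.0))
```

Note that `structural_similarity` takes `channel_axis` in scikit-image 0.19 and later; older versions use `multichannel=True` instead.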
“…Spatial accuracy can be quantified by spatial characteristics, such as contrast and texture, between the predicted and the true images. For the spatial evaluation metric, the Roberts edge (Edge) was used to describe the spatial accuracy of the predicted images [75]. A value closer to 0 indicates a better image fusion result; a negative value indicates that the edge features are smoothed, and a positive value indicates that the edge features are sharpened.…”
Section: Methods
confidence: 99%
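The quoted passage does not give the exact formula, but a plausible reading consistent with its sign convention is the relative difference of mean Roberts gradient magnitudes between the predicted and reference images. A minimal sketch under that assumption:

```python
import numpy as np

def roberts_magnitude(img):
    # Roberts cross operator: gradients from diagonal differences
    # over 2x2 neighborhoods of a single-band image.
    gx = img[:-1, :-1] - img[1:, 1:]
    gy = img[:-1, 1:] - img[1:, :-1]
    return np.sqrt(gx ** 2 + gy ** 2)

def edge_metric(pred, ref, eps=1e-8):
    # Assumed formulation: relative difference of mean edge strength.
    # Negative -> predicted edges weaker than the reference (smoothed);
    # positive -> stronger (sharpened); 0 -> matching edge strength.
    e_pred = roberts_magnitude(pred).mean()
    e_ref = roberts_magnitude(ref).mean()
    return float((e_pred - e_ref) / (e_ref + eps))

pred = np.random.rand(128, 128).astype(np.float32)
ref = np.random.rand(128, 128).astype(np.float32)
print("Edge:", edge_metric(pred, ref))
```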
“…The transformer completes the task of exchanging information between the two paths. MSNet [39] proposes a network fusion method and applies it to the spatiotemporal fusion of remote sensing images. Bazi et al. [40] apply ViT structures to remote sensing scene classification.…”
Section: Modeling Based on the Transformer Architecture
confidence: 99%
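To make the two-path idea concrete, here is a minimal PyTorch sketch of a hypothetical block that runs a convolutional (local-detail) stream and a transformer (global-context) stream in parallel and fuses them with a 1x1 convolution. It illustrates the general convolution-plus-transformer pattern only; it is not the MSNet architecture.

```python
import torch
import torch.nn as nn

class ConvTransformerFusion(nn.Module):
    # Hypothetical two-stream block: convolution for local detail,
    # a transformer encoder over pixel tokens for global context.
    def __init__(self, channels=32, heads=4):
        super().__init__()
        self.conv_stream = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, batch_first=True)
        self.transformer_stream = nn.TransformerEncoder(layer, num_layers=1)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):                      # x: (B, C, H, W)
        local = self.conv_stream(x)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        glob = self.transformer_stream(tokens)
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local, glob], dim=1))

feats = torch.randn(1, 32, 16, 16)
print(ConvTransformerFusion()(feats).shape)    # torch.Size([1, 32, 16, 16])
```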