2020
DOI: 10.21203/rs.3.rs-32802/v1
Preprint
FusAtNet: Dual Attention based SpectroSpatial Multimodal Fusion Network for Hyperspectral and LiDAR Classification

Abstract: With recent advances in sensing, multimodal data is becoming easily available for various applications, especially in remote sensing (RS), where many data types like multispectral imagery (MSI), hyperspectral imagery (HSI), LiDAR etc. are available. Effective fusion of these multisource datasets is becoming important, for these multimodality features have been shown to generate highly accurate land-cover maps. However, fusion in the context of RS is non-trivial considering the redundancy involved in the data a…

Cited by 35 publications (8 citation statements)
References 6 publications
“…Discussion. CAF differs from existing methods [62, 48] in the computation of the cross-attention. Using the stacked features F_U to attend to each modality Q_U brings three benefits: (a) it is order-agnostic: for any modality pair, cross-attention is computed once rather than twice by interchanging queries and keys/values, which reduces computation; (b) each modality serves as a query to search for tokens in the other modalities, which brings rich feature fusion; and (c) it generalizes to any number of modalities, which makes it scalable.…”
Section: Methods
Mentioning confidence: 98%
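The quoted passage describes the mechanism precisely enough to sketch. Below is a minimal PyTorch sketch of this reading of CAF's stacked-feature cross-attention: each modality's tokens act as queries over the concatenation F_U of all modality tokens, so the number of attention calls grows with the number of modalities rather than with the number of ordered pairs. The class name, parameter names, and toy shapes are illustrative assumptions, not the citing paper's actual code.

```python
import torch
import torch.nn as nn

class StackedCrossAttention(nn.Module):
    """Each modality's tokens query the stacked multimodal tokens F_U."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, modalities):
        # modalities: list of (batch, tokens_i, d_model) tensors; the list can
        # have any length, so the scheme scales to any number of modalities.
        f_u = torch.cat(modalities, dim=1)      # stacked features F_U
        fused = []
        for q_u in modalities:                  # one attention call per modality,
            out, _ = self.attn(q_u, f_u, f_u)   # not one per ordered pair
            fused.append(out)
        return fused

# Usage with two toy modalities (e.g. HSI tokens and LiDAR tokens).
hsi = torch.randn(2, 16, 64)
lidar = torch.randn(2, 9, 64)
fused_hsi, fused_lidar = StackedCrossAttention(d_model=64)([hsi, lidar])
```

Because every modality attends to the same stacked sequence, queries and keys/values never need to be interchanged per pair, which is the source of the order-agnosticism claimed in point (a).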
“…(a) Cross-attention is used in cross-domain knowledge transfer to learn across-cue correlations by attending to the features of one domain from another [62, 48, 63]. In CAF, it models the relationships among vision, audio, and face features.…”
Section: Methods
Mentioning confidence: 99%
“…Deep learning (DL), as an automatic feature learning technique, has demonstrated outstanding capabilities in feature extraction and has been widely applied to multimodal data fusion [3], [19], [20]. Among DL algorithms, convolutional neural networks (CNNs) are the go-to models for feature extraction due to their efficiency and ease of optimization [21].…”
Section: Introduction
Mentioning confidence: 99%
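As a concrete illustration of the CNN-based multimodal feature extraction the quote refers to, here is a minimal two-stream sketch: one small CNN branch per modality, with the pooled features concatenated before classification. All layer sizes, the branch design, and the name TwoStreamFusion are assumptions for illustration, not any specific cited architecture.

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, hsi_bands: int, lidar_bands: int, n_classes: int):
        super().__init__()

        def branch(in_ch):
            # Small CNN feature extractor for one modality.
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())

        self.hsi_branch = branch(hsi_bands)
        self.lidar_branch = branch(lidar_bands)
        self.classifier = nn.Linear(64 + 64, n_classes)  # concatenated features

    def forward(self, hsi, lidar):
        fused = torch.cat([self.hsi_branch(hsi), self.lidar_branch(lidar)], dim=1)
        return self.classifier(fused)

# Toy patches: 4 samples, 144-band HSI and 1-band LiDAR, 11x11 pixels each.
model = TwoStreamFusion(hsi_bands=144, lidar_bands=1, n_classes=15)
logits = model(torch.randn(4, 144, 11, 11), torch.randn(4, 1, 11, 11))
```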
“…In [25], [27], attention blocks are applied to a single modality, highlighting only unimodal features. Subsequently, some architectures [20], [28], [29] generate attention masks from one modality to enhance the representations of the other. For example, FusAtNet [20] develops a "cross-attention" module that exploits LiDAR-derived attention maps to highlight the spatial features of HSIs.…”
Section: Introduction
Mentioning confidence: 99%
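Since this quote spells out FusAtNet's cross-attention idea (a LiDAR-derived attention map re-weighting HSI spatial features), a hedged sketch of that mechanism follows. The single-channel sigmoid mask, the layer sizes, and the name LidarGuidedAttention are my assumptions; FusAtNet's actual module differs in its details.

```python
import torch
import torch.nn as nn

class LidarGuidedAttention(nn.Module):
    """Predict a spatial attention map from LiDAR and apply it to HSI features."""

    def __init__(self, lidar_ch: int):
        super().__init__()
        self.mask_net = nn.Sequential(               # LiDAR -> mask in [0, 1]
            nn.Conv2d(lidar_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1), nn.Sigmoid())

    def forward(self, hsi_feats, lidar):
        mask = self.mask_net(lidar)                  # (B, 1, H, W) attention map
        return hsi_feats * mask                      # highlight HSI spatial features

# Toy example: 64-channel HSI feature maps modulated by a 1-band LiDAR patch.
attended = LidarGuidedAttention(lidar_ch=1)(torch.randn(2, 64, 11, 11),
                                            torch.randn(2, 1, 11, 11))
```

The design choice the quote highlights is that the attention is *cross-modal*: the mask comes from LiDAR geometry while the features being re-weighted come from the HSI stream, in contrast to the unimodal attention blocks of [25], [27].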