Multimodal Transformer Network for Hyperspectral and LiDAR Classification

Zhang, Yiyan; Xu, Shufang; Hong, Danfeng; Zhang, Chenkai; Bi, Meiqiao; Li, Chenming

doi:10.1109/tgrs.2023.3283508

Cited by 26 publications

(2 citation statements)

References 55 publications

“…This limitation could hinder interpretability and optimization efforts. In [48], a transition from single-mode RS to diverse data integration is highlighted using MTNet, demonstrating its effectiveness in capturing spectral and spatial information. However, a detailed analysis of the computational efficiency and scalability of MTNet is lacking.…”

Section: Introduction (mentioning)

confidence: 99%

EXNet: (2+1)D Extreme Xception Net for Hyperspectral Image Classification

Ghous, Sarfraz, Ahmad et al. 2024

IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing

Self Cite

3D CNNs have demonstrated their capability to capture intricate non-linear relationships within Hyperspectral Images (HSIs). However, the computational complexity of 3D CNNs often leads to slower processing speeds, limited generalization, and susceptibility to overfitting. In response to these challenges, this study introduces the concept of depthwise separable convolutions using (2+1)D convolutions as an alternative to traditional 3D convolutions for Hyperspectral Image Classification (HSIC). The study observes that (2+1)D convolutions can effectively approximate the complex relationships represented by 3D convolutions while requiring fewer convolutional operations, thereby reducing the computational overhead associated with classification. Experimental results obtained from benchmark HSI datasets, including Indian Pines, Botswana, Pavia University, and Salinas, demonstrate that the proposed model yields results that are comparable to those achieved by various state-of-the-art models in the existing literature. The source code is available on GitHub at github.com/mahmad00/Extreme-Xception-Net-for-HSIC.
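
The claim that (2+1)D convolutions need fewer operations than full 3D convolutions can be illustrated with a rough weight count. The sketch below is not the EXNet architecture: it covers only the (2+1)D factorization (not the depthwise separable part), and the channel widths, kernel sizes, and function names are hypothetical values chosen for illustration. It assumes stride 1 and "same" padding, so the weight count also equals the multiply-accumulates per output position.

```python
def conv3d_weights(c_in, c_out, k_t, k_h, k_w):
    """Weights (bias ignored) in one full 3D convolution layer."""
    return c_in * c_out * k_t * k_h * k_w


def conv2plus1d_weights(c_in, c_mid, c_out, k_t, k_h, k_w):
    """Weights (bias ignored) in a (2+1)D factorization of the same kernel.

    A 1 x k_h x k_w spatial convolution is followed by a
    k_t x 1 x 1 convolution along the spectral axis.
    c_mid, the width between the two stages, is an assumed free parameter.
    """
    spatial = c_in * c_mid * 1 * k_h * k_w
    spectral = c_mid * c_out * k_t * 1 * 1
    return spatial + spectral


# Hypothetical layer: 32 input and output channels, 3 x 3 x 3 kernel.
full = conv3d_weights(32, 32, 3, 3, 3)
factored = conv2plus1d_weights(32, 32, 32, 3, 3, 3)

print(full, factored, round(factored / full, 3))
```

With these assumed sizes the factored layer uses 12288 weights against 27648 for the full 3D layer, a little under half. The saving depends on the chosen intermediate width and kernel shape, so it should not be read as a figure from the paper.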

“…In comparison, deep learning (DL) can autonomously extract depth features through multi-layer neural networks [23,24]. Following their achievements in the realm of RGB images, DL models have been widely adopted in HSI classification due to their strong feature representation capabilities [25,26]. In their nascent stages, Chen et al spearheaded the use of DL models for HSI classification by devising a stacked autoencoder for high-level feature extraction [27].…”

Section: Introduction (mentioning)

confidence: 99%

Spatial-Pooling-Based Graph Attention U-Net for Hyperspectral Image Classification

Diao, Dai, Wang et al. 2024

Remote Sensing

In recent years, graph convolutional networks (GCNs) have attracted increasing attention in hyperspectral image (HSI) classification owing to their exceptional representation capabilities. However, the high computational requirements of GCNs have led most existing GCN-based HSI classification methods to utilize superpixels as graph nodes, thereby limiting the spatial topology scale and neglecting pixel-level spectral–spatial features. To address these limitations, we propose a novel HSI classification network based on graph convolution called the spatial-pooling-based graph attention U-net (SPGAU). Specifically, unlike existing GCN models that rely on fixed graphs, our model involves a spatial pooling method that emulates the region-growing process of superpixels and constructs multi-level graphs by progressively merging adjacent graph nodes. Inspired by the CNN classification framework U-net, SPGAU’s model has a U-shaped structure, realizing multi-scale feature extraction from coarse to fine and gradually fusing features from different graph levels. Additionally, the proposed graph attention convolution method adaptively aggregates adjacency information, thereby further enhancing feature extraction efficiency. Moreover, a 1D-CNN is established to extract pixel-level features, striking an optimal balance between enhancing the feature quality and reducing the computational burden. Experimental results on three representative benchmark datasets demonstrate that the proposed SPGAU outperforms other mainstream models both qualitatively and quantitatively.
