2022
DOI: 10.1155/2022/6785966
Nested Transformers for Hyperspectral Image Classification

Abstract: Numerous deep learning methods have recently been applied to hyperspectral image (HSI) classification. The Vision Transformer (ViT) is adept at modeling the overall structure of images and has been introduced to the HSI classification task. However, the fixed patch-division operation in ViT may lead to insufficient feature extraction; in particular, features at the edges between patches are ignored. To address this problem, we devise a workflow for HSI classification based on the Nested Transformers (NesT).…
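To make the abstract's point concrete, the sketch below illustrates the general NesT idea it refers to: self-attention is computed locally inside non-overlapping blocks of patch tokens, and a block-aggregation step mixes information across block boundaries between hierarchy levels, which is how features at patch edges can still be captured. This is an illustrative PyTorch sketch, not the authors' code; the module names, block sizes, and channel widths are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn


class LocalBlockAttention(nn.Module):
    """Self-attention restricted to the tokens inside each spatial block."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (batch, num_blocks, tokens_per_block, dim)
        b, nb, t, d = x.shape
        x = x.reshape(b * nb, t, d)            # each block attends only to itself
        h = self.norm(x)
        x = x + self.attn(h, h, h)[0]
        return x.reshape(b, nb, t, d)


class BlockAggregation(nn.Module):
    """Merge 2x2 neighbouring blocks so information can cross block edges."""

    def __init__(self, dim: int):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)

    def forward(self, x, grid):                # grid: number of blocks per side
        b, nb, t, d = x.shape
        side = int(t ** 0.5)
        # reassemble the full token map, convolve across block borders, downsample
        x = x.reshape(b, grid, grid, side, side, d).permute(0, 5, 1, 3, 2, 4)
        x = x.reshape(b, d, grid * side, grid * side)
        x = self.pool(self.conv(x))
        new_grid, h = grid // 2, x.shape[-1]
        # re-partition the smaller map into a coarser grid of blocks
        x = x.reshape(b, d, new_grid, h // new_grid, new_grid, h // new_grid)
        return x.permute(0, 2, 4, 3, 5, 1).reshape(b, new_grid ** 2, -1, d)


# 4x4 grid of blocks, 8x8 tokens per block, 96 channels -> 2x2 grid after one level
tokens = torch.randn(2, 16, 64, 96)
tokens = LocalBlockAttention(96)(tokens)
tokens = BlockAggregation(96)(tokens, grid=4)   # shape: (2, 4, 64, 96)
```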

Cited by 7 publications (6 citation statements); references 22 publications.
“…To mitigate information redundancy [50] and the Hughes phenomenon [51] due to the high spectral correlation of HSI data, we use the principal component analysis method to reduce the spectral dimensionality and retain the first eight principal components in the sample extraction stage. The local neighbourhood sampling is then performed with a patch size of 15 × 15, which is a common configuration in previous work [2,30,31]. We used a consistent training set construction approach with equal random sampling of 10, 50, and 1000 samples per class for the Indian Pines (1.5%), Salinas (1.5%), and Xiongan (0.5%) datasets, respectively.…”
Section: Experimental Setting
confidence: 99%
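As a point of reference for the setting described in this citation, the following sketch shows one way to implement PCA reduction to eight principal components, 15 × 15 neighbourhood patch extraction, and equal per-class random sampling of the training set. It is illustrative only; the array names, the reflect padding, and the random seed are assumptions, not details taken from the cited paper.

```python
import numpy as np
from sklearn.decomposition import PCA


def extract_patches(cube, labels, n_components=8, patch_size=15):
    """Reduce the spectral dimension with PCA and cut a patch around each labelled pixel."""
    h, w, bands = cube.shape
    pcs = PCA(n_components=n_components).fit_transform(
        cube.reshape(-1, bands)).reshape(h, w, n_components)
    r = patch_size // 2
    padded = np.pad(pcs, ((r, r), (r, r), (0, 0)), mode="reflect")
    coords = np.argwhere(labels > 0)            # labelled pixels only
    patches = np.stack([padded[i:i + patch_size, j:j + patch_size]
                        for i, j in coords])
    return patches, labels[coords[:, 0], coords[:, 1]]


def sample_per_class(y, n_per_class, seed=0):
    """Draw an equal number of training indices from every class."""
    rng = np.random.default_rng(seed)
    train_idx = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        train_idx.append(rng.choice(idx, size=min(n_per_class, idx.size),
                                    replace=False))
    return np.concatenate(train_idx)
```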
“…However, the local receptive field settings also make them less robust to locally variable features in HSIs. In contrast, ViT-based [30] and MLP-based [31] models capture global contextual information through self-attention mechanisms and fully connected mappings over feature sequences, respectively. This advantage helps them to mitigate the adverse effects of locally variable features, but also leads to complex models with massive parameters.…”
Section: Related Work
confidence: 99%
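A minimal sketch of the two global-context mechanisms contrasted in this citation follows: ViT-style self-attention over a token sequence versus an MLP-style fully connected mapping across the token axis. The dimensions (a 15 × 15 patch flattened to 225 tokens of width 64) are illustrative assumptions, not settings from the cited works.

```python
import torch
import torch.nn as nn

tokens = torch.randn(1, 225, 64)                 # one flattened 15x15 patch, 64-dim tokens

# ViT-style: every token attends to every other token
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
global_ctx_attn, _ = attn(tokens, tokens, tokens)

# MLP-style: a dense layer mixes information across the token axis
token_mix = nn.Linear(225, 225)
global_ctx_mlp = token_mix(tokens.transpose(1, 2)).transpose(1, 2)

print(global_ctx_attn.shape, global_ctx_mlp.shape)   # both (1, 225, 64)
```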
“…Recently, the MLP-Mixer model, based on multi-layer perceptron (MLP) encoding, and the ViT model, based on multi-head attention (MHA) encoding, have better compensated for this deficiency. For example, the Shift-MLP model [6] based on MLP encoding and the NestViT model [7] based on MHA encoding give better accuracy than the CNN series of models in remote sensing classification tasks. The performance of deep learning models generally depends on training with a large number of labeled samples.…”
Section: Research Background
confidence: 99%
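For readers unfamiliar with the "MLP encoding" mentioned in this citation, the block below sketches a generic MLP-Mixer-style layer: one MLP mixes information across tokens, a second across channels. It is not the Shift-MLP or NestViT architecture from [6] and [7], and its sizes are illustrative only.

```python
import torch
import torch.nn as nn


class MixerBlock(nn.Module):
    def __init__(self, num_tokens: int, dim: int, hidden: int = 256):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(nn.Linear(num_tokens, hidden), nn.GELU(),
                                       nn.Linear(hidden, num_tokens))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(),
                                         nn.Linear(hidden, dim))

    def forward(self, x):                        # x: (batch, tokens, dim)
        # token mixing: transpose so the dense layer acts across the token axis
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        # channel mixing: a second MLP acts on each token's feature vector
        x = x + self.channel_mlp(self.norm2(x))
        return x


y = MixerBlock(num_tokens=225, dim=64)(torch.randn(2, 225, 64))  # -> (2, 225, 64)
```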
“…proposed a transformer block suitable for image classification. Following it, recent works 25–29 presented a series of transformer-based network structures for more general high-level vision tasks. Due to its significant performance, the transformer has also been introduced into low-level vision tasks 30…”
Section: Related Work
confidence: 99%