Multilevel Feature Fusion Networks With Adaptive Channel Dimensionality Reduction for Remote Sensing Scene Classification
2022 | DOI: 10.1109/lgrs.2021.3070016
Abstract: Scene classification in very high resolution (VHR) remote sensing (RS) images is a challenging task due to the complex and diverse content of the images. Recently, convolutional neural networks (CNNs) have been utilized to tackle this task. However, CNNs cannot fully meet the needs of scene classification because of clutter and small objects in VHR images. To handle these challenges, this letter presents a novel multilevel feature fusion network with adaptive channel dimensionality reduction for RS scene classification…
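For readers unfamiliar with the architecture family the abstract describes, the sketch below shows the general pattern: feature maps are tapped from several depths of a CNN backbone, each level's channels are reduced before fusion, and the fused representation feeds a classifier. This is a minimal illustration under stated assumptions — a ResNet-18 backbone, fixed 1×1-convolution reduction, and fusion by concatenation are placeholders, not the authors' exact design (the paper's reduction is described as adaptive):

```python
# Minimal sketch of multilevel feature fusion with per-level channel
# reduction. Backbone, tap points, channel widths, and class count are
# illustrative assumptions, not the authors' architecture.
import torch
import torch.nn as nn
import torchvision.models as models

class MultiLevelFusionNet(nn.Module):
    def __init__(self, num_classes=30, reduced_channels=64):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Expose three intermediate stages as feature levels.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool,
                                  backbone.layer1)   # 64 channels
        self.mid = backbone.layer2                   # 128 channels
        self.top = backbone.layer3                   # 256 channels
        # 1x1 convs project each level to a common channel dimensionality.
        self.reduce = nn.ModuleList([
            nn.Conv2d(c, reduced_channels, kernel_size=1)
            for c in (64, 128, 256)
        ])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(3 * reduced_channels, num_classes)

    def forward(self, x):
        f1 = self.stem(x)
        f2 = self.mid(f1)
        f3 = self.top(f2)
        # Reduce channels, pool each level to a vector, fuse by concatenation.
        feats = [self.pool(r(f)).flatten(1)
                 for r, f in zip(self.reduce, (f1, f2, f3))]
        return self.fc(torch.cat(feats, dim=1))

logits = MultiLevelFusionNet()(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 30])
```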

Cited by 36 publications (32 citation statements) | References 24 publications
“…To comprehensively evaluate the classification performance of the proposed HHTL framework, we compare our method with some state-of-the-art CNN-based and transformer-based methods. The CNN-based methods are GoogLeNet [1], [2], VGGNet-16 [1], [2], VGG-16-CapsNet [71], SCCov [24], VGG-VD16+MSCP+MRA [72], GBNet+global feature [40], MIDC-Net CS [73], EFPN-DSE-TDFF [41], DFAGCN [74], EfficientNet-B0-aux [75], SF-CNN with VGGNet [76], MG-CAP (Sqrt-E) [43], ACNet [52], ACR-MLFF [77], MSA-Network [78]. The transformer-based methods are T2T-ViT-12 [62], Pooling-based Vision Transformer-Small (PiT-S) [79], and Pyramid Vision Transformer-Medium (PVT-Medium) [80].…”
Section: Experimental Results and Comparisons (mentioning)
confidence: 99%
“…Similar to UCM and AID, we can find that the performance of our HHTL framework is the best. Compared with the other methods, when 10% of the scenes are used for training, the enhancements in OA obtained by our HHTL framework are 15.88% (over GoogLeNet), 15.6% (over VGGNet-16), 6.99% (over VGG-16-CapsNet), 2.77% (over SCCov), 4% (over VGG-VD16+MSCP+MRA), 5.95% (over MIDC-Net CS), 0.98% (over ACNet), 2.11% (over EfficientNet-B0-aux), 2.18% (over SF-CNN with VGGNet), 2.06% (over ACR-MLFF), 1.69% (over MSA-Network), and 1.24% (over MG-CAP (Sqrt-E)).

Method                        OA (%)         OA (%)
VGG-16-CapsNet [71]           91.63±0.19     94.74±0.17
SCCov [24]                    93.12±0.25     96.10±0.16
VGG-VD16+MSCP+MRA [72]        92.21±0.17     95.56±0.18
GBNet+global feature [40]     92.20±0.23     95.48±0.12
MIDC-Net CS [73]              88.51±0.41     92.95±0.17
EFPN-DSE-TDFF [41]            94.02±0.21     94.50±0.30
ACNet [52]                    93.33±0.29     95.38±0.29
DFAGCN [74]                   -              94.88±0.22
EfficientNet-B0-aux [75]      93.69±0.11     96.17±0.16
SF-CNN with VGGNet [76]       93.60±0.12     96.66±0.11
ACR-MLFF [77]                 92.73±0.12     95.06±0.33
MSA-Network [78]              93.53±0.21     96.01±0.43
MG-CAP (Sqrt-E) [43]          93.34±0.18     96.12±0.12
HHTL (ours)                   95.62±0.13     96.88±0.21…”
Section: Experimental Results and Comparisons (mentioning)
confidence: 99%
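The "±" figures in the quoted table are means and standard deviations of OA over repeated trials, and each quoted "enhancement" is the plain difference between two mean OAs at the same training ratio. A small, self-contained illustration of both computations (all numbers below are hypothetical placeholders, not values from the citing paper):

```python
import statistics

# Hypothetical per-trial overall accuracies (%) from repeated runs
# with different random training splits.
oa_runs = [95.50, 95.75, 95.60, 95.68, 95.57]

# OA is reported as mean +/- sample standard deviation over the runs.
print(f"OA = {statistics.mean(oa_runs):.2f}"
      f"±{statistics.stdev(oa_runs):.2f}")  # OA = 95.62±0.10

# An "enhancement" is the difference of mean OAs at the same
# training ratio; both values here are placeholders.
oa_ours, oa_baseline = 95.62, 94.64
print(f"enhancement: {oa_ours - oa_baseline:.2f}%")  # enhancement: 0.98%
```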
“…In view of the strong feature description capabilities of CNNs, they have been broadly applied to many computer vision tasks, including remote sensing scene classification [5]-[9]. At present, many CNN methods use the features extracted from the top layers for RS image representation, since the top layers give more semantically meaningful representations that are suitable for capturing global visual scene context.…”
Section: A. CNN Models (mentioning)
confidence: 99%
“…Recently, with the rapid development of deep learning (DL), convolutional neural networks (CNNs) have demonstrated competitive performance on many computer vision tasks, including RS scene classification [5]-[8]. In CNNs, the lower-level features from shallow layers reflect the details of images, while the higher-level ones from deep layers contain rich semantic information and are thus more discriminative, abstract, and robust [9].…”
Section: Introduction (mentioning)
confidence: 99%
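The shallow-versus-deep contrast in the quotation above can be made concrete by inspecting feature-map shapes at different depths of a CNN. The sketch below uses torchvision's feature extractor; the ResNet-18 backbone and the chosen tap points are illustrative assumptions only:

```python
# Illustration of the feature hierarchy the quote describes: shallow
# layers keep fine spatial detail in few channels, while deep layers
# trade resolution for semantically richer channels.
import torch
from torchvision.models import resnet18
from torchvision.models.feature_extraction import create_feature_extractor

extractor = create_feature_extractor(
    resnet18(weights=None),
    return_nodes={"layer1": "shallow", "layer2": "mid", "layer4": "deep"})

feats = extractor(torch.randn(1, 3, 224, 224))
for name, f in feats.items():
    print(name, tuple(f.shape))
# shallow (1, 64, 56, 56)   -> fine spatial detail, few channels
# mid     (1, 128, 28, 28)
# deep    (1, 512, 7, 7)    -> coarse grid, semantically rich channels
```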