To further demonstrate the effectiveness of our MAANet, we compare it with a set of state-of-the-art RSSC algorithms, covering traditional non-DL methods (i.e., BoVW [7], IFK [7], LDA [7], and LLC [8]) that mainly rely on mid-level features, as well as DL-based methods that are closely related to our network. Specifically, these DL models are subdivided into six groups: (1) traditional CNNs (i.e., GoogLeNet [7], CaffeNet [7], VGG-VD-16 [7], and VGG-16-CapsNet [15]); (2) gated networks (i.e., GBNet [18] and GBNet + global feature [18]); (3) feature pyramid networks (i.e., EFPN-DSE-TDFF [19] and RANet [20]); (4) global–local feature fusion networks (i.e., LCNN-BFF [21], HABFNet [22], MF2Net [23], and DAFGCN [24]); (5) attention-based networks (i.e., MS2AP [25], MSA-Network [26], SAFF [27], ResNet50+EAM [28], ACNet [29], CSDS [30], SEMSDNet [31], ACR-MLFF [32], CRAN [33], and TDFE-DAA [34]); and (6) currently popular transformers (i.e., ViT-B_32 [35], T2T-ViT-12 [36], V16_21k [37], ViT [35], PVT-V2-B0 [38], PiT-S [39], Swin-T [40], PVT-Medium [41], and T-CNN [42]). For a fair comparison, all results are either obtained by running the authors' source code or provided directly by the authors.