“…The 50 filtered articles target specific research problems, as depicted in Table 2, with the shared objective of improving remote sensing scene classification accuracy. Among them, 11 focus on capturing more discriminative regions: through the fusion of processed images in [75,78,83], multilayer fusion in [79,80,125,136,137], replacement of the fully connected (FC) layers with CapsNet [138] in [33,139], and pairwise comparison in [140]. To focus on key regions, an attention mechanism is introduced in [100,103,104,106,107], a classifier-detector is used in [86], and multiple instance learning (MIL) [141] is applied in [142].…”