“…In the early stage of development, traditional machine learning methods have been used for scene classification tasks, such as support vector machine and bag of words [2,3]. Recently, deep learning methods have been proven to be effective for extracting image features [4][5][6][7][8], and many studies have demonstrated effective scene classification performance with the help of deep learning from various novel perspectives including self-supervised learning [9], data augmentation [10], feature fusion [11][12][13][14][15], reconstructing networks [16][17][18][19][20][21][22][23], integration of spectral and spatial information [24], balancing global and local features, refining feature maps through encoding method [25], adding a new mechanism [26,27], as well as introducing a new network [28], open set problem [29], and noisy label distillation [30]. However, a lack of annotated data has restricted the development of deep learning methods in scene classification due to the high cost of annotating data.…”