Scene classification in very high resolution (VHR) remote sensing (RS) images is a challenging task due to complex and diverse content of the images. Recently, convolution neural networks (CNNs) have been utilized to tackle this task. However, CNNs cannot fully meet the needs of scene classification due to clutters and small objects in VHR images. To handle these challenges, this letter presents a novel multi-level feature fusion network with adaptive channel dimensionality reduction for RS scene classification. Specifically, an adaptive method is designed for channel dimensionality reduction of high dimensional features. Then, a multi-level feature fusion module is introduced to fuse the features in an efficient way. Experiments on three widely used data sets show that our model outperforms several state-of-the-art methods in terms of both accuracy and stability.
Remote sensing (RS) scene classification plays an important role in a wide range of RS applications. Recently, convolutional neural networks (CNNs) have been applied to the field of scene classification in RS images and achieved impressive performance. However, to classify RS scenes, most of the existing CNN methods either utilize the high-level features from the last convolutional layer of CNNs, missing much important information existing at the other levels, or directly fuse the features at different levels, bringing redundant and/or mutually exclusive information. Inspired by the attention mechanism of the human visual system, in this paper, we explore a novel relation-attention model and design an end-to-end relationattention network (RANet) to learn powerful feature representations of multiple levels to further improve the classification performance. First, we propose to extract convolutional features at different levels by pre-trained CNNs. Second, a multi-scale feature computation module is constructed to connect features at different levels and generate multi-scale semantic features. Third, a novel relation-attention model is designed to focus on the critical features whilst avoiding the use of redundant and even distractive ones by exploiting the scale contextual information. Finally, the resulting relation-attention features are concatenated and fed into a softmax layer for the final classification. Experiments on four well-known RS scene classification data sets (UC-Merced, WHU-RS19, AID, and OPTIMAL-31) show that our method outperforms some state-ofthe-art algorithms. The code of our proposed method is publicly
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.