Remote sensing image scene classification is one of the most challenging problems in understanding high-resolution remote sensing images. Deep learning techniques, especially convolutional neural networks (CNNs), have improved the performance of remote sensing image scene classification owing to their powerful capability for feature learning and reasoning. However, several fully connected layers are usually appended to the end of CNN models; these layers are inefficient at capturing the hierarchical structure of the entities in the images and do not fully exploit the spatial information that is important for classification. The capsule network (CapsNet), a novel architecture that replaces the individual neurons of a traditional neural network with groups of neurons (capsules, or vectors) and can encode the properties and spatial information of image features to achieve equivariance, has become an active research topic in classification over the past two years. Motivated by this idea, this paper proposes an effective remote sensing image scene classification architecture, named CNN-CapsNet, that makes full use of the merits of the two models. First, a CNN without fully connected layers is used to extract initial feature maps; specifically, a deep CNN pretrained on the ImageNet dataset serves as the feature extractor. The initial feature maps are then fed into a newly designed CapsNet to obtain the final classification result. The proposed architecture is extensively evaluated on three public, challenging benchmark remote sensing image datasets: the UC Merced Land-Use dataset with 21 scene categories, the AID dataset with 30 scene categories, and the NWPU-RESISC45 dataset with 45 scene categories. The experimental results demonstrate that the proposed method achieves classification performance competitive with state-of-the-art methods.
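The two operations that distinguish a CapsNet from an ordinary CNN head are the squash nonlinearity (which maps a capsule vector's length into [0, 1) while preserving its orientation) and dynamic routing-by-agreement between capsule layers. The abstract does not give the paper's implementation details, so the following is only a minimal numpy sketch of those two standard CapsNet mechanisms; the function names, shapes, and the number of routing iterations are illustrative assumptions, not the paper's code.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # CapsNet squash nonlinearity: shrinks short vectors toward zero and
    # long vectors toward unit length, keeping the direction unchanged.
    norm2 = np.sum(s * s, axis=axis, keepdims=True)
    norm = np.sqrt(norm2 + eps)
    return (norm2 / (1.0 + norm2)) * (s / norm)

def route(u_hat, iters=3):
    # Dynamic routing-by-agreement (sketch).
    # u_hat: (num_in, num_out, dim_out) prediction vectors, i.e. each
    # lower-level capsule's "vote" for each higher-level capsule.
    b = np.zeros(u_hat.shape[:2])                 # routing logits
    for _ in range(iters):
        # Softmax over output capsules -> coupling coefficients.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)    # weighted sum of votes
        v = squash(s)                             # (num_out, dim_out)
        # Increase logits where votes agree with the current output.
        b = b + (u_hat * v[None]).sum(axis=-1)
    return v
```

In the CNN-CapsNet setting described above, `u_hat` would come from linearly transforming capsules built on the pretrained CNN's feature maps, and the length of each output capsule `v` would serve as the class score.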
High-resolution remote sensing image-based land-use scene classification is a difficult task: it requires recognizing the semantic category of a given land-use scene image based on prior knowledge. Land-use scenes often cover multiple land-cover classes or ground objects, which makes a scene complex and difficult to represent and recognize. To address this problem, this paper applies the well-known bag-of-visual-words (BOVW) model, which has been very successful in natural image scene classification. However, many existing BOVW methods use only scale-invariant feature transform (SIFT) features to construct visual vocabularies, leaving other features and feature combinations uninvestigated, and they are also sensitive to the rotation of image scenes. Therefore, this paper presents a concentric circle-based spatial-rotation-invariant representation strategy for describing the spatial information of visual words and proposes a concentric circle-structured multiscale BOVW method using multiple features for land-use scene classification. Experiments on public land-use scene classification datasets demonstrate that the proposed method is superior to many existing BOVW methods and is well suited to the land-use scene classification problem.
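The core BOVW pipeline the abstract builds on has two steps: cluster local descriptors into a visual vocabulary, then quantize each image's descriptors against that vocabulary to form a word-frequency histogram. The sketch below shows that baseline pipeline only, with a tiny numpy k-means standing in for the clustering step; it does not implement the paper's concentric-circle or multiscale extensions, and all function names are hypothetical.

```python
import numpy as np

def build_vocab(descriptors, k, iters=10, seed=0):
    # Minimal k-means over local descriptors (e.g. SIFT vectors) to form
    # a visual vocabulary of k "words" (cluster centers).
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest center.
        d = np.linalg.norm(descriptors[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned descriptors.
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def bovw_histogram(descriptors, centers):
    # Quantize an image's descriptors to their nearest visual words and
    # return a normalized word-frequency histogram (the BOVW feature).
    d = np.linalg.norm(descriptors[:, None] - centers[None], axis=-1)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

The paper's rotation-invariant variant would, roughly, replace the single global histogram with histograms computed over concentric circular regions at multiple scales, so that rotating the scene leaves each region's word counts unchanged.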