Remote sensing image scene classification has drawn significant attention for its potential applications in the economy and livelihoods. Unlike traditional handcrafted features, convolutional neural networks (CNNs) provide an excellent avenue for obtaining powerful discriminative features. Although tremendous efforts have been made in this domain, many challenges remain open in scene classification because complex scenes exhibit high within-class diversity and between-class similarity. To address these problems, D-CapsNet is proposed to learn richer and more robust features for scene classification. It is an end-to-end network with four types of layers that incorporates visual attention mechanisms. Its diverse capsules encode different properties of complex image scenes: deep high-level features, spatial attention based on the fusion of multilayer features, combined spatial and channel attention based on high-level features, and their fusion. Experiments on three image scene datasets demonstrate that D-CapsNet outperforms baselines and state-of-the-art methods, with a significant improvement in both classification accuracy and speed.
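To make the attention terminology above concrete, the following is a minimal, generic sketch of channel attention, spatial attention, and their fusion on a tiny feature map. It is not the D-CapsNet implementation: the function names, the parameter-free sigmoid gates, and the averaging fusion are all assumptions chosen for illustration, and real attention modules typically use learned weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by a gate computed from its global average.

    fmap is a nested list with shape [channels][height][width].
    Assumption: a parameter-free gate stands in for a learned one.
    """
    out = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(fmap):
    """Scale each location by a gate computed from the cross-channel mean."""
    n_channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    mask = [
        [sigmoid(sum(fmap[c][i][j] for c in range(n_channels)) / n_channels)
         for j in range(width)]
        for i in range(height)
    ]
    return [
        [[fmap[c][i][j] * mask[i][j] for j in range(width)]
         for i in range(height)]
        for c in range(n_channels)
    ]


def fuse(a, b):
    """Element-wise average of two feature maps of the same shape (assumed fusion rule)."""
    return [
        [[(x + y) / 2.0 for x, y in zip(row_a, row_b)]
         for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]


# Hypothetical 2-channel, 2x2 feature map used only for demonstration.
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
fused = fuse(channel_attention(fmap), spatial_attention(fmap))
```

Channel attention decides which feature channels matter, while spatial attention decides which image locations matter. Both gates lie strictly between 0 and 1, so each one can only attenuate a response, never amplify it.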