Scene classification of high-resolution remote sensing (HRRS) image is an important research topic and has been applied broadly in many fields. Deep learning method has shown its high potential to in this domain, owing to its powerful learning ability of characterizing complex patterns. However the deep learning methods omit some global and local information of the HRRS image. To this end, in this article we show efforts to adopt explicit global and local information to provide complementary information to deep models. Specifically, we use a patch based MS-CLBP method to acquire global and local representations, and then we consider a pretrained CNN model as a feature extractor and extract deep hierarchical features from full-connection layers. After fisher vector (FV) encoding, we obtain the holistic visual representation of the scene image. We view the scene classification as a reconstruction procedure and train several class-specific stack denoising autoencoders (SDAEs) of corresponding class, i.e., one SDAE per class, and classify the test image according to the reconstruction error. Experimental results show that our combination method outperforms the state-of-the-art deep learning classification methods without employing fine-tuning.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.