Urban land cover classification for high-resolution images is a fundamental yet challenging task in remote sensing image analysis. Recently, deep learning techniques have achieved outstanding performance in high-resolution image classification, especially methods based on deep convolutional neural networks (DCNNs). However, traditional CNNs, which use convolution operations with local receptive fields, cannot adequately model global contextual relations between objects. In addition, multiscale objects and the relatively small sample sizes typical of remote sensing have also limited classification accuracy. In this paper, a relation-enhanced multiscale convolutional network (REMSNet) is proposed to overcome these weaknesses. A dense connectivity pattern and parallel multi-kernel convolutions are combined to build a lightweight model with varied receptive field sizes. A spatial relation-enhanced block and a channel relation-enhanced block are then introduced into the network; they adaptively learn global contextual relations between any two positions or any two feature maps to enhance feature representations. Moreover, we design a parallel multi-kernel deconvolution module and a spatial path to further aggregate information at different scales. The proposed network is evaluated on two urban land cover datasets: the Vaihingen dataset from the ISPRS 2D semantic labelling contest and an area of Shanghai of about 143 km². The results demonstrate that the proposed method effectively captures long-range dependencies and improves the accuracy of land cover classification. Our model obtains an overall accuracy (OA) of 90.46% and a mean intersection-over-union (mIoU) of 0.8073 on Vaihingen, and an OA of 88.55% and an mIoU of 0.7394 on Shanghai.

Automatic urban land cover classification in remote sensing images is difficult for several reasons. First, high-resolution remote sensing images with abundant detail generally exhibit high intra-class variance and low inter-class variance [10]. For example, different classes of ground objects may have a similar appearance in remote sensing images, such as trees and low vegetation or roofs and roads, and objects frequently occlude one another. To address this issue, contextual information has been widely studied in remote sensing image classification [11]. Contextual information refers to the dependencies between objects, such as cars appearing on roads; taking these relationships into account reduces misclassification. In addition, ground objects often appear at very different scales in remote sensing images: cars and buildings, for instance, differ greatly in size, and it is difficult to distinguish objects at distinct scales simultaneously. Generally, deeper layers with larger receptive fields are better suited to segmenting large objects, while shallower layers with smaller receptive fields are better suited to segmenting small objects [12]. It is therefore necessary to enhance global context relationships and fuse multiscale information for pixel-wise classification.
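This excerpt does not give the exact formulation of the relation-enhanced blocks, only that they "adaptively learn global contextual relations between any two positions or feature maps". The following is a minimal PyTorch sketch of what such blocks typically look like, assuming self-attention in the style of non-local/dual-attention modules; the class names, reduction ratio, and learnable residual weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SpatialRelationBlock(nn.Module):
    """Sketch of a spatial relation block: models pairwise relations between
    all spatial positions so each position aggregates features globally."""

    def __init__(self, in_channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(in_channels, in_channels // reduction, 1)
        self.key = nn.Conv2d(in_channels, in_channels // reduction, 1)
        self.value = nn.Conv2d(in_channels, in_channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        b, c, h, w = x.size()
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)  # B x HW x C'
        k = self.key(x).view(b, -1, h * w)                     # B x C' x HW
        attn = self.softmax(torch.bmm(q, k))                   # B x HW x HW
        v = self.value(x).view(b, -1, h * w)                   # B x C  x HW
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x  # residual connection


class ChannelRelationBlock(nn.Module):
    """Sketch of a channel relation block: models relations between any two
    feature maps (channels) via a C x C attention matrix."""

    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        b, c, h, w = x.size()
        q = x.view(b, c, -1)                   # B x C  x HW
        k = x.view(b, c, -1).permute(0, 2, 1)  # B x HW x C
        attn = self.softmax(torch.bmm(q, k))   # B x C  x C
        out = torch.bmm(attn, x.view(b, c, -1)).view(b, c, h, w)
        return self.gamma * out + x
```

Note that the spatial attention map is HW x HW, which is why each output position can depend on every other position, i.e. the long-range dependencies the abstract refers to; the quadratic memory cost in HW is the usual price of this design.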
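Likewise, the parallel multi-kernel convolution described in the abstract can be sketched as parallel branches with different kernel sizes whose outputs are concatenated, giving a single layer a mixture of receptive field sizes; the specific kernel sizes and branch widths below are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class MultiKernelConv(nn.Module):
    """Sketch of a parallel multi-kernel convolution: each branch uses a
    different kernel size (hence a different receptive field); branch outputs
    are concatenated along the channel dimension."""

    def __init__(self, in_channels, branch_channels, kernel_sizes=(1, 3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                # padding k // 2 keeps the spatial size unchanged for odd k
                nn.Conv2d(in_channels, branch_channels, k, padding=k // 2),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for k in kernel_sizes
        )

    def forward(self, x):
        return torch.cat([branch(x) for branch in self.branches], dim=1)


# Usage example: a 64-channel feature map yields 4 * 32 = 128 output channels.
if __name__ == "__main__":
    x = torch.randn(2, 64, 128, 128)
    out = MultiKernelConv(64, 32)(x)
    print(out.shape)  # torch.Size([2, 128, 128, 128])
```

A deconvolution counterpart of this module, as mentioned in the abstract, would follow the same parallel-branch pattern with transposed convolutions in place of the standard ones.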