Object detection in very-high-resolution (VHR) remote sensing imagery remains a challenge. Environmental factors, such as illumination intensity and weather, reduce image quality, which leads to poor feature representation and limited detection accuracy. To enrich the feature representation and mine the underlying context information among objects, this article proposes a context-aware convolutional neural network (CA-CNN) model for object detection that comprises proposal generation, context feature extraction, feature fusion, and classification. For feature extraction, we integrate a context-regions-of-interests (Context-RoIs) mining layer into the CNN model and extract context features by mapping the Context-RoIs mined from the foreground proposals onto multilevel feature maps. The context features extracted from the multilevel layers are then fused into a single layer, and the proposals represented by the fused features are classified by a softmax classifier. Through extensive experiments, we examine the influence of key factors, such as Context-RoIs, different feature scales, and different spatial context window sizes. Because the network is designed end to end, the proposed model is both efficient and effective. We tested the model on the public NWPU VHR-10 data set. The experimental results demonstrate that the proposed CA-CNN model achieves significantly better detection performance than state-of-the-art methods.
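The pipeline described above can be illustrated with a minimal sketch. The abstract does not specify how a Context-RoI is mined, so the code below assumes one plausible rule: the proposal box is enlarged about its centre by a spatial context window factor and clipped to the image. The function names (`context_roi`, `map_to_level`, `softmax`), the `scale` parameter, and the stride values are hypothetical and are not taken from the paper.

```python
import math


def context_roi(box, scale, img_w, img_h):
    """Assumed mining rule: enlarge (x1, y1, x2, y2) about its centre
    by `scale`, then clip the result to the image bounds."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(img_w), cx + half_w),
        min(float(img_h), cy + half_h),
    )


def map_to_level(box, stride):
    """Project an image-space box onto a feature map whose cells
    each cover `stride` pixels (one call per feature level)."""
    return tuple(v / stride for v in box)


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical proposal in a 100 x 100 image, context window factor 1.5.
roi = context_roi((40, 40, 80, 80), 1.5, 100, 100)
# The same Context-RoI mapped onto two assumed feature levels.
levels = [map_to_level(roi, stride) for stride in (8, 16)]
```

In a full model, the features pooled from each mapped region would be fused into a single representation before the softmax classifier; that step depends on network details the abstract does not give, so it is left out here.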