For oriented object detection, existing CNN-based methods typically rely on a large and diverse dataset, which can be expensive to acquire, and they generalize poorly to new categories that lack annotated samples. To address this, we propose MOCA-Net, a few-shot oriented object detection method with a multi-oriented enhancement branch and a context-aware module, which is trained with only a limited number of annotated samples from novel categories. Specifically, our method generates multi-oriented and multi-scale positive samples and feeds them into an RPN and the detection head as a multi-oriented enhancement branch, enhancing the classification and regression capabilities of the detector. With the context-aware module, the detector can effectively extract contextual information surrounding the object and adaptively incorporate it into RoI features, thereby improving its classification capability. As far as we know, our method is the first to attempt this in this field, and comparative experiments on the public remote sensing dataset DOTA for oriented object detection show that our method is effective.
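The idea of generating multi-oriented and multi-scale positive samples can be illustrated with a minimal sketch. This is not the MOCA-Net implementation: the box parameterization `(cx, cy, w, h, theta)`, the function names `transform_obox` and `multi_oriented_samples`, and the choice of angles and scales are all assumptions made for illustration. The sketch only transforms the box annotations; a real pipeline would apply the same rotation and scaling to the image pixels.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical oriented-box format: (cx, cy, w, h, theta), theta in radians.
# The rotation angle is assumed to be measured in the same sense as theta.
OBox = Tuple[float, float, float, float, float]


def transform_obox(box: OBox, angle: float, scale: float,
                   center: Tuple[float, float]) -> OBox:
    """Rotate a box about `center` by `angle` and scale it uniformly about `center`."""
    cx, cy, w, h, theta = box
    ox, oy = center
    dx, dy = cx - ox, cy - oy
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Rotate the box centre around the pivot, then scale its offset from the pivot.
    new_cx = ox + scale * (cos_a * dx - sin_a * dy)
    new_cy = oy + scale * (sin_a * dx + cos_a * dy)
    # Width and height scale with the image; the orientation shifts by the angle.
    return (new_cx, new_cy, w * scale, h * scale, theta + angle)


def multi_oriented_samples(box: OBox, angles: Sequence[float],
                           scales: Sequence[float],
                           center: Tuple[float, float]) -> List[OBox]:
    """Build one transformed copy of `box` per (angle, scale) combination."""
    return [transform_obox(box, a, s, center) for a in angles for s in scales]


if __name__ == "__main__":
    # Illustrative values only; the abstract does not specify angles or scales.
    samples = multi_oriented_samples(
        (10.0, 0.0, 4.0, 2.0, 0.0),
        angles=[0.0, math.pi / 2, math.pi],
        scales=[0.5, 1.0],
        center=(0.0, 0.0),
    )
    for s in samples:
        print(s)
```

Each annotated object yields one extra positive sample per angle and scale combination, which is one plausible way to enlarge a small novel-category training set before the samples reach the RPN and the detection head.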
Recent years have witnessed rapid development and remarkable achievements in deep learning object detection in remote sensing (RS) images. The steady improvement in accuracy is inseparable from increasingly complex deep convolutional neural networks and huge amounts of sample data. However, when samples are difficult to acquire, the neural network under-fits and detection performance suffers. This has motivated few-shot object detection (FSOD). In this article, we first briefly introduce the object detection task and its algorithms, to better understand the basic detection frameworks that FSOD follows. Then, FSOD design methods in RS images are discussed from three important aspects: sample, model, and learning strategy. In addition, some valuable FSOD research results from the computer vision field are included. We advocate a broad research route and provide advice on feature enhancement and multi-modal fusion, semantics extraction and cross-domain mapping, fine-tuning and meta-learning strategies, and so on. Based on this research route, a novel few-shot detector that focuses on contextual information is proposed. At the end of the paper, we summarize accuracy performance on experimental datasets to illustrate the achievements and shortcomings of the discussed algorithms, and highlight the future opportunities and challenges of FSOD in RS image interpretation, in the hope of providing insights into future research.