The class imbalance problem is widespread in vision data. In imbalanced datasets, the majority classes dominate the loss and drive the gradient, which significantly degrades the performance of many state-of-the-art methods. In this article, we propose a class imbalance loss (CI loss) to handle this problem. To distinguish imbalanced datasets according to their extent of imbalance, we also define an imbalance degree that serves as a decision factor in the CI loss. Because the minority classes, having fewer samples, are likely to lose their share of the gradient descent during training, the CI loss is designed to let these minority classes take larger gradient steps than the majority classes. In view of the imbalanced distribution of data in few-shot learning, a method for generating an imbalanced few-shot learning dataset is also presented. Extensive experiments on the MiniImageNet dataset demonstrate the effectiveness of a model-agnostic meta-learning algorithm for rapid adaptation trained with the CI loss. For the problem of detecting 15 ship categories, our loss function is transplanted into a rotational region convolutional neural network detection method and a cascade network architecture, achieving higher mean average precision than focal loss and cross-entropy loss. In addition, the Mixed National Institute of Standards and Technology dataset and the Moving and Stationary Target Acquisition and Recognition dataset are subsampled into imbalanced datasets to further verify the effectiveness of the CI loss.
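As an illustration of the weighting idea described above (not the authors' exact CI loss formulation), the sketch below shows a class-weighted cross-entropy in PyTorch in which an assumed per-class imbalance degree gives minority classes a larger share of the gradient; the class counts, weighting rule, and function names are illustrative assumptions.

```python
# Minimal sketch of an imbalance-weighted cross-entropy (illustrative only;
# the actual CI loss and imbalance degree are defined in the paper).
import torch
import torch.nn.functional as F

def class_imbalance_weights(class_counts):
    """Hypothetical rule: larger weights for classes with fewer samples."""
    counts = torch.as_tensor(class_counts, dtype=torch.float32)
    imbalance_degree = counts.max() / counts            # >1 for minority classes
    return imbalance_degree / imbalance_degree.sum() * len(counts)

def ci_like_loss(logits, targets, class_counts):
    """Weighted cross-entropy so minority classes contribute more to the gradient."""
    weights = class_imbalance_weights(class_counts).to(logits.device)
    return F.cross_entropy(logits, targets, weight=weights)

# Usage: three classes with counts 1000, 100, 10 -> the rarest class is weighted
# roughly 100x more than the most frequent one.
logits = torch.randn(8, 3, requires_grad=True)
targets = torch.randint(0, 3, (8,))
loss = ci_like_loss(logits, targets, [1000, 100, 10])
loss.backward()
```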
Multi-oriented object detection is in the spotlight of detection in remote sensing images, an important yet challenging task because of the bird's-eye-view perspective, complex backgrounds, and densely packed objects. Although existing methods based on oriented detection heads have recently made substantial progress, they learn little about the essential rotation invariance of the objects. In this article, a novel framework is proposed that learns high-quality rotation-invariant features of multi-oriented objects through three measures. Given a remote sensing image, the MSFF module first merges the global semantic segmentation features predicted by the semantic segmentation branch with the multi-scale features extracted by the backbone and FPN in order to distinguish complex backgrounds. The discriminative features are then passed to the rotation mainstream, whose structure is similar to Cascade R-CNN; it extracts higher-quality rotation-invariant features and predicts more accurate location information by adaptively adjusting the distribution of the samples through progressive IoU thresholds. To further improve the mainstream's ability to predict more accurate oriented bounding boxes, horizontal tributaries that fully leverage the reciprocal relationship between oriented and horizontal detection are added to the latter two stages. Extensive experiments on three public remote sensing datasets, i.e., Gaofen Airplane, HRSC2016, and DOTA, demonstrate that, without bells and whistles, the proposed method achieves superior performance compared with existing state-of-the-art multi-oriented detection methods. Moreover, our overall system achieves 59.264% mAP on airplane detection in the 2020 Gaofen challenge, ranking third in the final.
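To make the progressive IoU thresholds concrete, the sketch below labels proposals as positives at increasingly strict thresholds across cascade stages, in the spirit of Cascade R-CNN for axis-aligned boxes; the thresholds and helper functions are assumptions for illustration and do not reproduce the paper's rotated-box matching.

```python
# Minimal sketch of progressive-IoU sample assignment across cascade stages
# (axis-aligned boxes only; thresholds 0.5/0.6/0.7 follow the Cascade R-CNN convention).
import torch

def box_iou(a, b):
    """IoU between two sets of [x1, y1, x2, y2] boxes, shapes (N, 4) and (M, 4)."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = torch.max(a[:, None, :2], b[None, :, :2])      # top-left of intersection
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])      # bottom-right of intersection
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def assign_per_stage(proposals, gt_boxes, thresholds=(0.5, 0.6, 0.7)):
    """Mark proposals as positive at progressively stricter IoU thresholds, one mask per stage."""
    best_iou = box_iou(proposals, gt_boxes).max(dim=1).values
    return [best_iou >= t for t in thresholds]

proposals = torch.tensor([[0., 0., 10., 10.], [2., 2., 12., 12.], [20., 20., 30., 30.]])
gt = torch.tensor([[1., 1., 11., 11.]])
print(assign_per_stage(proposals, gt))   # fewer positives survive at later stages
```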
Multimodal Land Cover Classification (MLCC) using the optical and Synthetic Aperture Radar (SAR) modalities has achieved outstanding performance compared with using only unimodal data, due to their complementary information on land properties. Previous multimodal deep learning (MDL) methods have relied on handcrafted multi-branch convolutional neural networks (CNNs) to extract the features of the different modalities and merge them for land cover classification. However, handcrafted CNN models designed for natural images may not be the optimal strategy for Remote Sensing (RS) image interpretation, owing to the large differences in imaging angles and imaging mechanisms. Furthermore, few MDL methods have analyzed optimal combinations of hierarchical features from different modalities. In this article, we propose an efficient multimodal architecture search framework, namely Multimodal Semantic Consistency-Based Fusion Architecture Search (M²SC-FAS), in a continuous search space with a gradient-based optimization method. It not only discovers optimal optical- and SAR-specific architectures according to the different characteristics of optical and SAR images, respectively, but also searches for the optimal multimodal dense fusion architecture. Specifically, a semantic-consistency constraint is introduced to guarantee dense fusion between hierarchical optical and SAR features with high semantic consistency and thus capture their complementary information on land properties. Finally, a curriculum learning strategy is adopted for M²SC-FAS. Extensive experiments show the superior performance of our work on three broad co-registered optical and SAR datasets.
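As a rough illustration of a semantic-consistency constraint between hierarchical optical and SAR features (not the paper's exact formulation), the sketch below scores the two modalities with a cosine similarity of globally pooled feature maps and uses that score to weight a simple dense fusion; the function names and fusion rule are assumptions.

```python
# Minimal sketch of consistency-weighted fusion of optical and SAR feature maps
# (illustrative only; the searched fusion architecture in the paper is more general).
import torch
import torch.nn.functional as F

def semantic_consistency(optical_feat, sar_feat):
    """Cosine similarity of globally pooled feature maps, one score per sample in [-1, 1]."""
    o = F.adaptive_avg_pool2d(optical_feat, 1).flatten(1)   # (B, C)
    s = F.adaptive_avg_pool2d(sar_feat, 1).flatten(1)       # (B, C)
    return F.cosine_similarity(o, s, dim=1)

def consistency_weighted_fusion(optical_feat, sar_feat):
    """Add the SAR branch to the optical branch, scaled by its semantic consistency."""
    w = semantic_consistency(optical_feat, sar_feat).clamp(min=0)   # (B,)
    return optical_feat + w.view(-1, 1, 1, 1) * sar_feat

opt = torch.randn(2, 64, 32, 32)
sar = torch.randn(2, 64, 32, 32)
fused = consistency_weighted_fusion(opt, sar)   # (2, 64, 32, 32)
```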