This paper introduces a novel deep learning based method, named bridge neural network (BNN) to dig the potential relationship between two given data sources task by task. The proposed approach employs two convolutional neural networks that project the two data sources into a feature space to learn the desired common representation required by the specific task. The training objective with artificial negative samples is introduced with the ability of mini-batch training and it's asymptotically equivalent to maximizing the total correlation of the two data sources, which is verified by the theoretical analysis. The experiments on the tasks, including pair matching, canonical correlation analysis, transfer learning, and reconstruction demonstrate the state-of-the-art performance of BNN, which may provide new insights into the aspect of common representation learning.
Deep learning methods have made significant progress in ship detection in synthetic aperture radar (SAR) images. The pretraining technique is usually adopted to support deep neural networks-based SAR ship detectors due to the scarce labeled SAR images. However, directly leveraging ImageNet pretraining is hard to obtain a good ship detector because of different imaging perspectives and geometry. In this paper, to resolve the problem of inconsistent imaging perspectives between ImageNet and earth observations, we propose an optical ship detector (OSD) pretraining technique to transfer the characteristics of ships in earth observations to SAR images from a large-scale aerial image dataset. On the other hand, to handle the problem of different imaging geometry between optical and SAR images, we propose an optical-SAR matching (OSM) pretraining technique, which transfers plentiful texture features from optical images to SAR images by common representation learning on the optical-SAR matching task. Finally, observing that the OSD pretraining-based SAR ship detector has a better recall on sea area while the OSM pretraining-based SAR ship detector can reduce false alarms on land area, we combine the predictions of the two detectors through weighted boxes fusion to further improve detection results. Extensive experiments on four SAR ship detection datasets and three representative CNN-based detection benchmarks are conducted to show the effectiveness and complementarity of the two proposed detectors, and the state-of-the-art performance of the combination of the two detectors. The proposed method won the sixth place of ship detection in SAR images in the 2020 Gaofen challenge. Index Terms-ship detection, optical ship detector pretraining, optical-SAR matching pretraining, common representation learning, weighted boxes fusion.
Deep learning technology has been extensively explored by existing methods to improve the performance of target detection in remote sensing images, due to its powerful feature extraction and representation abilities. However, these methods usually focus on the interior features of the target, but ignore the exterior semantic information around the target, especially the object-level relationship. Consequently, these methods fail to detect and recognize targets in the complex background where multiple objects crowd together. To handle this problem, a diversified context information fusion framework based on convolutional neural network (DCIFF-CNN) is proposed in this paper, which employs the structured object-level relationship to improve the target detection and recognition in complex backgrounds. The DCIFF-CNN is composed of two successive sub-networks, i.e., a multi-scale local context region proposal network (MLC-RPN) and an object-level relationship context target detection network (ORC-TDN). The MLC-RPN relies on the fine-grained details of objects to generate candidate regions in the remote sensing image. Then, the ORC-TDN utilizes the spatial context information of objects to detect and recognize targets by integrating an attentional message integrated module (AMIM) and an object relational structured graph (ORSG). The AMIM is integrated into the feed-forward CNN to highlight the useful object-level context information, while the ORSG builds the relations between a set of objects by processing their appearance features and geometric features. Finally, the target detection method based on DCIFF-CNN effectively represents the interior and exterior information of the target by exploiting both the multiscale local context information and the object-level relationships. Extensive experiments are conducted, and experimental results demonstrate that the proposed DCIFF-CNN method improves the target detection and recognition accuracy in complex backgrounds, showing superiority to other state-of-the-art methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.