Synthetic aperture radar (SAR) image target detection is widely used in military, civilian, and other fields. However, existing detection methods suffer from low accuracy because SAR targets exhibit strong scattering, unclear edge contours, multiple scales, strong sparseness, and background interference. To address these issues, this paper combines the global contextual perception of transformers with the local feature representation of convolutional neural networks (CNNs) and proposes a visual transformer framework based on contextual joint-representation learning for SAR target detection, referred to as CRTransSar. First, the framework adopts the recent Swin Transformer as its basic architecture. Next, it incorporates the CNN's local information capture and introduces a backbone based on contextual joint-representation learning, called CRbackbone, to extract richer contextual features while strengthening SAR target attributes. Furthermore, a new cross-resolution attention-enhancement neck, called CAENeck, is designed to improve the representation of multiscale SAR targets. Our method attains a mAP of 97.0% on the SSDD dataset, reaching state-of-the-art performance. In addition, based on the HISEA-1 commercial SAR satellite, which has been launched into orbit and in whose development our research group participated, we release a larger-scale SAR multiclass target detection dataset, called SMCDD, which further verifies the effectiveness of our method.
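The abstract does not give implementation details, but the core idea of contextual joint-representation learning can be illustrated with a minimal PyTorch sketch: a convolutional branch supplies local features while a self-attention branch supplies global context, and the two are fused. Module names, the fusion scheme, and all hyperparameters below are assumptions for illustration, not the actual CRbackbone design.

```python
# Hypothetical sketch of a contextual joint-representation block: a convolutional
# branch for local features is fused with a self-attention branch for global
# context. Names and hyperparameters are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class JointRepresentationBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: depthwise + pointwise convolutions capture edge/texture cues.
        self.local_branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.GELU(),
        )
        # Global branch: self-attention over flattened spatial positions models context.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Fusion: concatenate the two branches and project back to `channels`.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local_branch(x)
        tokens = self.norm(x.flatten(2).transpose(1, 2))        # (B, H*W, C)
        global_ctx, _ = self.attn(tokens, tokens, tokens)
        global_ctx = global_ctx.transpose(1, 2).reshape(b, c, h, w)
        return x + self.fuse(torch.cat([local, global_ctx], dim=1))

if __name__ == "__main__":
    feat = torch.randn(1, 64, 32, 32)                  # dummy SAR feature map
    print(JointRepresentationBlock(64)(feat).shape)    # torch.Size([1, 64, 32, 32])
```

Note that the published method uses a windowed Swin Transformer rather than full self-attention; the block above only sketches the CNN/transformer fusion idea.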
As an active microwave device, synthetic aperture radar (SAR) images objects from their backscatter. Ship targets in SAR images are characterized by unclear contour information, complex backgrounds, and strong scattering. Existing anchor-based deep learning detectors mostly rely on expert experience to set a series of hyperparameters and struggle to characterize the unique properties of SAR ship targets, which greatly limits detection accuracy and speed. Therefore, this paper proposes a new lightweight, position-enhanced, anchor-free SAR ship detection algorithm called LPEDet. First, to address unclear SAR target contours and multiscale variation, we take YOLOX as the benchmark framework and redesign a lightweight multiscale backbone, called NLCNet, that balances detection speed and accuracy. Second, to handle the strong scattering of SAR targets, we design a new position-enhanced attention strategy that adds position information to channel attention, highlighting target information and suppressing background clutter so that targets are identified and located more accurately. Experimental results on two large-scale SAR target detection datasets, SSDD and HRSID, show that our method achieves higher detection accuracy and faster detection speed than state-of-the-art SAR target detection methods.
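A plausible way to add position information to channel attention, as the abstract describes, is direction-aware pooling in the spirit of coordinate attention: channel statistics are pooled separately along the height and width axes so the resulting weights keep track of where along each axis a response occurred. The sketch below follows that pattern; module names, the reduction ratio, and the exact layout are assumptions, not the paper's exact design.

```python
# Hedged sketch of a position-enhanced channel attention module (coordinate-
# attention style). Names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class PositionEnhancedAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 8)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # pool along width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # pool along height -> (B, C, 1, W)
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
        )
        self.attn_h = nn.Conv2d(hidden, channels, kernel_size=1)
        self.attn_w = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Direction-aware pooling keeps positional information along each axis.
        feat_h = self.pool_h(x)                          # (B, C, H, 1)
        feat_w = self.pool_w(x).transpose(2, 3)          # (B, C, W, 1)
        shared = self.reduce(torch.cat([feat_h, feat_w], dim=2))
        feat_h, feat_w = torch.split(shared, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(feat_h))                  # (B, C, H, 1)
        a_w = torch.sigmoid(self.attn_w(feat_w.transpose(2, 3)))  # (B, C, 1, W)
        return x * a_h * a_w                             # reweight, suppressing clutter

if __name__ == "__main__":
    feat = torch.randn(1, 128, 40, 40)
    print(PositionEnhancedAttention(128)(feat).shape)    # torch.Size([1, 128, 40, 40])
```

Such a module can be dropped after any backbone stage, since it preserves the input feature map's shape.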
High-resolution remote sensing image scene classification has attracted widespread attention as a basic Earth observation task. Remote sensing scene classification aims to assign specific semantic labels to scene images so that they can serve specified applications. Convolutional neural networks are widely used for remote sensing image classification because of their powerful feature extraction capabilities. However, existing methods have not overcome the large intraclass diversity and high interclass similarity of large-scene remote sensing images, which results in low performance. Therefore, we propose a new remote sensing scene classification method, called LmNet, that combines lightweight channel attention with multiscale feature fusion discrimination. First, ResNeXt is used as the backbone. Second, a new lightweight channel attention mechanism is constructed to quickly and adaptively learn the salient features of important channels. Furthermore, we design a multiscale feature fusion discrimination framework that fully integrates shallow edge information with deep semantic information to enhance feature representation and uses multiscale features for joint discrimination. Finally, a cross-entropy loss function based on label smoothing is built to reduce the influence of interclass similarity on feature representation. In particular, our lightweight channel attention and multiscale feature fusion mechanism can be flexibly embedded in any advanced backbone as a functional module. Experimental results on three large-scale remote sensing scene classification datasets show that, compared with existing advanced methods, our efficient end-to-end scene classification method reaches state-of-the-art performance. Moreover, our method depends less on labeled data and provides better generalization performance.

INDEX TERMS Remote sensing scene classification, convolutional neural network, lightweight channel attention, multiscale feature fusion, label smoothing.
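The label-smoothed cross-entropy loss mentioned in the abstract has a standard form: the one-hot target is softened so that the true class receives probability 1 − ε and the remaining probability mass is spread over the other classes. The sketch below shows that standard formulation; the smoothing value and class count are illustrative, and the paper's exact variant may differ.

```python
# Minimal sketch of a label-smoothed cross-entropy loss. The smoothing factor
# and the number of classes are illustrative assumptions.
import torch
import torch.nn.functional as F

def label_smoothing_cross_entropy(logits: torch.Tensor,
                                  targets: torch.Tensor,
                                  smoothing: float = 0.1) -> torch.Tensor:
    """logits: (N, C) class scores; targets: (N,) integer class labels."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    # Soft targets: 1 - smoothing on the true class, smoothing spread over the rest.
    soft_targets = torch.full_like(log_probs, smoothing / (num_classes - 1))
    soft_targets.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

if __name__ == "__main__":
    scores = torch.randn(4, 45)          # 45 scene classes, chosen only for illustration
    labels = torch.randint(0, 45, (4,))
    print(label_smoothing_cross_entropy(scores, labels))
```

Softening the targets discourages over-confident predictions on visually similar classes, which is how the abstract motivates reducing the influence of interclass similarity.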