Global to Local: A Hierarchical Detection Algorithm for Hyperspectral Image Target Detection

Hyperspectral target detection (HTD) is a crucial aspect of remote sensing applications, aiming to identify targets in hyperspectral images (HSIs) based on their known prior spectral signatures. However, the spectral variability resulting from various imaging conditions in multi-temporal hyperspectral images poses a challenge to both classical and deep learning (DL) methods. To overcome the limitations imposed by spectral variability, an implicit contrastive learning-based target detector (ICLTD) is proposed to exploit in-scene spectra in an unsupervised way. First, only prior spectra are utilized for explicit supervision, while an implicit contrastive learning module (ICLM) is designed to normalize the feature distributions of prior and in-scene spectra. This paper theoretically demonstrates that the ICLM can transfer the gradients from prior spectral features to those of in-scene spectra based on their feature similarities and differences. Because of transferred gradient signals, the ICLTD is regularized to extract similar representations for the prior and in-scene target spectra, while augmenting feature differences between the target and background spectra. Additionally, a local spectral similarity constraint (LSSC) is proposed to enhance the capability of scene adaptation by leveraging the spectral similarities among in-scene targets. To validate the performance of the ICLTD under spectral variability, multi-temporal HSIs captured under various imaging conditions are collected to generate prior spectra and in-scene spectra. Comparative evaluations against several DL detectors and classical methods reveal the superior performance of the ICLTD in achieving a balance between target detectability and background suppressibility under spectral variability.

Section: Introductionmentioning

confidence: 99%

Target Detection Adapting to Spectral Variability in Multi-Temporal Hyperspectral Images Using Implicit Contrastive Learning

Zhang,

Gao,

Wang

et al. 2024

“…First of all, water bodies are characterized by multiple scales and shapes, varying greatly in spatial and geometric scales, which influences the extraction results. Moreover, their spectral information in RSIs is complex [10] and susceptible to interference from glaciers, clouds, and shadows. In summary, mapping water body boundaries accurately can be difficult, which may limit the accuracy of water body extraction.…”

Section: Introductionmentioning

confidence: 99%

MSAFNet: Multiscale Successive Attention Fusion Network for Water Body Extraction of Remote Sensing Images

Lyu

Jiang

et al. 2023

Water body extraction is a typical task in the semantic segmentation of remote sensing images (RSIs). Deep convolutional neural networks (DCNNs) outperform traditional methods in mining visual features; however, due to the inherent convolutional mechanism of the network, spatial details and abstract semantic representations at different levels are difficult to capture accurately at the same time, and then the extraction results decline to become suboptimal, especially on narrow areas and boundaries. To address the above-mentioned problem, a multiscale successive attention fusion network, named MSAFNet, is proposed to efficiently aggregate the multiscale features from two aspects. A successive attention fusion module (SAFM) is first devised to extract multiscale and fine-grained features of water bodies, while a joint attention module (JAM) is proposed to further mine salient semantic information by jointly modeling contextual dependencies. Furthermore, the multi-level features extracted by the above-mentioned modules are aggregated by a feature fusion module (FFM) so that the edges of water bodies are well mapped, directly improving the segmentation of various water bodies. Extensive experiments were conducted on the Qinghai-Tibet Plateau Lake (QTPL) and the Land-cOVEr Domain Adaptive semantic segmentation (LoveDA) datasets. Numerically, MSAFNet reached the highest accuracy on both QTPL and LoveDA datasets, including Kappa, MIoU, FWIoU, F1, and OA, outperforming several mainstream methods. Regarding the QTPL dataset, MSAFNet peaked at 99.14% and 98.97% in terms of F1 and OA. Although the LoveDA dataset is more challenging, MSAFNet retained the best performance, with F1 and OA being 97.69% and 95.87%. Additionally, visual inspections exhibited consistency with numerical evaluations.

“…The former introduces multiple 2D branches to extract multi-scale spatial information from the features generated by 3D convolutions in each block (see Figure 2d), and the latter adds one 2D unit after every 3D unit to concentrate on spatial features (Figure 2c). Recently, graph-based methods have been applied to many tasks in hyperspectral image processing, such as classification [22][23][24], clustering [25][26][27], target detection [28], and anomaly detection [29], and have achieved good performance. However, existing 3D HSISR models are very simple and plain, and they are not combined with some advanced inventions of deep learning.…”

Section: Introductionmentioning

confidence: 99%

Rethinking 3D-CNN in Hyperspectral Image Super-Resolution

Liu

Wang

et al. 2023

Recently, CNN-based methods for hyperspectral image super-resolution (HSISR) have achieved outstanding performance. Due to the multi-band property of hyperspectral images, 3D convolutions are natural candidates for extracting spatial–spectral correlations. However, pure 3D CNN models are rare to see, since they are generally considered to be too complex, require large amounts of data to train, and run the risk of overfitting on relatively small-scale hyperspectral datasets. In this paper, we question this common notion and propose Full 3D U-Net (F3DUN), a full 3D CNN model combined with the U-Net architecture. By introducing skip connections, the model becomes deeper and utilizes multi-scale features. Extensive experiments show that F3DUN can achieve state-of-the-art performance on HSISR tasks, indicating the effectiveness of the full 3D CNN on HSISR tasks, thanks to the carefully designed architecture. To further explore the properties of the full 3D CNN model, we develop a 3D/2D mixed model, a popular kind of model prior, called Mixed U-Net (MUN) which shares a similar architecture with F3DUN. Through analysis on F3DUN and MUN, we find that 3D convolutions give the model a larger capacity; that is, the full 3D CNN model can obtain better results than the 3D/2D mixed model with the same number of parameters when it is sufficiently trained. Moreover, experimental results show that the full 3D CNN model could achieve competitive results with the 3D/2D mixed model on a small-scale dataset, suggesting that 3D CNN is less sensitive to data scaling than what people used to believe. Extensive experiments on two benchmark datasets, CAVE and Harvard, demonstrate that our proposed F3DUN exceeds state-of-the-art HSISR methods both quantitatively and qualitatively.