X-Shaped Interactive Autoencoders With Cross-Modality Mutual Learning for Unsupervised Hyperspectral Image Super-Resolution

Deep learning is an important research topic in image super-resolution. However, the performance of existing hyperspectral image super-resolution networks is limited by how well they learn features from hyperspectral images, and current algorithms struggle to extract diverse features. In this paper, we address these feature learning limitations. We introduce the Channel-Attention-Based Spatial–Spectral Feature Extraction network (CSSFENet) to enhance hyperspectral image feature diversity and optimize the network loss functions. Our contributions include: (a) a convolutional neural network super-resolution algorithm that incorporates diverse feature extraction, strengthening the network’s learning of diverse features by raising the matrix rank, (b) a three-dimensional (3D) feature extraction convolution module, the Channel-Attention-Based Spatial–Spectral Feature Extraction Module (CSSFEM), to boost the network’s performance in both the spatial and spectral domains, (c) a feature diversity loss function based on the singular values of the image matrix, designed to maximize element independence, and (d) a spatial–spectral gradient loss function based on spatial and spectral gradient values, designed to enhance the spatial–spectral smoothness of the reconstructed image. We compared our method with existing hyperspectral super-resolution algorithms using four evaluation indexes (PSNR, mPSNR, SSIM, and SAM), and it showed superior performance on three common hyperspectral datasets.
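Three of the four evaluation indexes named in the abstract have simple closed forms, so a minimal NumPy sketch may help make them concrete. It assumes image cubes stored as `(height, width, bands)` arrays scaled to `[0, 1]`; the function names, the `data_range` parameter, and the small `eps` guard are illustrative choices, not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
import numpy as np

def psnr(ref, rec, data_range=1.0):
    """Peak signal-to-noise ratio in dB, computed over the whole array."""
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

def mean_psnr(ref, rec, data_range=1.0):
    """mPSNR: PSNR computed band by band, then averaged over the bands."""
    bands = ref.shape[-1]
    return float(np.mean([psnr(ref[..., b], rec[..., b], data_range)
                          for b in range(bands)]))

def sam_degrees(ref, rec, eps=1e-12):
    """Spectral angle mapper: mean angle (degrees) between per-pixel spectra."""
    dot = np.sum(ref * rec, axis=-1)
    norms = np.linalg.norm(ref, axis=-1) * np.linalg.norm(rec, axis=-1)
    # eps (an assumption of this sketch) avoids division by zero on empty pixels.
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)
    return float(np.degrees(np.mean(np.arccos(cos))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.random((8, 8, 31))
    reconstructed = np.clip(reference + 0.01 * rng.standard_normal(reference.shape), 0, 1)
    print(psnr(reference, reconstructed),
          mean_psnr(reference, reconstructed),
          sam_degrees(reference, reconstructed))
```

Higher PSNR and mPSNR indicate a closer reconstruction, while a lower SAM indicates less spectral distortion. SAM is insensitive to per-pixel brightness scaling, which is why it is usually reported alongside the intensity-based indexes.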

Section: Related Work (mentioning)

confidence: 99%

Hyperspectral Image Super-Resolution Based on Feature Diversity Extraction

Zhang, Zheng, Wan et al. 2024

“…In recent years, research has proved the remarkable power of deep neural networks in modeling complex datasets and mining high-dimensional information, which enables them to extract more representative features compared with conventional methods while exhibiting exceptional feature expression capabilities [32][33][34][35]. The utilization of deep learning techniques has progressively gained prominence for HAD [36].…”

Section: Introduction (mentioning)

confidence: 99%

A Novel Fully Convolutional Auto-Encoder Based on Dual Clustering and Latent Feature Adversarial Consistency for Hyperspectral Anomaly Detection

Zhao, Yang, Meng et al. 2024

With the development of artificial intelligence, the ability to capture the background characteristics of hyperspectral imagery (HSI) has improved, showing promising performance in hyperspectral anomaly detection (HAD) tasks. However, recently proposed methods still suffer from certain limitations: (1) The deep feature learning process lacks constraints because prior background and anomaly information is absent. (2) Hyperspectral anomaly detectors based on traditional self-supervised deep learning methods fail to ensure prioritized reconstruction of the background. (3) The fully connected architecture of deep networks in hyperspectral anomaly detectors makes poor use of spatial information, destroys the original spatial relationships in hyperspectral imagery, and disregards the spectral correlation between adjacent pixels. (4) Assumptions about background and anomaly distributions restrict the performance of many hyperspectral anomaly detectors, because the distributions of background land covers are usually complex and cannot be assumed in real-world hyperspectral imagery. To address these problems, we propose a novel fully convolutional auto-encoder based on dual clustering and latent feature adversarial consistency (FCAE-DCAC) for HAD, trained with self-supervised learning. First, the density-based spatial clustering of applications with noise (DBSCAN) algorithm and connected component analysis are used for successive spectral and spatial clustering to obtain more precise prior background and anomaly information, which facilitates the separation of background and anomaly samples during training. Next, a novel fully convolutional auto-encoder (FCAE) integrated with a spatial–spectral joint attention (SSJA) mechanism is proposed to enhance the utilization of spatial information and augment feature expression. In addition, a latent feature adversarial consistency network that can learn the actual background distribution in hyperspectral imagery is proposed to achieve pure background reconstruction. Finally, a triplet loss is introduced to enhance the separability between background and anomaly, and the reconstruction residual serves as the anomaly detection result. We evaluate the proposed method on seven groups of real-world hyperspectral datasets, and the experimental results confirm its effectiveness and superior performance compared with nine state-of-the-art methods.
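The last two steps of this abstract, the triplet loss and the use of the reconstruction residual as the detection result, can be illustrated with a short NumPy sketch. It uses the standard hinge form of the triplet loss with Euclidean distances and a per-pixel L2 residual. The function names, the `margin` value, and the choice of which embeddings act as anchor, positive, and negative are assumptions made for illustration; the abstract does not state how FCAE-DCAC defines them.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet hinge loss on batches of embeddings with shape (N, D).

    Assumption: anchor and positive would be background features and negative
    an anomaly feature; the abstract does not specify the sampling scheme.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

def residual_map(hsi, reconstruction):
    """Per-pixel L2 norm of the reconstruction error for (H, W, B) cubes.

    If the network reconstructs only the background, pixels with a large
    residual are anomaly candidates.
    """
    return np.linalg.norm(hsi - reconstruction, axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cube = rng.random((16, 16, 20))
    background_only = cube.copy()
    cube[5, 7] += 2.0  # inject a synthetic anomaly the reconstruction lacks
    scores = residual_map(cube, background_only)
    print(np.unravel_index(np.argmax(scores), scores.shape))
```

The loss is zero once every negative sits at least `margin` farther from its anchor than the positive does, so it pushes the two classes apart only until that gap is reached.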

“…LiDAR sensors offer rich spatial structural information, and mainstream 3D object detection algorithms typically adopt point cloud-based methods. Due to the unordered and non-structural nature of point cloud data, it is challenging to directly leverage feature extraction networks, as in the case of images [24][25][26][27], to obtain multiscale features. Existing approaches address this issue by voxelization [7][8][9][12][28][29] or Bird's Eye View (BEV) projection [30][31][32] of the raw point cloud, followed by utilizing 3D convolutional neural networks to extract various spatial features.…”
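The two preprocessing routes named in this excerpt, voxelization and BEV projection, both turn an unordered point cloud into a regular grid that convolutional layers can consume. The following NumPy sketch shows the idea in its simplest form: it counts points per occupied voxel and per BEV cell. The function names, the grid parameters, and the use of plain point counts as the cell feature are illustrative assumptions, not the encoders used in the cited detectors.

```python
import numpy as np

def voxelize(points, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Map (N, 3+) points to integer voxel indices.

    Returns the occupied voxel indices (M, 3) and the point count per voxel (M,).
    """
    idx = np.floor((points[:, :3] - np.asarray(origin)) / voxel_size).astype(np.int64)
    voxels, counts = np.unique(idx, axis=0, return_counts=True)
    return voxels, counts

def bev_counts(points, cell_size, x_range, y_range):
    """Project points onto the ground plane and count points per BEV cell."""
    nx = int(np.ceil((x_range[1] - x_range[0]) / cell_size))
    ny = int(np.ceil((y_range[1] - y_range[0]) / cell_size))
    ix = np.floor((points[:, 0] - x_range[0]) / cell_size).astype(np.int64)
    iy = np.floor((points[:, 1] - y_range[0]) / cell_size).astype(np.int64)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)  # drop out-of-range points
    grid = np.zeros((nx, ny), dtype=np.int64)
    np.add.at(grid, (ix[keep], iy[keep]), 1)  # unbuffered add handles repeated cells
    return grid

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.uniform(0.0, 10.0, size=(1000, 3))
    voxels, counts = voxelize(cloud, voxel_size=0.5)
    grid = bev_counts(cloud, cell_size=0.5, x_range=(0.0, 10.0), y_range=(0.0, 10.0))
    print(voxels.shape, counts.sum(), grid.shape, grid.sum())
```

Voxelization keeps the height dimension at the cost of a sparse 3D grid, whereas the BEV grid is dense and two-dimensional but collapses height, which is the trade-off behind the two families of methods the excerpt cites.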

Section: Introduction (mentioning)

confidence: 99%

DS-Trans: A 3D Object Detection Method Based on a Deformable Spatiotemporal Transformer for Autonomous Vehicles

Zhu, Xu, Tao et al. 2024

Existing 3D object detection algorithms based on single-frame point cloud data struggle to achieve satisfactory results in complex weather conditions and road environments. These methods typically focus on spatial relationships within a single frame, overlooking the semantic correlations and spatiotemporal continuity between consecutive frames. This leads to discontinuities and abrupt changes in the detection outcomes. To address this issue, this paper proposes a multi-frame 3D object detection algorithm based on a deformable spatiotemporal Transformer. Specifically, a deformable cross-scale Transformer module is devised, incorporating a multi-scale offset mechanism that non-uniformly samples features at different scales, enhancing the spatial information aggregation capability of the output features. In addition, to address feature misalignment during multi-frame feature fusion, a deformable cross-frame Transformer module is proposed. This module incorporates independently learnable offset parameters for different frame features, enabling the model to adaptively correlate dynamic features across multiple frames and make better use of temporal information. A proposal-aware sampling algorithm is introduced to significantly increase the foreground point recall, further improving the efficiency of feature extraction. The resulting multi-scale and multi-frame voxel features are passed through an adaptive fusion weight extraction module, referred to as the mixed voxel set extraction module, which allows the model to adaptively obtain mixed features containing both spatial and temporal information. The effectiveness of the proposed algorithm is validated on the KITTI, nuScenes, and self-collected urban datasets. The proposed algorithm achieves an average precision improvement of 2.1% over the latest multi-frame-based algorithms.