On the Co-Selection of Vision Transformer Features and Images for Very High-Resolution Image Scene Classification

Chaib, Souleyman; Mansouri, Dou El Kefel; Omara, Ibrahim; Hagag, Ahmed; Dhelim, Sahraoui; Bensaber, Djamel Amar

doi:10.3390/rs14225817

Cited by 10 publications

(4 citation statements)

References 46 publications


“…However, this approach results in a substantial increase in the model's trainable parameters, potentially compromising its practical deployment due to increased computational demands and memory requirements. To address this challenge, LFAGCU introduces the global feature learning (GFL) module, which aims to ensure the modeling of long-range non-local dependencies by leveraging an effective receptive field spanning the dimensions H × W. Notably, while ViT has demonstrated remarkable effectiveness in diverse computer vision tasks [35,36], it presents limitations in terms of spatial inductive bias and its propensity for fine-tuning, hindering its full potential for certain tasks [37]. To overcome the limitations of weighted averaging of each pixel within the receptive field during convolution operations, which can lead to noise pixels affecting the distinguishability of image target pixels, the GFL module utilizes a multi-head self-attention mechanism for comprehensive global context modeling.…”

Section: Global Context Modeling (mentioning)

confidence: 99%

Local feature acquisition and global context understanding network for very high-resolution land cover classification

Li, Hu, et al. 2024. Sci Rep


Very high-resolution remote sensing images hold promising applications in ground observation tasks, paving the way for highly competitive solutions using image processing techniques for land cover classification. To address the challenges faced by convolutional neural networks (CNNs) in exploring contextual information in remote sensing image land cover classification and the limitations of the vision transformer (ViT) series in effectively capturing local details and spatial information, we propose a local feature acquisition and global context understanding network (LFAGCU). Specifically, we design a multidimensional and multichannel convolutional module to construct a local feature extractor aimed at capturing local information and spatial relationships within images. Simultaneously, we introduce a global feature learning module that utilizes multiple sets of multi-head attention mechanisms for modeling global semantic information, abstracting the overall feature representation of remote sensing images. Validation, comparative analyses, and ablation experiments conducted on three different scales of publicly available datasets demonstrate the effectiveness and generalization capability of the LFAGCU method. Results show its effectiveness in locating category attribute information related to remote sensing areas and its exceptional generalization capability. Code is available at https://github.com/lzp-lkd/LFAGCU.


“…Object-oriented and visual attention-based methods have shown potential but are limited by manual feature extraction and model robustness issues. Here, we propose a novel approach that incorporates attention mechanism fusion and robotic multimodal information fusion decision-making in the framework of graph neural algorithms to address these challenges (Chaib et al., 2022; Chen et al., 2022; Tian et al., 2023).…”

Section: Related Work (mentioning)

confidence: 99%

“…Despite the progress made, the existing pixel-based methods can only reflect spectral information at an individual pixel level and lack a comprehensive understanding of the overall remote sensing image, leading to difficulties in obtaining meaningful … Here, we propose a novel approach that incorporates attention mechanism fusion and robotic multimodal information fusion decision-making in the framework of graph neural algorithms to address these challenges (Chaib et al., 2022; Chen et al., 2022; Tian et al., 2023).…”

Section: Related Work (mentioning)

confidence: 99%

Remote sensing traffic scene retrieval based on learning control algorithm for robot multimodal sensing information fusion and human-machine interaction and collaboration

Peng, Shi, Wang 2023. Front. Neurorobot.


In light of advancing socio-economic development and urban infrastructure, urban traffic congestion and accidents have become pressing issues. High-resolution remote sensing images are crucial for supporting urban geographic information systems (GIS), road planning, and vehicle navigation. Additionally, the emergence of robotics presents new possibilities for traffic management and road safety. This study introduces an innovative approach that combines attention mechanisms and robotic multimodal information fusion for retrieving traffic scenes from remote sensing images. Attention mechanisms focus on specific road and traffic features, reducing computation and enhancing detail capture. Graph neural algorithms improve scene retrieval accuracy. To achieve efficient traffic scene retrieval, a robot equipped with advanced sensing technology autonomously navigates urban environments, capturing high-accuracy, wide-coverage images. This facilitates comprehensive traffic databases and real-time traffic information retrieval for precise traffic management. Extensive experiments on large-scale remote sensing datasets demonstrate the feasibility and effectiveness of this approach. The integration of attention mechanisms, graph neural algorithms, and robotic multimodal information fusion enhances traffic scene retrieval, promising improved information extraction accuracy for more effective traffic management, road safety, and intelligent transportation systems. In conclusion, this interdisciplinary approach, combining attention mechanisms, graph neural algorithms, and robotic technology, represents significant progress in traffic scene retrieval from remote sensing images, with potential applications in traffic management, road safety, and urban planning.


“…They used random forests and support vector machines (SVM), and their combined strengths were applied separately to Landsat-8, Sentinel-2, and Planet images to assess the individual and overall class accuracy of the images. Chaib et al. [12] proposed a new deep framework for very high-resolution (VHR) scene understanding by exploring the strengths of vision transformer (ViT) features in a simple and effective way. This pretrained ViT model is used to extract informative features from the original VHR image scene, where the transformer-encoder layers are used to generate the feature descriptors of the input images.…”

Section: Introduction (mentioning)

confidence: 99%

Classification and Recognition of Building Appearance Based on Optimized Gradient-Boosted Decision Tree Algorithm

Liu et al. 2023. Sensors


There are high concentrations of urban spaces and increasingly complex land use types. Providing an efficient and scientific identification of building types has become a major challenge in urban architectural planning. This study used an optimized gradient-boosted decision tree algorithm to enhance a decision tree model for building classification. Through supervised classification learning, machine learning training was conducted using a business-type weighted database. We innovatively established a form database to store input items. During parameter optimization, parameters such as the number of nodes, maximum depth, and learning rate were gradually adjusted based on the performance of the verification set to achieve optimal performance on the verification set under the same conditions. Simultaneously, a k-fold cross-validation method was used to avoid overfitting. The model clusters trained in the machine learning training corresponded to various city sizes. By setting the parameters to determine the size of the area of land for a target city, the corresponding classification model could be invoked. The experimental results show that this algorithm has high accuracy in building recognition. Especially in R, S, and U-class buildings, the overall accuracy rate of recognition reaches over 94%.
