Using Neural Encoder-Decoder Models With Continuous Outputs for Remote Sensing Image Captioning

Ramos, Rita Parada; Martins, Bruno

doi:10.1109/access.2022.3151874

Cited by 17 publications

(4 citation statements)

References 36 publications


“…Language integration in RS has showcased impressive capabilities across various tasks, including image captioning [2, 17–28], VQA [3, 29–32], and text-image retrieval [4]. A comprehensive review of NLP applications in RS can be found at [1].…”

Section: NLP in Remote Sensing (mentioning)

confidence: 99%

“…Other advancements in image captioning include models that summarize multiple captions into one during training [21]. Ramos et al [22] used continuous word vector representations in the decoder instead of discrete representations. Hoxha et al [23] employed a decoder based on multiple Support Vector Machines (SVMs) to alleviate overfitting.…”

Section: NLP in Remote Sensing (mentioning)

confidence: 99%


RS-LLaVA: A Large Vision-Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery

Bazi, Bashmal, Al Rahhal, et al. 2024

Remote Sensing


In this paper, we delve into the innovative application of large language models (LLMs) and their extension, large vision-language models (LVLMs), in the field of remote sensing (RS) image analysis. We particularly emphasize their multi-tasking potential with a focus on image captioning and visual question answering (VQA). In particular, we introduce an improved version of the Large Language and Vision Assistant Model (LLaVA), specifically adapted for RS imagery through a low-rank adaptation approach. To evaluate the model performance, we create the RS-instructions dataset, a comprehensive benchmark dataset that integrates four diverse single-task datasets related to captioning and VQA. The experimental results confirm the model’s effectiveness, marking a step forward toward the development of efficient multi-task models for RS image analysis.


“…In terms of the other branch, we have that the use of subsymbolic AI approaches such as deep neural networks, to solve geospatial problems, is also a common component of GeoAI research. Although some existing deep learning architectures for tasks, such as image classification, image segmentation, question answering, modeling language and vision, or entity recognition, can be readily used for GeoAI tasks such as classification and object detection in remote sensing images (Bastani et al, 2022; Camps‐Valls et al, 2021), land use classification (Camps‐Valls et al, 2021), geographic question answering and question answering over Earth observation products (Coelho et al, 2021; Silva et al, 2022), remote sensing image captioning (Ramos & Martins, 2022), or place name recognition and resolution (Cardoso et al, 2021; Kulkarni et al, 2021; Liu et al, 2022), some unique challenges emerge which require special model designs, training objectives, or data pre‐processing techniques, for instance by incorporating spatial principles and spatial inductive biases. We call this kind of practices spatially explicit machine learning (Janowicz et al, 2020; Li et al, 2021; Mai, Janowicz, Yan, et al, 2020; Mai, Jiang, et al, 2022; Yan et al, 2017, 2019; Zhu, Janowicz, Cai, & Mai, 2022).…”

Section: Symbolic and Subsymbolic GeoAI (mentioning)

confidence: 99%

Symbolic and subsymbolic GeoAI: Geospatial knowledge graphs and spatially explicit machine learning

Mai, Gao, et al. 2022

Transactions in GIS

Self Cite


The field of Artificial Intelligence (AI) can be roughly divided into two branches: Symbolic AI and Connectionist AI (or the so-called Subsymbolic AI). Symbolic AI focuses on research based on classical logic and higher-level symbolic (human-readable) knowledge representations. It posits the use of declarative knowledge in reasoning and learning as critical to producing intelligent behavior (Goel, 2022). Examples are logical inference, symbolic reasoning, ontol-


“…An RSI captioning challenge has attracted a lot of attention [4]. The captioning work must be used for a variety of beneficial potential applications, including image retrieval [5][6]. More semantic details about an RSI may be available through the automatic caption generation.…”

Section: Introduction (mentioning)

confidence: 99%

Deep Attention Based Dense Net with Visual Switch Added BiLSTM for Caption Generation from Remote Sensing Images

2023

IJIES


Remote sensing image captioning is a challenging task due to low global information, single feature extraction, and a lack of detailed image captions. To address these issues, this research proposes a deep attention based DenseNet with visual switch added bidirectional long short-term memory (DADN-BiLSTM) for captioning. The images and captions are first collected from the captioning datasets, and small structures are smoothed away. Next, a double attention mechanism is applied to DenseNet to capture weak features and to improve the correspondence between image features and caption information. At the same time, clustering-based segmentation is used to divide the image into smaller parts that are easier to access. Moreover, a decoder is used to improve the use of captioning context information. The proposed system is implemented in Python, and its performance is evaluated against existing methods using metrics such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE), accuracy, and Bilingual Evaluation Understudy (BLEU). The experimental results show higher scores on all evaluation indicators: 0.8925 BLEU1, 0.8514 BLEU2, 0.8252 BLEU3, 0.8312 BLEU4, and 0.8611 ROUGE on UCM captions; 0.8532 BLEU1, 0.7912 BLEU2, 0.8351 BLEU3, 0.7215 BLEU4, and 0.8139 ROUGE on Sydney captions; and 0.8125 BLEU1, 0.7501 BLEU2, 0.6812 BLEU3, 0.7254 BLEU4, and 0.8245 ROUGE on RSICD captions.

