2022
DOI: 10.1007/978-981-19-0475-2_41
Remote Sensing Image Captioning via Multilevel Attention-Based Visual Question Answering

Cited by 5 publications (4 citation statements)
References 21 publications
“…Wang et al. [35] combined features from a multi-labeling model with CNN-extracted features to train a captioning model. Murali et al. [36] proposed a captioning model that is first trained on VQA as an auxiliary task, then leverages the acquired knowledge to generate more accurate captions. In [37], a multi-label classifier is employed to generate labels from the image, which are then used, along with ground-truth captions, to train the captioning model.…”
Section: Multi-tasking In Nlp For Rsmentioning
confidence: 99%
“…(1), and the weight normalization coefficient is calculated in Eq. (2). Edge recovery is an iterative process that is mathematically expressed in Eq.…”
Section: Data Preprocessingmentioning
confidence: 99%
“…Traditional remote sensing tasks frequently focus on low-level semantic data via image synthesis [2]. Image classification assigns a word-level label to an RSI to convey its low-level semantic information.…”
Section: Introductionmentioning
confidence: 99%
“…The two-phase VQA model was proposed in Ref. 39, where three levels of attention are used in the first phase to identify the important words, which are then used in the second phase to enhance the captions. The generated captions are informative, as they include all the image objects and their counts.…”
Section: Visual Question Answeringmentioning
confidence: 99%