Thanks to recent developments in artificial intelligence, users can now actively interact with images by posing questions about them and receiving answers in natural language. This article surveys the datasets available for studying visual question answering (VQA), along with their respective advantages and disadvantages. Four forms of VQA models are examined in depth: simple joint embedding-based models, attention-based models, knowledge-incorporated models, and domain-specific VQA models. We also critically assess the drawbacks and future possibilities of current state-of-the-art (SOTA), end-to-end VQA models. Finally, we present directions and guidelines for the further development of VQA models.