“…
VGGNet: LSTM+Q+I [1], AVWAN [2], SAN [6], Facts-VQA [70], DPP [75], QAM [76], Region-Sel [77], NMN [78]
ResNet: FDA [5], Bayesian [7], Dense-Sym [8], Code-Mix VQA [9], Hei-Co-atten [10], Rich-img-Region [27], MCB [29], MRN [30], FVTA [33], MUTAN [36], Meta-VQA [77], Rich-VQA [79], QTA [80], DCN [81]
GoogleNet: Neural Image QA [80], Multi-Modal QA [82], i-Bowing [83], Smem [84]
F-RCNN: Code-Mixed VQA [9], CAQT [11], QLOB [12], BAN [28], MFB [32], [85], explicit-know-Based [86], Know-Base Graph [87]
BERT: VilBERT [13], LXMERT [14], UNITER [15], Oscar [16], MPC [25], Semantic VLBERT [88]
Source: Own elaboration.

The next step in the VQA model is to extract question features.…”