“…Recent approaches have shown great potential for incorporating external knowledge into knowledge-based VQA. Several methods aggregate external knowledge in the form of structured knowledge graphs (Garderes et al, 2020; Narasimhan et al, 2018; Li et al, 2020b; Wang et al, 2017a,b), unstructured knowledge bases (Marino et al, 2021; Wu et al, 2022; Luo et al, 2021), or neural-symbolic inference-based knowledge (Chen et al, 2020; West et al, 2021). In these methods, object detectors (Ren et al, 2015) and scene classifiers (He et al, 2016) are used to associate images with external knowledge.…”