“…Further, external APIs, such as Google (Wu et al., 2021; Luo et al., 2021), Microsoft (Yang et al., 2021), and OCR (Luo et al., 2021; Wu et al., 2021), are used to enrich the associated knowledge. Finally, pre-trained transformer-based language models (Yang et al., 2021) or multimodal models (Wu et al., 2021; Luo et al., 2021; Wu et al., 2021; Garderes et al., 2020; Marino et al., 2021) are leveraged as implicit knowledge bases for answer prediction.…”