“…That is, the observation can be weighted toward the more informative features according to their relative importance, focusing the algorithm on the most relevant parts of the input and shifting from global features to salient local ones, thereby saving resources and extracting the most useful information quickly. The attention mechanism has arguably become one of the most important concepts in deep learning. Since Bahdanau, Cho & Bengio (2015) applied attention to machine translation, numerous variants have emerged, such as Co-Attention networks (Yang et al., 2019a; Han et al., 2021; Yu et al., 2019; Liu et al., 2021b; Lu et al., 2016; Sharma & Srivastava, 2022), Recurrent Attention networks (Osman & Samek, 2019; Ren & Zemel, 2017; Gan et al., 2019), and Self-Attention networks (Li et al., 2019; Fan et al., 2019; Ramachandran et al., 2019; Xia et al., 2022; Xiang et al., 2022; Yan, Silamu & Li, 2022). All of these attention mechanisms considerably enhance the processing of visual information and thereby improve VQA performance.…”
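
To make the weighting idea above concrete, the following is a minimal sketch of scaled dot-product attention, the core operation behind the self-attention networks cited in this passage. The array shapes, variable names, and the NumPy framing are illustrative assumptions for exposition, not something specified in the excerpt itself.

    # Minimal sketch of scaled dot-product attention, illustrating how an
    # input is re-weighted by the relative importance of its features.
    # Shapes and names here are illustrative assumptions.
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for numerical stability
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        """Q: (n_q, d), K: (n_k, d), V: (n_k, d_v)."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)        # relevance of each key to each query
        weights = softmax(scores, axis=-1)   # normalized attention weights
        return weights @ V, weights          # weighted sum focuses on relevant features

    # Example: 2 queries attending over 4 input features
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(2, 8))
    K = rng.normal(size=(4, 8))
    V = rng.normal(size=(4, 8))
    out, w = scaled_dot_product_attention(Q, K, V)
    print(out.shape, w.shape)  # (2, 8) (2, 4); each row of w sums to 1

Because each row of the weight matrix sums to one, the output is a convex combination of the input features; this is the formal sense in which attention "focuses" on the most relevant parts of the input while down-weighting the rest.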