2021
DOI: 10.1007/s00500-020-05539-7
Cross-modality co-attention networks for visual question answering

Cited by 22 publications (5 citation statements)
References 23 publications
“…The attention mechanism is one of the pivotal aspects of human vision, enabling individuals to selectively concentrate on regions that are more likely to contain objects. Due to the importance of attention mechanisms in human vision, researchers have conducted extensive studies recently, attempting to leverage attention mechanisms to augment the capabilities of computer vision [39][40][41]. Based on how networks handle different types of information during modeling, attention mechanisms in current computer vision can be broadly categorized into three types: spatial domain, channel domain, and hybrid domain.…”
Section: Attention Mechanism
confidence: 99%
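The three-way taxonomy in the statement above (spatial domain, channel domain, hybrid domain) can be illustrated with a minimal NumPy sketch. The function names, gating choices, and tensor dimensions here are illustrative assumptions, not taken from the cited works: channel attention reweights whole feature channels (as in squeeze-and-excitation-style designs), while spatial attention reweights locations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x):
    """Channel-domain sketch: gate each channel by a squashed
    global average of its activations. x has shape (C, H, W)."""
    pooled = x.mean(axis=(1, 2))            # global average pool -> (C,)
    weights = sigmoid(pooled)               # one gate per channel
    return x * weights[:, None, None]       # broadcast over H, W

def spatial_attention(x):
    """Spatial-domain sketch: gate each location by a squashed
    channel-wise average. x has shape (C, H, W)."""
    pooled = x.mean(axis=0)                 # average over channels -> (H, W)
    weights = sigmoid(pooled)               # one gate per location
    return x * weights[None, :, :]          # broadcast over C

feat = np.random.randn(8, 4, 4)
print(channel_attention(feat).shape)  # (8, 4, 4)
print(spatial_attention(feat).shape)  # (8, 4, 4)
```

A hybrid-domain mechanism would simply compose the two gates, applying both channel and spatial reweighting to the same feature map.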
“…Multimodal fusion methods fall into two families: linear fusion and bilinear pooling. Linear fusion covers feature concatenation and element-wise multiplication, while bilinear pooling fuses the two modalities by computing the outer product of their feature vectors [15].…”
Section: Multimodality Fusion Model
confidence: 99%
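The fusion operations named in the statement above can be sketched in a few lines of NumPy. The feature dimensions are hypothetical placeholders; real VQA models apply learned projections before and after these operations.

```python
import numpy as np

v = np.random.randn(16)   # image feature vector (illustrative size)
q = np.random.randn(16)   # question feature vector

# Linear fusion: concatenation and element-wise multiplication.
concat = np.concatenate([v, q])     # -> (32,)
elemwise = v * q                    # -> (16,)

# Bilinear pooling: outer product of the two modalities, flattened.
bilinear = np.outer(v, q).ravel()   # -> (256,)

print(concat.shape, elemwise.shape, bilinear.shape)
```

The outer product captures every pairwise interaction between the two feature vectors, which is why bilinear pooling is more expressive but also far higher-dimensional than linear fusion; compact variants exist precisely to tame that growth.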
“…That is, attention adjusts the observation toward the more informative features according to their relative importance, focusing the algorithm on the most relevant parts of the input. By shifting from global features to focused features, it saves resources and extracts the most useful information quickly. The attention mechanism has arguably become one of the most important concepts in deep learning. Since Bahdanau, Cho & Bengio (2015) applied it to machine translation, many variants have emerged, such as Co-Attention networks ( Yang et al, 2019a ; Han et al, 2021 ; Yu et al, 2019 ; Liu et al, 2021b ; Lu et al, 2016 ; Sharma & Srivastava, 2022 ), Recurrent Attention networks ( Osman & Samek, 2019 ; Ren & Zemel, 2017 ; Gan et al, 2019 ), and Self-Attention networks ( Li et al, 2019 ; Fan et al, 2019 ; Ramachandran et al, 2019 ; Xia et al, 2022 ; Xiang et al, 2022 ; Yan, Silamu & Li, 2022 ). All of these attention mechanisms considerably enhance the processing of visual information and thereby improve VQA performance.…”
Section: Introduction
confidence: 99%
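The common core behind the attention variants listed above (weighting inputs by relative importance) can be sketched as generic scaled dot-product attention. This is an illustration of the general mechanism, not the specific co-attention architecture of the cited paper; the query/key/value shapes are arbitrary examples.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weight the values V by how well each query matches each key,
    so the output concentrates on the most relevant inputs."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # relative importance
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V

Q = np.random.randn(3, 8)   # e.g. three question-token queries
K = np.random.randn(5, 8)   # e.g. five image-region keys
V = np.random.randn(5, 8)   # values paired with the keys
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8)
```

In a co-attention network this computation runs in both directions, with question features attending over image regions and vice versa, so each modality guides the other's focus.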