2021
DOI: 10.48550/arxiv.2109.01934
Preprint

Weakly Supervised Relative Spatial Reasoning for Visual Question Answering

Abstract: Vision-and-language (V&L) reasoning necessitates perception of visual concepts such as objects and actions, understanding semantics and language grounding, and reasoning about the interplay between the two modalities. One crucial aspect of visual reasoning is spatial understanding, which involves understanding relative locations of objects, i.e. implicitly learning the geometry of the scene. In this work, we evaluate the faithfulness of V&L models to such geometric understanding, by formulating the prediction …

citations
Cited by 1 publication
(1 citation statement)
references
References 39 publications
“…It enhances accuracy by utilizing a CNN that predicts bounding boxes for objects in an image. https://www.indjst.org/ (54) CPDR (55) , MulFA/UFSCAN (56) , Bilinear Graph (57) , AttReg (58) , AMAM (16) , Scene-text using PHOC (59) , MGRF (60) , Bottom-Up and Top-Down (61) , DCAMN (39) , Skill Concept (62) , PGM (63) , SR-OCE (64) , RAMEN (65) , CSST (66) , Coarse-to-Fine (67) , GMA (68) , BLOCK (69) , CapsAtt (32,40) , Re-attention (70) , CRN (71) , CAT (11) , shortcut (72) , DAQC (15) , MGFAN (73) , MMMH (19) , MSG (74) , Fair-VQA (75) , Attention map (5) , SAVQA (76) , MGAVQA (77) , MuKEA (78) , ACVRM (79) , QD-GFN (23) , Swap-Mix (80) , CVA (17) , HGNMN (26) , SUPER (37) , Uncertainty based (81) , CLG (82) , WSQG (83) , VLR (84) , LXMERT (85) , SceneGATE (86)…”
Section: Visual Feature Extraction Techniques
type: mentioning
confidence: 99%