2022
DOI: 10.3390/s22062245

COIN: Counterfactual Image Generation for Visual Question Answering Interpretation

Abstract: Due to the significant advancement of Natural Language Processing and Computer Vision-based models, Visual Question Answering (VQA) systems are becoming more intelligent and advanced. However, they are still error-prone when dealing with relatively complex questions. Therefore, it is important to understand the behaviour of the VQA models before adopting their results. In this paper, we introduce an interpretability approach for VQA models by generating counterfactual images. Specifically, the generated image …

Cited by 6 publications (2 citation statements)
References 50 publications
“…However, today's VQA models still suffer from severe language biases (Agrawal et al., 2018), over-relying on linguistic correlations rather than multi-modal reasoning. To realize robust VQA, recent works (Chen et al., 2020, 2023; Kolling et al., 2022; Agarwal et al., 2020; Gokhale et al., 2020a,b; Boukhers et al., 2022; Tang et al., 2020; Kant et al., 2021; Bitton et al., 2021; Askarian et al., 2022; Wang et al., 2021b) employ various data augmentation (DA) techniques, generating extra training samples to enhance VQA models' performance on both in-domain (ID) (Goyal et al., 2017) and out-of-distribution (OOD) datasets (Agrawal et al., 2018).…”
Section: Introduction (mentioning)
confidence: 99%
“…For example, blindly answering "2" for all counting questions or "tennis" for all sport-related questions can still yield satisfactory performance. To mitigate these bias issues and realize robust VQA, a recent surge of VQA works [16,18,33,2,22,9,48,30,29,23,8,7,53] resort to different data augmentation techniques (i.e., generating extra training samples beyond the original training set) and achieve good performance on both in-domain (ID) (e.g., VQA v2 [24]) and out-of-distribution (OOD) datasets (e.g., VQA-CP [4]). Currently, mainstream Data Augmentation (DA) strategies for robust VQA are synthetic-based methods.…”
Section: Introduction (mentioning)
confidence: 99%