Counterfactual VQA: A Cause-Effect Look at Language Bias

Niu, Yulei; Tang, Kaihua; Zhang, Hanwang; Lu, Zhiwu; Hua, Xian-Sheng; Wen, Ji-Rong

doi:10.1109/cvpr46437.2021.01251

Cited by 294 publications

(143 citation statements; 143 classified as mentioning)

References 33 publications

“…VQA-CP [3], drawn from the VQA v2 dataset [20], is the first benchmark proposed to evaluate (and reduce) question-oriented language bias in VQA models. Considerable effort [3,29,33,48,53,1] has been invested in VQA-CP along 3 dimensions: (i) compensating for question-answer distribution patterns through a regularizer based on an auxiliary model [48,8,14,67,21,29]; (ii) taking advantage of additional supervision from human-generated attention maps [53,72,17]; and (iii) synthesizing counterfactual examples to augment the training set [1,10,66]. Recent work [68] shows that simple methods such as generating answers at random can already surpass the state of the art on some question types.…”

Section: Robust VQA Benchmarks (mentioning, confidence: 99%)

A Closer Look at the Robustness of Vision-and-Language Pre-trained Models

Li, Gan, Liu (2020)

Preprint


Large-scale pre-trained multimodal transformers, such as ViLBERT and UNITER, have propelled the state of the art in vision-and-language (V+L) research to a new level. Although these models achieve impressive performance on standard tasks, it remains unclear how robust they are. To investigate, we conduct a thorough set of evaluations of existing pre-trained models over 4 types of V+L-specific model robustness: (i) Linguistic Variation; (ii) Logical Reasoning; (iii) Visual Content Manipulation; and (iv) Answer Distribution Shift. Interestingly, with standard fine-tuning alone, pre-trained V+L models already exhibit better robustness than many task-specific state-of-the-art methods. To further enhance model robustness, we propose MANGO, a generic and efficient approach that learns a Multimodal Adversarial Noise GeneratOr in the embedding space to fool pre-trained V+L models. Unlike previous studies that focus on one specific type of robustness, MANGO is task-agnostic and yields a universal performance lift for pre-trained models across diverse tasks designed to evaluate broad aspects of robustness. Comprehensive experiments show that MANGO achieves new state-of-the-art results on 7 of 9 robustness benchmarks, surpassing existing methods by a significant margin. As the first comprehensive study of V+L robustness, this work brings the robustness of pre-trained models into sharper focus and points to new directions for future study.
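MANGO learns a generator network that produces adversarial noise in the embedding space; the abstract does not describe that generator. As a much simpler stand-in, the sketch below takes a single gradient-sign (FGSM-style) step on the input embedding of a toy logistic classifier, just to show what "adversarial noise in the embedding space" means. The weights, embedding, label, and function names are all hypothetical, and this is not MANGO itself.

```python
import math

# Hypothetical toy model: a logistic classifier over a fixed embedding vector.
# A single FGSM-style gradient-sign step is used as a simple stand-in for
# MANGO's learned noise generator (an assumption, not MANGO's method).

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def logistic_loss(w, e, y):
    """Binary cross-entropy of the toy classifier on embedding e with label y."""
    p = sigmoid(dot(w, e))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def perturb_embedding(w, e, y, eps):
    """Shift each embedding coordinate by eps in the direction that raises the loss.

    For this model, d(loss)/d(e) = (p - y) * w, so only its sign is needed.
    """
    p = sigmoid(dot(w, e))
    grad = [(p - y) * wi for wi in w]
    return [ei + eps * (1 if g > 0 else -1 if g < 0 else 0) for ei, g in zip(e, grad)]

w = [1.0, -2.0, 0.5]   # hypothetical classifier weights
e = [0.3, -0.2, 0.1]   # hypothetical input embedding
y = 1                  # true label
e_adv = perturb_embedding(w, e, y, eps=0.1)
print(logistic_loss(w, e, y), logistic_loss(w, e_adv, y))  # second value is larger
```

Here eps bounds how far each embedding coordinate may move, which keeps the perturbation small while still pushing the prediction away from the true label.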


“…Causal Recommendation. Causal inference has been widely used in many machine learning applications, spanning from computer vision [23,34], natural language processing [11,12,43], to information retrieval [4]. In recommendation, most works on causal inference [25] focus on debiasing various biases in user feedback, including position bias [18], clickbait issue [37], and popularity bias [45].…”

Section: Related Work (mentioning, confidence: 99%)

Deconfounded Recommendation for Alleviating Bias Amplification

Wang, Feng, et al. (2021)

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

124


Recommender systems usually amplify the biases in the data. A model learned from historical interactions with an imbalanced item distribution will amplify the imbalance by over-recommending items from the majority groups. Addressing this issue is essential for a healthy recommendation ecosystem in the long run. Existing work applies bias control to the ranking targets (e.g., calibration, fairness, and diversity), but ignores the true reason for bias amplification and trades off recommendation accuracy. In this work, we scrutinize the cause-effect factors for bias amplification and identify the main reason as the confounding effect of the imbalanced item distribution on user representation and prediction score. The existence of such a confounder pushes us to go beyond merely modeling the conditional probability and to embrace causal modeling for recommendation. To this end, we propose a Deconfounded Recommender System (DecRS), which models the causal effect of user representation on the prediction score. The key to eliminating the impact of the confounder lies in backdoor adjustment, which is, however, difficult to perform due to the infinite sample space of the confounder. To address this challenge, we contribute an approximation operator for backdoor adjustment that can be easily plugged into most recommender models. Lastly, we devise an inference strategy to dynamically regulate backdoor adjustment according to user status. We instantiate DecRS on two representative models, FM [32] and NFM [16], and conduct extensive experiments on two benchmarks to validate the superiority of DecRS.
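Backdoor adjustment swaps the confounder's user-specific distribution P(d | u) for its marginal P(d): P(Y | do(U = u)) = Σ_d P(Y | u, d) P(d). The sketch below evaluates this textbook formula for a hypothetical two-valued confounder (majority vs. minority item group); every probability table is a made-up illustration value. It is not DecRS's approximation operator, and using a finite, discrete confounder sidesteps the infinite-sample-space difficulty the abstract mentions.

```python
# Textbook backdoor adjustment with a discrete confounder D (an assumption:
# DecRS needs an approximation because its confounder space is infinite).
# All probability tables below are hypothetical illustration values.

# P(Y=1 | U=u, D=d): chance of a positive prediction given user and confounder.
P_Y = {("u1", "majority"): 0.8, ("u1", "minority"): 0.4}

# Marginal P(D=d) over the population.
P_D = {"majority": 0.5, "minority": 0.5}

# P(D=d | U=u): user u1's history is skewed toward majority-group items.
P_D_GIVEN_U = {"u1": {"majority": 0.9, "minority": 0.1}}

def conditional(p_y, p_d_given_u, u):
    """Plain conditional: P(Y=1 | u) = sum_d P(Y=1 | u, d) * P(d | u)."""
    return sum(p_y[(u, d)] * p for d, p in p_d_given_u[u].items())

def backdoor_adjust(p_y, p_d, u):
    """Interventional: P(Y=1 | do(U=u)) = sum_d P(Y=1 | u, d) * P(d)."""
    return sum(p_y[(u, d)] * p for d, p in p_d.items())

print(round(conditional(P_Y, P_D_GIVEN_U, "u1"), 2))  # 0.76
print(round(backdoor_adjust(P_Y, P_D, "u1"), 2))      # 0.6
```

The conditional estimate inherits u1's skew toward the majority group, whereas the adjusted estimate weights each group by its marginal share, which is the sense in which the confounder's influence is removed.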


“…Counterfactual inference. A line of research attempts to enable deep neural networks with counterfactual thinking by incorporating counterfactual inference (Yue et al., 2021; Wang et al., 2021; Niu et al., 2021; Tang et al., 2020; Feng et al., 2021). These methods perform counterfactual inference over the model predictions according to a pre-defined causal graph.…”

Section: Related Work (mentioning, confidence: 99%)
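One way to read "counterfactual inference over the model predictions" is as effect subtraction: the score used for each answer is the total effect of the full input minus the natural direct effect of the question alone, computed in a counterfactual world where the visual inputs are replaced by a reference value. The sketch below assumes a SUM-style fusion h = log σ(z_q + z_v + z_k) of per-answer question, vision, and multimodal logits, and a constant c standing in for the counterfactual vision and multimodal logits. The fusion function, c, the branch names, and all numbers are assumptions for illustration, not the implementation of any cited paper.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def fuse(z_q, z_v, z_k):
    """Assumed SUM-style fusion of question, vision and multimodal logits."""
    return log_sigmoid(z_q + z_v + z_k)

def counterfactual_scores(z_q, z_v, z_k, c):
    """Per-answer total effect minus the natural direct effect of the question.

    The counterfactual world keeps the question logits but replaces the
    vision and multimodal logits with the constant c (an assumption).
    """
    return [fuse(q, v, k) - fuse(q, c, c) for q, v, k in zip(z_q, z_v, z_k)]

answers = ["yes", "no"]
z_q = [3.0, -1.0]   # hypothetical question-only branch: strong prior toward "yes"
z_v = [0.0, 1.0]    # hypothetical vision branch: evidence for "no"
z_k = [0.0, 1.0]    # hypothetical multimodal branch: evidence for "no"

factual = [fuse(q, v, k) for q, v, k in zip(z_q, z_v, z_k)]
debiased = counterfactual_scores(z_q, z_v, z_k, c=0.0)
print(answers[factual.index(max(factual))])    # yes
print(answers[debiased.index(max(debiased))])  # no
```

With these toy logits the question prior wins under the factual score, while subtracting the question's direct effect lets the visual evidence decide the answer.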

“…Debiased training (Tu et al., 2020; Utama et al., 2020) eliminates the spurious correlation or bias in training data to enhance the generalization ability and deal with out-of-distribution samples. In addition to the training phase, a few inference techniques might improve the model performance on hard samples, including posterior regularization (Srivastava et al., 2018) and causal inference (Yu et al., 2020; Niu et al., 2021). However, both techniques require domain knowledge such as prior or causal graph tailored for specific applications.…”

Section: Related Work (mentioning, confidence: 99%)

Empowering Language Understanding with Counterfactual Reasoning

Feng, Zhang, et al. (2021)

Preprint

Self Cite


Present language understanding methods have demonstrated an extraordinary ability to recognize patterns in texts via machine learning. However, existing methods indiscriminately use the recognized patterns in the testing phase, which is inherently different from humans, who think counterfactually, e.g., to scrutinize hard testing samples. Inspired by this, we propose a Counterfactual Reasoning Model, which mimics counterfactual thinking by learning from a few counterfactual samples. In particular, we devise a generation module to generate representative counterfactual samples for each factual sample, and a retrospective module to retrospect the model prediction by comparing the counterfactual and factual samples. Extensive experiments on sentiment analysis (SA) and natural language inference (NLI) validate the effectiveness of our method.
