Machine learning advances in the last decade have relied significantly on large-scale datasets that continue to grow in size. Increasingly, those datasets also contain different data modalities. However, large multi-modal datasets are hard to annotate, and annotations may contain biases that we are often unaware of. Deep-net-based classifiers, in turn, are prone to exploit those biases and to find shortcuts. To study and quantify this concern, we introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features, i.e., modalities. Using the perceptual score, we find a surprisingly consistent trend across four popular datasets: recent, more accurate state-of-the-art multi-modal models for visual question-answering or visual dialog tend to perceive the visual data less than their predecessors. This trend is concerning, as answers are hence increasingly inferred from textual cues only. Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions. We hope to spur a discussion on the perceptiveness of multi-modal models and also hope to encourage the community working on multi-modal classifiers to start quantifying perceptiveness via the proposed perceptual score.

Reported improvements are to a large extent due to the availability of large datasets [1-3], computational performance advances, e.g., for GPUs, and a better understanding of how to encode inductive biases into deep-nets, e.g., by using rectified linear units [4], normalization [5], skip connections [6], transformers [7], etc. Importantly, however, developed deep-net architectures are not guaranteed to solve a given task. There is a chance that they may instead exploit dataset biases. This concern is surely in part due to non-robust training techniques, and a plethora of methods improve classifier robustness [8-10]. However, datasets play an important role in controlling the extracted bias as well. For instance, if correct answers in a question-answering task are significantly shorter than incorrect ones, classifier training should not use answer length as a cue. Although this seems reasonable, for audio-visual scene-aware dialog, Schwartz et al. [11] find, for example, that in many cases the question alone is sufficient to generate a scene-aware dialog response, avoiding the need to look at the video. Hence, in order to assess the suitability of a classifier, we need to understand how much it relies on different data modalities.

To quantify how much a classifier relies on its different input modalities, we introduce the perceptual score. The perceptual score assesses the degree to which a model relies on a modality. To do so, the perceptual score permutes the features of a modality across samples in the test set after the classifier has been trained, and measures the resulting drop in accuracy.
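To make the permutation idea concrete, the following is a minimal sketch of how such a score could be estimated for a two-modality (visual + text) classifier. The `model.predict` interface, the normalization of the accuracy drop by the original accuracy, and the averaging over several random permutations are illustrative assumptions for this sketch, not necessarily the exact definition used in the paper.

```python
import numpy as np


def accuracy(model, visual_feats, text_feats, labels):
    """Accuracy of a two-modality classifier on a held-out set."""
    preds = model.predict(visual_feats, text_feats)  # hypothetical model API
    return float(np.mean(preds == labels))


def perceptual_score(model, visual_feats, text_feats, labels,
                     modality="visual", num_permutations=5, seed=0):
    """Permutation-based perceptual score for one modality (sketch).

    The chosen modality's features are shuffled across test samples,
    breaking their association with the labels, and the resulting
    accuracy drop is reported, here normalized by the original accuracy.
    """
    rng = np.random.default_rng(seed)
    base_acc = accuracy(model, visual_feats, text_feats, labels)

    drops = []
    for _ in range(num_permutations):
        perm = rng.permutation(len(labels))
        if modality == "visual":
            acc = accuracy(model, visual_feats[perm], text_feats, labels)
        else:
            acc = accuracy(model, visual_feats, text_feats[perm], labels)
        drops.append(base_acc - acc)

    # A score near 0 suggests the classifier ignores the modality;
    # a score near 1 suggests it relies on that modality almost entirely.
    return float(np.mean(drops)) / max(base_acc, 1e-12)
```

Under these assumptions, comparing the score for the visual modality against the score for the textual modality indicates which input the classifier actually uses to produce its answers.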