Towards Debiasing NLU Models from Unknown Biases

Utama, Prasetya Ajie; Moosavi, Nafise Sadat; Gurevych, Iryna

doi:10.18653/v1/2020.emnlp-main.613

Cited by 58 publications

(30 citation statements)

References 39 publications


“…From a robustness point of view, such pretrain-and-fine-tune pipelines are known to be prone to biases that are present in data (Gururangan et al, 2018; Poliak et al, 2018; McCoy et al, 2019; Schuster et al, 2019). Various methods were proposed to mitigate such biases in a form of robust training, where a bias model is trained to capture the bias and then used to relax the predictions of a main model, so that it can focus less on biased examples and more on the "hard", more challenging examples (Clark et al, 2019; Mahabadi et al, 2020; Utama et al, 2020b; …”

Figure 1 (caption from the citing paper): Amount of subsequence bias extracted from different language models vs. the robustness of models to the bias. Robustness is measured as improvement of the model on out-of-distribution examples, while extractability is measured as the improvement of the probe's ability to extract the bias from a debiased model, compared to the baseline.

Section: Introduction (mentioning)

confidence: 99%

Debiasing Methods in Natural Language Understanding Make Bias More Accessible

Mendelson, Belinkov

2021

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing


Model robustness to bias is often determined by the generalization on carefully designed out-of-distribution datasets. Recent debiasing methods in natural language understanding (NLU) improve performance on such datasets by pressuring models into making unbiased predictions. An underlying assumption behind such methods is that this also leads to the discovery of more robust features in the model's inner representations. We propose a general probing-based framework that allows for post-hoc interpretation of biases in language models, and use an information-theoretic approach to measure the extractability of certain biases from the model's representations. We experiment with several NLU datasets and known biases, and show that, counter-intuitively, the more a language model is pushed towards a debiased regime, the more bias is actually encoded in its inner representations.

* Supported by the Viterbi Fellowship in the Center for Computer Engineering at the Technion.



“…This suggests that naively applying debiasing techniques may incur unexpected negative impacts on other aspects of the moderation system. Further research is needed into modeling approaches that can achieve robust performance both in prediction and in uncertainty calibration under data bias and distributional shift (Nam et al, 2020; Utama et al, 2020; Du et al, 2021; Yaghoobzadeh et al, 2021; Bao et al, 2021; Karimi Mahabadi et al, 2020).…”

Section: Discussion (mentioning)

confidence: 99%

Measuring and Improving Model-Moderator Collaboration using Uncertainty Estimation

Kivlichan, Lin, Liu et al.

2021

Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)


Content moderation is often performed by a collaboration between humans and machine learning models. However, it is not well understood how to design the collaborative process so as to maximize the combined moderator-model system performance. This work presents a rigorous study of this problem, focusing on an approach that incorporates model uncertainty into the collaborative process. First, we introduce principled metrics to describe the performance of the collaborative system under capacity constraints on the human moderator, quantifying how efficiently the combined system utilizes human decisions. Using these metrics, we conduct a large benchmark study evaluating the performance of state-of-the-art uncertainty models under different collaborative review strategies. We find that an uncertainty-based strategy consistently outperforms the widely used strategy based on toxicity scores, and moreover that the choice of review strategy drastically changes the overall system performance. Our results demonstrate the importance of rigorous metrics for understanding and developing effective moderator-model systems for content moderation, as well as the utility of uncertainty estimation in this domain.

* Equal contribution; authors listed alphabetically.
† This work was done while Zi Lin was an AI resident at Google Research.


“…Several studies have reported successful generalization from MNLI to HANS. Among data-based strategies, it has been achieved via augmenting MNLI data with predicate-argument structures (Moosavi et al, 2020) and syntactic transformations (Min et al, 2020). Although there are many reports of syntactic knowledge in pre-trained BERT (Rogers et al, 2020b), Min et al (2020) suggest that pre-training does not yield a strong inductive bias to use syntax in downstream tasks, and augmentation "nudges" the model towards that.…”

Section: Related Work (mentioning)

confidence: 99%

Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics

Bhargava, Drozd, Rogers

2021

Proceedings of the Second Workshop on Insights From Negative Results in NLP


Much of recent progress in NLU was shown to be due to models' learning dataset-specific heuristics. We conduct a case study of generalization in NLI (from MNLI to the adversarially constructed HANS dataset) in a range of BERT-based architectures (adapters, Siamese Transformers, HEX debiasing), as well as with subsampling the data and increasing the model size. We report 2 successful and 3 unsuccessful strategies, all providing insights into how Transformer-based models learn to generalize.
