Moin Nadeem scite author profile

A stereotype is an over-generalized belief about a particular group of people, e.g., Asians are good at math or African Americans are athletic. Such beliefs (biases) are known to hurt target groups. Since pretrained language models are trained on large real-world data, they are known to capture stereotypical biases. It is important to quantify to what extent these biases are present in them. Although this is a rapidly growing area of research, the existing literature falls short in two important respects: 1) it mainly evaluates the bias of pretrained language models on a small set of artificial sentences, even though these models are trained on natural data; 2) current evaluations focus on measuring bias without considering the language modeling ability of a model, which could lead to misplaced trust in a model even if it is a poor language model. We address both these problems. We present StereoSet, a large-scale natural English dataset to measure stereotypical biases in four domains: gender, profession, race, and religion. We contrast both the stereotypical bias and the language modeling ability of popular models like BERT, GPT2, RoBERTa, and XLNet. We show that these models exhibit strong stereotypical biases. Our data and code are available at https://stereoset.mit.edu.


StereoSet: Measuring stereotypical bias in pretrained language models

Nadeem, Bethke, Reddy. 2020. Preprint.

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Gehrmann, Adewumi, Aggarwal, et al. 2021.

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent Anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for which we are organizing a shared task at our ACL 2021 Workshop and in which we invite the entire NLG community to participate.


The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Gehrmann, Adewumi, Aggarwal, et al. 2021. Preprint.

Untitled

Nadeem, Fang, Xu, et al. 2019.

We present FAKTA, a unified framework that integrates the components of a fact-checking process: document retrieval from media sources of varying reliability, stance detection of documents with respect to given claims, evidence extraction, and linguistic analysis. FAKTA predicts the factuality of given claims and provides evidence at the document and sentence level to explain its predictions.


