“…Safety of dialog models: Inappropriate and unsafe risks and behaviors of language models have been extensively discussed and studied in previous work (e.g., [53,54]). Issues encountered include toxicity (e.g., [55,56,57]), bias (e.g., [58,59,60,61,62,63,64,65,66,67,68,69,70,71,72]), and inappropriately revealing personally identifiable information (PII) from training data [73]. Weidinger et al. [54] identify 21 risks associated with large-scale language models and discuss the points of origin of these risks.…”