Findings of the Association for Computational Linguistics: ACL 2022 (2022)
DOI: 10.18653/v1/2022.findings-acl.165

BBQ: A hand-built bias benchmark for question answering

Abstract: It is well documented that NLP models learn social biases, but little work has been done on how these biases manifest in model outputs for applied tasks like question answering (QA). We introduce the Bias Benchmark for QA (BBQ), a dataset of question sets constructed by the authors that highlight attested social biases against people belonging to protected classes along nine social dimensions relevant for U.S. English-speaking contexts. Our task evaluates model responses at two levels: (i) given an under-infor…

Citations: cited by 33 publications (38 citation statements)
References: 29 publications
“…Researchers in NLP have an ethical obligation to inform (and if necessary, pressure) stakeholders about how to avoid or mitigate the negative impacts while realizing the positive ones. Most prominently, typical applied NLP models show serious biases with respect to legally protected attributes like race and gender (Bolukbasi et al, 2016;Rudinger et al, 2018;Parrish et al, 2021). We have no reliable mechanisms to mitigate these biases and no reason to believe that they will be satisfactorily resolved with larger scale.…”
Section: Present-day Impact Mitigation (mentioning)
confidence: 95%
“…Dataset We now explore additional social dimensions using BBQ (Parrish et al, 2022), which tests social biases against people from nine protected classes (age, disability status, gender identity, nationality, physical appearance, race, religion, socio-economic status, sexual orientation). BBQ examples are in sets of four multiple-choice questions.…”
Section: Broader Social Dimensions: BBQ (mentioning)
confidence: 99%
“…Language models producing toxic or biased content can cause severe harm especially to the groups being biased against (Bender et al, 2021). A series of benchmarks have been developed to show that LLMs can generate toxic outputs (Gehman et al, 2020), contain gender biases (Zhao et al, 2018) and other categories of social biases (Nangia et al, 2020; Nadeem et al, 2021; Parrish et al, 2022), perform poorly against minority demographic groups (Koh et al, 2021; Harris et al, 2022) or dialectical variations (Ziems et al, 2022; Tan et al, 2020). Ideally, LLMs should not exhibit biased behaviors and not discriminate against any group.…”
Section: Appendix A More Related Work (mentioning)
confidence: 99%
“…For this reason, we instead turn to the more recently introduced BBQ dataset of Parrish et al (2022). We note that the BBQ dataset may still suffer from some of the concerns discussed by Blodgett et al (2021), but we expect it is comparatively better than the other options.…”
Section: Bias (mentioning)
confidence: 99%