Findings of the Association for Computational Linguistics: ACL 2022 2022
DOI: 10.18653/v1/2022.findings-acl.176
Your fairness may vary: Pretrained language model fairness in toxic text classification

Abstract: The popularity of pretrained language models in natural language processing systems calls for a careful evaluation of such models in downstream tasks, which have a higher potential for societal impact. The evaluation of such systems usually focuses on accuracy measures. Our findings in this paper call for attention to be paid to fairness measures as well. Through the analysis of more than a dozen pretrained language models of varying sizes on two toxic text classification tasks (English), we demonstrate that …
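The abstract's contrast between accuracy and fairness measures can be made concrete with a minimal sketch (not from the paper itself): two toxicity classifiers with identical accuracy can differ sharply on a group-fairness measure such as the false-positive-rate gap across identity subgroups. The group labels, toy data, and choice of FPR gap here are illustrative assumptions.

```python
# Illustrative sketch: accuracy vs. a group-fairness measure for a
# toxicity classifier. Data and metric choice are assumptions, not
# taken from the paper under discussion.

def accuracy(y_true, y_pred):
    # Fraction of examples classified correctly.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def false_positive_rate(y_true, y_pred):
    # FPR = FP / (FP + TN), computed over non-toxic (label 0) examples.
    negatives = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]
    if not negatives:
        return 0.0
    return sum(p == 1 for _, p in negatives) / len(negatives)

def fpr_gap(y_true, y_pred, groups):
    # Largest FPR difference between any two identity subgroups —
    # one common group-fairness measure for toxic text classification.
    rates = []
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates.append(false_positive_rate([y_true[i] for i in idx],
                                         [y_pred[i] for i in idx]))
    return max(rates) - min(rates)

# Toy data: 8 comments, each tagged with an (assumed) identity group.
y_true  = [0, 0, 0, 0, 1, 1, 0, 0]
groups  = ["a", "a", "a", "a", "a", "b", "b", "b"]

# Two hypothetical models with the same accuracy but different fairness.
pred_m1 = [0, 0, 0, 1, 1, 1, 0, 0]   # one false positive, in group "a"
pred_m2 = [0, 0, 0, 0, 1, 1, 1, 0]   # one false positive, in group "b"

print(accuracy(y_true, pred_m1), fpr_gap(y_true, pred_m1, groups))  # 0.875 0.25
print(accuracy(y_true, pred_m2), fpr_gap(y_true, pred_m2, groups))  # 0.875 0.5
```

Model selection by accuracy alone cannot distinguish the two models above, while the FPR gap does — the kind of effect that motivates reporting fairness measures alongside accuracy.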

Cited by 20 publications (14 citation statements)
References 20 publications
“…Webster et al (2020) find that existing pretrained models encode different degrees of gender correlations, despite their performance on target tasks being quite similar, motivating the need to consider different metrics when performing model selection. A similar effect is also observed by Baldini et al (2022). Chalkidis et al (2022) examine the effectiveness of debiasing methods over a multi-lingual benchmark dataset consisting of four subsets of legal documents, covering five languages and various sensitive attributes.…”
Section: Effectiveness of Debiasing Methods (supporting)
confidence: 61%
“…Beyond the standard definitions of fairness, a number of studies have examined the effectiveness of various debiasing methods in additional settings (Gonen and Goldberg, 2019; Meade et al, 2021; Lamba et al, 2021; Baldini et al, 2022; Chalkidis et al, 2022). For example, Meade et al (2021) not only examine the effectiveness of various debiasing methods but also measure the impact of debiasing methods on a model's language modeling ability and downstream task performance.…”
Section: Effectiveness of Debiasing Methods (mentioning)
confidence: 99%
“…Unintended social biases in NLP models have been identified in word/sentence embedding (Bolukbasi et al, 2016; May et al, 2019) and applications such as coreference resolution (Zhao et al, 2018), language modeling (Bordia and Bowman, 2019b), machine translation (Bordia and Bowman, 2019a), and text classification (Ball-Burack et al, 2021; Baldini et al, 2022).…”
Section: Related Work (mentioning)
confidence: 99%
“…(3) The system should better solve the problem of news recommendation for new users and the user cold-start problem. (4) Newly added news also needs to be recommended to users quickly, to address the long-tail effect in news information and better solve the item cold-start problem. (5) The system tracks the accuracy and recall rate of the recommendation system [15], ensuring that news recommendation maintains high accuracy and recall and that users are satisfied with the current recommendations. (6) The system should be able to adapt to a large number of users and new news on the platform, and can quickly calculate the push, save, and proofread type. (7) The system has good extensibility and meets the system's capacity and computing-power requirements for handling SeaView data. (8) The real-time computing H-frame is used to calculate the personalized recommendation model, so as to improve the actual push and storage capacity of the whole platform…”
Section: Introduction (mentioning)
confidence: 99%