2021
DOI: 10.1111/lnc3.12432

Five sources of bias in natural language processing

Abstract: Recently, there has been an increased interest in demographically grounded bias in natural language processing (NLP) applications. Much of the recent work has focused on describing bias and providing an overview of bias in a larger context. Here, we provide a simple, actionable summary of this recent work. We outline five sources where bias can occur in NLP systems: (1) the data, (2) the annotation process, (3) the input representations, (4) the models, and finally (5) the research design (or how we conceptual…

Cited by 143 publications (111 citation statements)
References 87 publications

“…The set of harms that can arise from the use of NLP, however, has become a recent concern in the area of trustworthy AI [81,91,158,168,169]. Hovy and Prabhumoye describe five sources of bias in NLP and potential ways to counteract it [170].…”
Section: Processes (mentioning, confidence: 99%)
“…We reviewed the techniques for identifying and resolving representation bias mostly in tabular data sets. The existing research has briefly investigated these issues in other data types such as multimedia [13,26,71], text [33,49], graphs, streams [25], spatio-temporal data [28], etc. Still, identifying and resolving biases in visual data sets has drawn more attention from different research communities, and in this section we present a review of the existing works.…”
Section: Expanding the Scope To Other Data Types (mentioning, confidence: 99%)
“…8.1.1 Data Augmentation. Many concerns have been posed regarding modern NLP systems having been trained on potentially biased datasets, as these biases can be perpetuated to downstream tasks and eventually society in the form of allocational harms [Hovy and Prabhumoye 2021]. Therefore, Costa-jussà and de Jorge [2020] claim that developing methods trained on balanced data is a first step to eliminating representational harms.…”
Section: Debiasing Using Data Manipulation (mentioning, confidence: 99%)
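
The balanced-data idea in the statement above can be sketched as counterfactual data augmentation: each training sentence is paired with a copy in which gendered terms are swapped, so that both variants appear equally often in the training set. The word-pair list, function names, and example sentences below are illustrative assumptions for a minimal sketch, not the specific procedure of Costa-jussà and de Jorge [2020].

```python
import re

# Toy bidirectional map of gendered word pairs (illustrative only; a real
# lexicon would be far larger and curated). "her" is mapped to "his" even
# though it can also correspond to "him"; a minimal sketch ignores that.
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "him": "her", "her": "his", "his": "her",
    "man": "woman", "woman": "man",
}

# Match any listed word as a whole word, case-insensitively.
_PATTERN = re.compile(r"\b(" + "|".join(GENDER_PAIRS) + r")\b", re.IGNORECASE)


def swap_gendered_terms(sentence: str) -> str:
    """Return a counterfactual copy of the sentence with gendered terms swapped."""
    def _swap(match: re.Match) -> str:
        word = match.group(0)
        target = GENDER_PAIRS[word.lower()]
        return target.capitalize() if word[0].isupper() else target
    return _PATTERN.sub(_swap, sentence)


def augment_corpus(corpus: list[str]) -> list[str]:
    """Pair every original sentence with its gender-swapped counterfactual."""
    return [variant for s in corpus for variant in (s, swap_gendered_terms(s))]


if __name__ == "__main__":
    toy_corpus = ["He said his results were ready.", "She is the new manager."]
    for line in augment_corpus(toy_corpus):
        print(line)
```

A production pipeline would need a much larger pair lexicon, handling of ambiguous forms such as possessive "her", and name substitution, but the balancing principle is the same.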