Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion 2022
DOI: 10.18653/v1/2022.ltedi-1.9

Disambiguation of morpho-syntactic features of African American English – the case of habitual be

Abstract: Recent research has highlighted that natural language processing (NLP) systems exhibit a bias against African American speakers. The bias errors are often caused by poor representation of linguistic features unique to African American English (AAE), due to the relatively low probability of occurrence of many such features in training data. We present a workflow to overcome such bias in the case of habitual "be". Habitual "be" is isomorphic, and therefore ambiguous, with other forms of "be" found in both AAE an…

Cited by 3 publications (11 citation statements)
References 2 publications
“…Recently, there has been a surge in NLP research for AAE. Studies have explored dependency parsing (Blodgett et al., 2018), POS-tagging (Dacon, 2022; Jørgensen et al., 2016), hate speech classification (Harris et al., 2022; Sap et al., 2019), automatic speech recognition (Koenecke et al., 2020; Martin and Tang, 2020), dialectal analysis (Blodgett et al., 2016; Dacon, 2022; Stewart, 2014) and feature detection (Masis et al., 2022; Santiago et al., 2022). Projects such as these rely heavily on large amounts of labeled data, however, little research is dedicated to optimizing the disambiguation and annotation process.…”
Section: Related Work (mentioning)
confidence: 99%
“…In their work, Santiago et al. (2022) leveraged linguistic descriptions and POS tagging to develop a rule-based method for identifying many nonhabitual instances of "be". This allowed us to filter out these instances, thereby balancing our dataset by increasing the frequency of habitual be examples.…”
Section: POS Patterns (mentioning)
confidence: 99%
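
The filtering step described in the statement above can be sketched roughly as follows. This is a minimal illustration, not the rule set of Santiago et al. (2022), which this page does not reproduce. It assumes sentences are already POS-tagged with Penn Treebank tags, and the two rules used here (a "be" directly preceded by a modal or by "to", and a sentence-initial "be") are hypothetical stand-ins chosen only to show the shape of a rule-based filter. All function and variable names are invented for the example.

```python
from typing import List, Tuple

# A sentence is a list of (token, Penn Treebank POS tag) pairs.
TaggedSentence = List[Tuple[str, str]]

# Hypothetical rule set (an assumption, not the published rules): tags that,
# directly before "be", mark it as non-habitual ("will be", "to be").
NON_HABITUAL_PRECEDING_TAGS = {"MD", "TO"}


def is_clearly_non_habitual(sentence: TaggedSentence, index: int) -> bool:
    """Return True if the "be" at `index` matches one of the illustrative rules."""
    if index == 0:
        # Sentence-initial "be" is treated here as an imperative ("Be quiet").
        return True
    previous_tag = sentence[index - 1][1]
    return previous_tag in NON_HABITUAL_PRECEDING_TAGS


def keep_candidate(sentence: TaggedSentence) -> bool:
    """Keep a sentence if at least one "be" is not ruled out as non-habitual.

    Sentences without any "be" are dropped, since they cannot contain the feature.
    """
    be_positions = [i for i, (tok, _) in enumerate(sentence) if tok.lower() == "be"]
    return any(not is_clearly_non_habitual(sentence, i) for i in be_positions)


def filter_corpus(corpus: List[TaggedSentence]) -> List[TaggedSentence]:
    """Drop sentences whose every "be" is flagged by the rules."""
    return [sentence for sentence in corpus if keep_candidate(sentence)]


corpus: List[TaggedSentence] = [
    [("She", "PRP"), ("will", "MD"), ("be", "VB"), ("late", "JJ")],
    [("They", "PRP"), ("be", "VB"), ("working", "VBG")],
    [("I", "PRP"), ("want", "VBP"), ("to", "TO"), ("be", "VB"), ("there", "RB")],
]
kept = filter_corpus(corpus)
```

Sentences that survive the filter are only candidates for habitual "be"; they would still need annotation or a classifier. The sketch also ignores intervening adverbs ("will always be"), which a fuller rule set would have to handle.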