Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.473

Investigating African-American Vernacular English in Transformer-Based Text Generation

Abstract: The growth of social media has encouraged the written use of African American Vernacular English (AAVE), which has traditionally been used only in oral contexts. However, NLP models have historically been developed using dominant English varieties, such as Standard American English (SAE), due to text corpora availability. We investigate the performance of GPT-2 on AAVE text by creating a dataset of intent-equivalent parallel AAVE/SAE tweet pairs, thereby isolating syntactic structure and AAVE- or SAE-specific language…
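A minimal sketch of the kind of paired-prompt evaluation the abstract describes, assuming the HuggingFace transformers GPT-2 checkpoint as the model under test; the AAVE/SAE prompt pair below is invented for illustration and is not from the paper's dataset:

```python
# Sketch (not the authors' released code): generate GPT-2 continuations
# for an intent-equivalent AAVE/SAE prompt pair, so the two completions
# can later be compared with a sentiment or regard classifier.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

pair = {
    "aave": "He finna go to the store",        # hypothetical AAVE prompt
    "sae": "He is about to go to the store",   # hypothetical SAE paraphrase
}

for variety, prompt in pair.items():
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=30,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(variety, "->", tokenizer.decode(outputs[0], skip_special_tokens=True))
```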

Cited by 36 publications (43 citation statements: 2 supporting, 41 mentioning, 0 contrasting). References 11 publications.
“…We infer that the less constrained, open-domain nature of continuation generation tasks makes it preferable to evaluate mitigation through more flexible comparisons rather than absolute scores. For autocomplete generation, Sheng et al. (2019) and Groenwold et al. (2020) compare regard or sentiment scores across demographics, Shwartz et al. (2020) compare names across various intermediate metrics, Vig et al. (2020) measure proportional differences between the amount of bias under a gendered versus ambiguous reading, and Yeo and Chen (2020) compare occupations generated for different genders. Bias studies in dialogue generation use relative scores by comparing sentiment and offensive language discrepancies (Henderson et al., 2018; Liu et al., 2020a,b) and the percentage of gendered words (Dinan et al., 2020a).…”
Section: Evaluation Methods (citation type: mentioning)
confidence: 99%
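A hedged sketch of the "relative scores" idea in the excerpt above: rather than judging absolute sentiment, compare the score distributions of generations conditioned on AAVE versus SAE prompts. VADER stands in here for whichever sentiment classifier a given study actually used, and the toy generations are made up:

```python
# Compare mean compound sentiment between two groups of generations;
# a gap near 0 suggests parity under this proxy metric.
from statistics import mean
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def sentiment_gap(aave_generations, sae_generations):
    """Mean compound-sentiment difference between the two groups."""
    aave = mean(analyzer.polarity_scores(t)["compound"] for t in aave_generations)
    sae = mean(analyzer.polarity_scores(t)["compound"] for t in sae_generations)
    return aave - sae

# Toy usage with invented generations:
print(sentiment_gap(["they gon be alright"], ["they are going to be fine"]))
```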
“…Because of the difficulty in defining metrics, existing works define bias loosely as demographic inequality and use intermediate proxy metrics to comparatively measure bias. Examples include:
• Regard Ratio: negative-neutral-positive regard score ratios of text generated from bias-inducing prompts (Sheng et al., 2019)
• Sentiment Ratio: negative-neutral-positive sentiment score ratios of text generated from African American English (AAE) versus White-Aligned English (WAE) prompts (Groenwold et al., 2020)
• Individual and Group Fairness through Sentiment: comparisons of the sentiment distributions of generated text across demographics and prompts (Huang et al., 2020)
• Gendered Word Co-occurrence Score: mean and standard deviation of the absolute log ratio of probabilities P(word|female terms) to P(word|male terms) across all words in generated text (Bordia and Bowman, 2019)
There are also metrics for other bias evaluation setups in continuation generation tasks involving sentiment (Shwartz et al., 2020), the ratio of gendered words (Solaiman et al., 2019; Vig et al., 2020; Dinan et al., 2020a), and other novel metrics (Yeo and Chen, 2020). Studies of biases in transformation generation tasks favor metrics of accuracy in terms of successfully transforming text to have a desired property.…”
Section: Bias Definitions and Metrics (citation type: mentioning)
confidence: 99%
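A minimal sketch of the Gendered Word Co-occurrence Score described in the excerpt above (Bordia and Bowman, 2019): the mean and standard deviation of |log P(w | female terms) / P(w | male terms)| over words in generated text. Count-based probability estimates with add-one smoothing and the tiny seed term sets are assumptions here, standing in for the original paper's exact estimation details:

```python
import math
from collections import Counter
from statistics import mean, stdev

FEMALE_TERMS = {"she", "her", "woman"}   # illustrative seed sets, not
MALE_TERMS = {"he", "him", "man"}        # the original papers' word lists

def cooccurrence_scores(sentences):
    """Mean and std of the absolute log co-occurrence ratio per word."""
    f_counts, m_counts = Counter(), Counter()
    for s in sentences:
        tokens = s.lower().split()
        if FEMALE_TERMS & set(tokens):
            f_counts.update(t for t in tokens if t not in FEMALE_TERMS)
        if MALE_TERMS & set(tokens):
            m_counts.update(t for t in tokens if t not in MALE_TERMS)
    vocab = set(f_counts) | set(m_counts)
    f_total, m_total = sum(f_counts.values()), sum(m_counts.values())
    scores = [
        abs(math.log(((f_counts[w] + 1) / (f_total + len(vocab))) /
                     ((m_counts[w] + 1) / (m_total + len(vocab)))))
        for w in vocab
    ]
    return mean(scores), stdev(scores)

# Toy usage with invented generations:
print(cooccurrence_scores([
    "she is a brilliant doctor", "he is a brilliant engineer",
    "he was angry", "she was kind",
]))
```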
“…This information identifies the sublanguage(s) of interest (Grishman and Kittredge, 1986), which determine the availability and development of appropriate NLP tools (Grishman, 2001). Corporate disclosures, financial news reports, and tweets all require different processing strategies (Xing et al., 2018), as do tweets written by different communities (Blodgett et al., 2016; Groenwold et al., 2020). Ex.…”
Section: Translational NLP Checklist (citation type: mentioning)
confidence: 99%