Investigating African-American Vernacular English in Transformer-Based Text Generation

Groenwold, Sophie; Ou, Lily; Parekh, Aesha; Honnavalli, Samhita; Levy, Sharon; Mirza, Diba; Wang, William Yang

doi:10.18653/v1/2020.emnlp-main.473

Cited by 36 publications

(43 citation statements)

References 11 publications

“…We infer that the less constrained, open-domain nature of continuation generation tasks makes it more preferable to evaluate mitigation through more flexible comparisons rather than absolute scores. For autocomplete generation, Sheng et al (2019) and Groenwold et al (2020) compare regard or sentiment scores across demographics, Shwartz et al (2020) compare names across various intermediate metrics, Vig et al (2020) measure proportional differences between the amount of bias under a gendered versus ambiguous reading, and Yeo and Chen (2020) compare occupations generated for different genders. Bias studies in dialogue generation use relative scores by comparing sentiment and offensive language discrepancies (Henderson et al, 2018; Liu et al, 2020a,b) and the percentage of gendered words (Dinan et al, 2020a).…”

Section: Evaluation Methods (mentioning)

confidence: 99%

“…Because of the difficulty in defining metrics, existing works define bias loosely as demographic inequality and use intermediate proxy metrics to comparatively measure bias. Examples include:
• Regard Ratio: negative-neutral-positive regard score ratios of text generated from bias-inducing prompts (Sheng et al, 2019)
• Sentiment Ratio: negative-neutral-positive sentiment score ratios of text generated from African American English (AAE) versus White-Aligned English (WAE) prompts (Groenwold et al, 2020)
• Individual and Group Fairness through Sentiment: comparisons of the sentiment distributions of generated text across demographics and prompts (Huang et al, 2020)
• Gendered Word Co-occurrence Score: mean and standard deviations of the absolute log ratio of probabilities: P(word|female terms) to P(word|male terms) across all words in generated text (Bordia and Bowman, 2019)
There are also metrics for other bias evaluation setups in continuation generation tasks involving sentiment (Shwartz et al, 2020), the ratio of gendered words (Solaiman et al, 2019; Vig et al, 2020; Dinan et al, 2020a), and other novel metrics (Yeo and Chen, 2020). Studies of biases in transformation generation tasks favor metrics of accuracy in terms of successfully transforming text to have a desired property.…”

Section: Bias Definitions and Metrics (mentioning)

confidence: 99%
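
The sentiment ratio named in the statement above can be illustrated with a minimal sketch. It assumes each generated text already has a sentiment score in [-1, 1] and bins scores with a ±0.05 cutoff, a common convention for VADER compound scores. The function name, the cutoff default, and the example scores are hypothetical; this is not the procedure of Groenwold et al (2020) or of the citing survey.

```python
from collections import Counter


def sentiment_ratio(scores, threshold=0.05):
    """Return (negative, neutral, positive) fractions for scores in [-1, 1].

    The threshold is an assumed cutoff, not a value taken from the source.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    counts = Counter(
        "negative" if s <= -threshold
        else "positive" if s >= threshold
        else "neutral"
        for s in scores
    )
    total = len(scores)
    return tuple(counts[label] / total
                 for label in ("negative", "neutral", "positive"))


# Hypothetical scores for text generated from AAE and WAE prompts.
aae_scores = [-0.6, -0.2, 0.0, 0.4]
wae_scores = [-0.1, 0.0, 0.3, 0.7]

aae_ratio = sentiment_ratio(aae_scores)
wae_ratio = sentiment_ratio(wae_scores)
print("AAE (neg, neu, pos):", aae_ratio)
print("WAE (neg, neu, pos):", wae_ratio)
```

Comparing the two tuples side by side gives the kind of relative, cross-demographic comparison the statement describes, rather than an absolute bias score.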

“…Most existing works define bias metrics through the first association; these biases are relatively easier to analyze, since both the demographic and the textual signals of bias are encapsulated within the text. There are also works that define biases towards people who produce the text (Groenwold et al, 2020) or people to whom the text is addressed (Sheng et al, 2021b), though there are relatively fewer works that study these latter associations.…”

Section: Bias Definitions and Metrics (mentioning)

confidence: 99%

“…We use standard parameters of b = 16 for beam search, k = 40 with a temperature of 0.7 for top-k sampling, and p = 0.95 for nucleus sampling (Holtzman et al, 2019). In terms of bias metrics, we use existing NLG bias metrics: regard ratio (Sheng et al, 2019), sentiment ratio (Groenwold et al, 2020), individual and group fairness through sentiment (IF/GF) (Huang et al, 2020), and gendered word co-occurrence scores (Bordia and Bowman, 2019). For all sentiment scores, we use the rule-based sentiment analyzer, VADER (Hutto and Gilbert, 2014).…”

Section: A1 Evaluating Biases Across Decoding Techniques and Metrics (mentioning)

confidence: 99%
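
The two sampling strategies mentioned in the statement above can be sketched on a toy next-token distribution. This is an illustrative pure-Python version, assuming a four-token vocabulary with made-up logits; the helper names are hypothetical. The temperature of 0.7 and p = 0.95 follow the quoted settings, while k is reduced to 2 because the toy vocabulary is far smaller than 40.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def top_k_filter(probs, k):
    """Keep the k most probable token ids and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    z = sum(probs[i] for i in keep)
    return {i: probs[i] / z for i in keep}


def nucleus_filter(probs, p):
    """Keep the smallest set of top tokens whose total mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}


# Hypothetical logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits, temperature=0.7)

top_k = top_k_filter(probs, k=2)         # k = 2 only because the toy vocabulary is tiny
nucleus = nucleus_filter(probs, p=0.95)  # p = 0.95 as in the quoted setup

# Draw one token id from each truncated distribution.
rng = random.Random(0)
sampled_top_k = rng.choices(list(top_k), weights=list(top_k.values()), k=1)[0]
sampled_nucleus = rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]
print("top-k candidates:", sorted(top_k), "sampled:", sampled_top_k)
print("nucleus candidates:", sorted(nucleus), "sampled:", sampled_nucleus)
```

Top-k fixes the number of candidate tokens, whereas nucleus sampling lets the candidate set grow or shrink with the shape of the distribution, which is one reason the two can yield different generated text from the same prompt.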

Societal Biases in Language Generation: Progress and Challenges

Sheng, Chang, Natarajan, et al. 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer

Technology for language generation has advanced rapidly, spurred by advancements in pre-training large models on massive amounts of data and the need for intelligent agents to communicate in a natural manner. While techniques can effectively generate fluent text, they can also produce undesirable societal biases that can have a disproportionately negative impact on marginalized populations. Language generation presents unique challenges for biases in terms of direct user interaction and the structure of decoding techniques. To better understand these challenges, we present a survey on societal biases in language generation, focusing on how data and techniques contribute to biases and progress towards reducing biases. Motivated by a lack of studies on biases from decoding techniques, we also conduct experiments to quantify the effects of these techniques. By further discussing general trends and open challenges, we call to attention promising directions for research and the importance of fairness and inclusivity considerations for language generation applications.

“…This information identifies the sublanguage(s) of interest (Grishman and Kittredge, 1986), which determine the availability and development of appropriate NLP tools (Grishman, 2001). Corporate disclosures, financial news reports, and tweets all require different processing strategies (Xing et al, 2018), as do tweets written by different communities (Blodgett et al, 2016;Groenwold et al, 2020). Ex.…”

Section: Translational NLP Checklist (mentioning)

confidence: 99%

Translational NLP: A New Paradigm and General Principles for Natural Language Processing Research

Newman-Griffis, Lehman, Rosé, et al. 2021

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua

Natural language processing (NLP) research combines the study of universal principles, through basic science, with applied science targeting specific use cases and settings. However, the process of exchange between basic NLP and applications is often assumed to emerge naturally, resulting in many innovations going unapplied and many important questions left unstudied. We describe a new paradigm of Translational NLP, which aims to structure and facilitate the processes by which basic and applied NLP research inform one another. Translational NLP thus presents a third research paradigm, focused on understanding the challenges posed by application needs and how these challenges can drive innovation in basic science and technology design. We show that many significant advances in NLP research have emerged from the intersection of basic principles with application needs, and present a conceptual framework outlining the stakeholders and key questions in translational research. Our framework provides a roadmap for developing Translational NLP as a dedicated research area, and identifies general translational principles to facilitate exchange between basic and applied research.

MarIA and BETO are sexist: evaluating gender bias in large language models for Spanish

Garrido-Muñoz, Martínez-Santiago, Montejo-Ráez 2023

Lang Resources & Evaluation

The study of bias in language models is a growing area of work; however, both research and resources are focused on English. In this paper, we make a first approach focusing on gender bias in some freely available Spanish language models trained using popular deep neural networks, like BERT or RoBERTa. Some of these models are known for achieving state-of-the-art results on downstream tasks. These promising results have promoted such models’ integration in many real-world applications and production environments, which could be detrimental to people affected by those systems. This work proposes an evaluation framework to identify gender bias in masked language models, with explainability in mind to ease the interpretation of the evaluation results. We have evaluated 20 different models for Spanish, including some of the most popular pretrained ones in the research community. Our findings state that varying levels of gender bias are present across these models. This approach compares the adjectives proposed by the model for a set of templates. We classify the given adjectives into understandable categories and compute two new metrics from model predictions, one based on the internal state (probability) and the other one on the external state (rank). Those metrics are used to reveal biased models according to the given categories and quantify the degree of bias of the models under study.
