Gender Gap in Natural Language Processing Research: Disparities in Authorship and Citations

Mohammad, Saif M.

doi:10.48550/arxiv.2005.00962

Cited by 2 publications

(3 citation statements)

References 17 publications


“…As we have discussed, this is not the case. Some studies even find that the increase in the share of women in authorship has led to an increase of gender differences in both productivity and impact (Huang et al, 2020; Mohammad et al, 2020).…”

Section: Article-level Explanations (mentioning)

confidence: 99%

“…At the article level, however, there is no clear gap. While some studies of some fields find that women are cited less per article (e.g, Larivière et al, 2013; Mohammad et al, 2020), or are under-cited per reference lists (Håkanson, 2005; Lutz, 1990), more studies find that articles written by women receive comparable, sometimes even higher rates, than articles written by men (e.g., Healy, 2015; Huang et al, 2020; Leahey et al, 2017; Long, 1992).…”

Section: Introduction (mentioning)

confidence: 99%


The gender citation gap: Approaches, explanations, and implications

2024

Sociology Compass


Do women face a disadvantage in terms of citation rates, and if so, in what ways? This article provides a comprehensive overview of existing research on the relationship between gender and citations. Three distinct approaches are identified: (1) per‐article approach that compares gender differences in citations between articles authored by men and women, (2) per‐author approach that compares the aggregate citation records of men and women scholars over a specified period or at the career level, and (3) reference‐ratio approach that assesses the gender distribution of references in articles written by men and women. I show that articles written by women receive comparable or even higher rates of citations than articles written by men. However, women tend to accumulate fewer citations over time and at the career level. Contrary to the notion that women are cited less per article due to gender‐based bias in research evaluation or citing behaviors, this study suggests that the primary reason for the lower citation rates at the author level is women publishing fewer articles over their careers. Understanding and addressing the gender citation gap at the author level should therefore focus on women's lower research productivity over time and the contributing factors. To conclude, I discuss the potential detrimental impact of lower citations on women's career progression and the ways to address the issue to mitigate gender inequalities in science.


“…Because AND mistakes can cause representational and allocational harm, and no system will ever be perfect, it is critical that any live AND service easily allows authors to correct mistakes made by the system. Other than using self-reported demographic attributes [25], research has focused on using inferred gender from names for studying gender disparities in authorship and citation trends [26,27,28]. Similarly, Bertrand and Mullainathan [29] used inferred gender and race for studying disparities in hiring.…”

Section: Previous Work (mentioning)

confidence: 99%

S2AND: A Benchmark and Evaluation System for Author Name Disambiguation

Subramanian, King, Downey et al. 2021

Preprint


Author Name Disambiguation (AND) is the task of resolving which author mentions in a bibliographic database refer to the same real-world person, and is a critical ingredient of digital library applications such as search and citation analysis. While many AND algorithms have been proposed, comparing them is difficult because they often employ distinct features and are evaluated on different datasets. In response to this challenge, we present S2AND, a unified benchmark dataset for AND on scholarly papers, as well as an open-source reference model implementation. Our dataset harmonizes eight disparate AND datasets into a uniform format, with a single rich feature set drawn from the Semantic Scholar (S2) database. Our evaluation suite for S2AND reports performance split by facets like publication year and number of papers, allowing researchers to track both global performance and measures of fairness across facet values. Our experiments show that because previous datasets tend to cover idiosyncratic and biased slices of the literature, algorithms trained to perform well on one of them may generalize poorly to others. By contrast, we show how training on a union of datasets in S2AND results in more robust models that perform well even on datasets unseen in training. The resulting AND model also substantially improves over the production algorithm in S2, reducing error by over 50% in terms of B³ F1. We release our unified dataset, model code, trained models, and evaluation suite to the research community.
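The abstract above reports its headline result in B³ F1, a standard clustering metric. As a rough illustration only, the sketch below computes the textbook B³ (B-cubed) precision, recall, and F1 for a toy clustering of author mentions. The function name `b_cubed_f1` and the dictionary input format are assumptions made for this example; this is not the S2AND evaluation code, which may differ in detail.

```python
from collections import defaultdict


def b_cubed_f1(predicted, gold):
    """Textbook B-cubed precision, recall and F1.

    predicted, gold: dicts mapping each mention id to a cluster id.
    Both dicts are assumed (for this sketch) to cover the same mentions.
    """
    if not gold:
        raise ValueError("at least one mention is required")

    # Group mentions by cluster id for both clusterings.
    pred_clusters = defaultdict(set)
    gold_clusters = defaultdict(set)
    for mention, cluster in predicted.items():
        pred_clusters[cluster].add(mention)
    for mention, cluster in gold.items():
        gold_clusters[cluster].add(mention)

    precision = 0.0
    recall = 0.0
    for mention in gold:
        p = pred_clusters[predicted[mention]]  # predicted cluster of this mention
        g = gold_clusters[gold[mention]]       # true cluster of this mention
        overlap = len(p & g)                   # always >= 1 (the mention itself)
        precision += overlap / len(p)
        recall += overlap / len(g)

    # Average the per-mention scores, then take the harmonic mean.
    n = len(gold)
    precision /= n
    recall /= n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (hypothetical): mentions a and b are the same person, c is another.
gold = {"a": "A", "b": "A", "c": "B"}
predicted = {"a": 1, "b": 2, "c": 2}  # b is wrongly merged with c
print(b_cubed_f1(predicted, gold))
```

Because the scores are averaged per mention, an author with many papers contributes more terms to the average than an author with few, which is one reason facet-level reporting of the kind the abstract describes can be informative.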
