The US scientific workforce is primarily composed of White men. Studies have demonstrated the systemic barriers preventing women and other minoritized populations from gaining entry to science; few, however, have taken an intersectional perspective and examined the consequences of these inequalities for scientific knowledge. We provide a large-scale bibliometric analysis of the relationship between intersectional identities, topics, and scientific impact. We find homophily between identities and topics, suggesting a relationship between diversity in the scientific workforce and expansion of the knowledge base. However, topic selection comes at a cost to minoritized individuals, for whom we observe both between- and within-topic citation disadvantages. To enhance the robustness of science, research organizations should provide adequate resources to historically underfunded research areas while simultaneously opening access for minoritized individuals to high-prestige networks and topics.
Racial disparity in academia is a widely acknowledged problem. The quantitative understanding of race-based systemic inequalities is an important step towards a more equitable research system. However, because of the lack of robust information on authors' race, few large-scale analyses have been performed on this topic. Algorithmic approaches offer one solution, using known information about authors, such as their names, to infer their perceived race. As with any other algorithm, racial inference can introduce biases if it is not carefully considered. The goal of this article is to assess the extent to which algorithmic bias is introduced by different approaches to name-based racial inference. We use information from the U.S. Census and mortgage applications to infer the race of U.S.-affiliated authors in the Web of Science. We estimate the effects of using given versus family names, thresholds versus continuous distributions, and imputation. Our results demonstrate that the validity of name-based inference varies by race/ethnicity and that threshold approaches underestimate Black authors and overestimate White authors. We conclude with recommendations to avoid potential biases. This article lays the foundation for more systematic and less biased investigations into racial disparities in science.
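The threshold-versus-continuous distinction can be made concrete. Below is a minimal Python sketch, assuming invented surname probabilities (not actual Census figures) and a hypothetical 0.7 cutoff; it is not the paper's implementation, only an illustration of why hard thresholding skews aggregate counts while fractional counting does not.

```python
# Minimal sketch of threshold vs. continuous name-based inference.
# The surname probabilities and the 0.7 cutoff below are invented for
# illustration; they are NOT actual U.S. Census figures.

# P(race | family name); each row sums to 1.
SURNAME_PROBS = {
    "miller":     {"White": 0.84, "Black": 0.11, "Hispanic": 0.02, "Asian": 0.03},
    "washington": {"White": 0.05, "Black": 0.87, "Hispanic": 0.04, "Asian": 0.04},
    "jackson":    {"White": 0.39, "Black": 0.53, "Hispanic": 0.04, "Asian": 0.04},
}

def infer_threshold(surname, threshold=0.7):
    """Hard assignment: the modal race if it clears the cutoff, else None."""
    probs = SURNAME_PROBS.get(surname.lower())
    if probs is None:
        return None
    race, p = max(probs.items(), key=lambda kv: kv[1])
    return race if p >= threshold else None

def aggregate(authors, continuous=True):
    """Corpus-level race counts under either strategy."""
    counts = {}
    for name in authors:
        probs = SURNAME_PROBS.get(name.lower())
        if probs is None:
            continue
        if continuous:
            # Fractional counting: each author contributes their full distribution.
            for race, p in probs.items():
                counts[race] = counts.get(race, 0.0) + p
        else:
            race = infer_threshold(name)
            if race is not None:  # ambiguous names are dropped entirely
                counts[race] = counts.get(race, 0.0) + 1.0
    return counts

authors = ["Miller", "Miller", "Washington", "Jackson"]
print("threshold: ", aggregate(authors, continuous=False))
print("continuous:", aggregate(authors, continuous=True))
# Thresholding assigns Miller a full White count (discarding its Black mass)
# and drops Jackson entirely, so Black authors are undercounted and White
# authors overcounted relative to fractional counting -- the direction of
# bias the paper reports.
```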
International trade is one of the classic areas of study in economics. Its empirical analysis is a complex problem, given the number of products, countries, and years involved. Today, the availability of data allows the traditional analytical toolkit to be complemented and enriched with new methodologies and techniques. This opens a research gap: new, data-driven ways of examining international trade can improve our understanding of the underlying phenomena. The present paper applies latent Dirichlet allocation (LDA), a well-known technique from natural language processing, to search for latent dimensions in the product space of international trade and their distribution across countries over time. We apply this technique to a dataset of countries' exports of goods from 1962 to 2016. The results show that LDA can encode the main specialisation patterns of international trade. At the country level, the findings capture changes in countries' specialisation patterns over time. Because traditional international trade analysis demands expert knowledge of a multiplicity of indicators, the ability to encode multiple known phenomena in a single indicator is a powerful complement to traditional tools, enabling top-down, data-driven studies.
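To illustrate how a topic model transfers to trade data, the sketch below applies scikit-learn's LatentDirichletAllocation to a toy country-by-product export matrix, treating countries as "documents" and products as "words". The countries, products, and values are invented, and the paper's actual preprocessing and number of topics are not reproduced here.

```python
# Sketch: LDA over the product space of exports. Countries play the role
# of documents, product codes the role of words; the matrix is a toy
# stand-in for the 1962-2016 export data.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

products = ["crude_oil", "copper", "cars", "electronics", "wheat", "soybeans"]

# Rows: countries; columns: export "counts" per product (e.g., scaled USD values).
X = np.array([
    [90,  5,  1,  2,  1,  1],   # fuel exporter
    [ 2, 80,  1,  1,  5, 10],   # minerals exporter
    [ 1,  2, 60, 55,  1,  1],   # manufacturing exporter
    [ 1,  1,  2,  3, 70, 60],   # agricultural exporter
])

lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)    # country-by-topic mixtures (specialisation profiles)
phi = lda.components_           # topic-by-product weights (latent dimensions)

for k, topic in enumerate(phi):
    top = np.argsort(topic)[::-1][:3]
    print(f"topic {k}:", [products[i] for i in top])
print("country mixtures:\n", theta.round(2))
```

Tracking the mixture vectors (theta) year by year is what lets a single model summarise shifts in a country's specialisation pattern over time.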
Over the last century, scientific publications have grown steadily and exponentially worldwide. The overwhelming amount of available literature makes a holistic, manual analysis of research within a field and between fields impossible. Automatic techniques are required to support literature review and to uncover the epistemic and social patterns embedded in scientific publications. In computer science, new tools have been developed to deal with large volumes of data. In particular, deep learning techniques make it possible to build automated end-to-end models that project observations into a new, low-dimensional space where the most relevant information of each observation is highlighted. Using deep learning to build new representations of scientific publications is a growing but still emerging field of research. The aim of this paper is to discuss the potential and limits of deep learning for gathering insights about scientific research articles. We focus on document-level embeddings based on the semantic and relational aspects of articles, using natural language processing (NLP) and graph neural networks (GNNs), and explore the different outcomes these techniques generate. Our results show that NLP can encode a semantic space of articles, while GNNs enable us to build a relational space in which the social practices of a research community are also encoded.
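As a rough illustration of the two embedding spaces, the sketch below builds a semantic space with TF-IDF plus truncated SVD (a lightweight stand-in for the deep text encoders the paper discusses) and a relational space with a single, untrained graph-convolution propagation step over a toy citation graph. All abstracts and citation links here are invented; this shows the shape of the two pipelines, not the paper's models.

```python
# Sketch: semantic vs. relational document-level embeddings.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

abstracts = [
    "deep learning for document embeddings",
    "graph neural networks on citation data",
    "latent topic models for text corpora",
]

# Semantic space: each article -> dense vector from its text alone.
tfidf = TfidfVectorizer().fit_transform(abstracts)
semantic = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Relational space: features mixed along citation links, i.e. one
# Kipf-Welling GCN propagation step D^-1/2 (A+I) D^-1/2 H W.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # symmetric toy citation graph
A_hat = A + np.eye(3)                    # add self-loops
d = A_hat.sum(axis=1)
norm = A_hat / np.sqrt(np.outer(d, d))   # symmetric normalisation
W = np.random.default_rng(0).normal(size=(2, 2))  # untrained weights
relational = norm @ semantic @ W

print("semantic:\n", semantic.round(2))
print("relational:\n", relational.round(2))
```

In the semantic space, nearby articles share vocabulary; in the relational space, nearby articles share citation neighbourhoods, which is why the latter also reflects the social structure of a research community.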