A subset of machine learning research intersects with societal issues, including fairness, accountability and transparency, as well as the use of machine learning for social good. In this work, we analyze the scholars contributing to this research at the intersection of machine learning and society through the lens of the sociology of science. By analyzing the authorship of all machine learning papers posted to arXiv, we show that compared to researchers from overrepresented backgrounds (defined by gender and race/ethnicity), researchers from underrepresented backgrounds are more likely to conduct research at this intersection than other kinds of machine learning research. This state of affairs leads to contention between two perspectives on insiders and outsiders in the scientific enterprise: outsiders being those outside the group being studied, and outsiders being those who have not participated as researchers in an area historically. This contention manifests as an epistemic question on the validity of knowledge derived from lived experience in machine learning research, and predicts boundary work that we see in a real-world example.
Introduction
Research on the theory and methods of machine learning has led to the ability of technological systems to grow by leaps and bounds in the last decade. With this increasing competence, machine learning is increasingly being employed in real-world sociotechnical contexts of high consequence. People and machines are now truly starting to become partners in various aspects of life, livelihood, and liberty.
This intersection of machine learning with society has fueled a small segment of research effort devoted to it. Two such efforts include research on (1) fairness, accountability and transparency of machine learning (FAccT), and (2) artificial intelligence (AI) for social good. The first of these focuses on the imperative 'do no harm' or nonmaleficence, with a special focus on preventing harms to marginalized people and groups caused or exacerbated by the use of machine learning in representation and decision making. The second focuses on using machine learning technologies as an instrument of beneficence to uplift vulnerable people and groups out of poverty, hunger, ill health, and other societal inequities.