Scholars and practitioners across domains are increasingly concerned with algorithmic transparency and opacity, interrogating the values and assumptions embedded in automated, black-boxed systems, particularly in user-generated content platforms. I report from an ethnography of infrastructure in Wikipedia to discuss an often-understudied aspect of this topic: the local, contextual, learned expertise involved in participating in a highly automated sociotechnical environment. Today, the organizational culture of Wikipedia is deeply intertwined with various data-driven algorithmic systems, which Wikipedians rely on to help manage and govern the “anyone can edit” encyclopedia at a massive scale. These bots, scripts, tools, plugins, and dashboards make Wikipedia more efficient for those who know how to work with them, but, like all organizational culture, they must be learned by newcomers who want to fully participate. I illustrate how cultural and organizational expertise is enacted around algorithmic agents through two autoethnographic vignettes, which relate my personal experience as a veteran Wikipedian. I present thick descriptions of how governance and gatekeeping practices are articulated through and in alignment with these automated infrastructures. Over the past 15 years, Wikipedian veterans and administrators have made specific decisions to support administrative and editorial workflows with automation in particular ways and not others. I use these cases of Wikipedia’s bot-supported bureaucracy to discuss several issues in the fields of critical algorithms studies; critical data studies; and fairness, accountability, and transparency in machine learning—most principally arguing that scholarship and practice must go beyond trying to “open up the black box” of such systems and also examine sociocultural processes like newcomer socialization.
This report is a high-level summary analysis of the 2017 GitHub Open Source Survey dataset, presenting frequency counts, proportions, and frequency or proportion bar plots for every question asked in the survey. This report was generated from a Jupyter notebook that can be found on OSF at http://doi.org/10.17605/OSF.IO/ENRQ5.
What are the challenges and best practices for doing data-intensive research in teams, labs, and other groups? This paper reports from a discussion in which researchers from many different disciplines and departments shared their experiences with doing data science in their domains. The issues we discuss range from the technical to the social, including getting on the same computational stack, workflow and pipeline management, handoffs, composing a well-balanced team, dealing with fluid membership, fostering coordination and communication, and not abandoning best practices when deadlines loom. We conclude by reflecting on the extent to which there are universal best practices for all teams, as well as how these kinds of informal discussions around the challenges of doing research can help combat impostor syndrome.
Turnover is a fact of life for any project, and academic research teams can face particularly high turnover, with people coming and going over the duration of a project. In this article, we discuss the challenges of turnover and some potential practices for helping manage it, particularly for computational- and data-intensive research teams and projects. The topics we discuss include establishing and implementing data management plans, file and format standardization, workflow and process documentation, clear team roles, and check-in and check-out procedures.
What actions can we take to foster diverse and inclusive workplaces in the broad fields around data science? This paper reports from a discussion in which researchers from many different disciplines and departments raised questions and shared their experiences with various aspects of diversity, inclusion, and equity. The issues we discuss include fostering inclusive interpersonal and small-group dynamics, rules and codes of conduct, increasing diversity in underrepresented groups and disciplines, organizing events for diversity and inclusion, and long-term efforts to champion change.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.