This paper argues that analyses of how Big Data has been enacted in other academic disciplines can provide concepts that help us understand the application of Big Data to social questions. We use examples drawn from our Science and Technology Studies (STS) analyses of -omic biology and high energy physics to demonstrate the utility of three theoretical concepts: (i) primary and secondary inscriptions, (ii) crafted and found data, and (iii) the locus of legitimate interpretation. These concepts show how the histories, organisational forms, and power dynamics of a field lead to different enactments of Big Data. The paper suggests that these concepts can help us understand how Big Data is being enacted in the domain of the social sciences, and outlines in general terms how this enactment might differ from what we have observed in the 'hard' sciences. We contend that the locus of legitimate interpretation of Big Data biology and physics is tightly delineated, located within the institutions and cultures of those disciplines. We suggest that when Big Data is used to make knowledge claims about 'the social', the locus of legitimate interpretation is more diffuse, with claims treated as credible when made from other disciplines, or even by those outside academia entirely.
We analyse a recent paper by Goddiksen (2014) in which the author raises questions about the relationship between authorship, attribution, and Collins & Evans' concepts of contributory and interactional expertise. We then highlight recent empirical work in the sociology of climate change science that makes similar points, clarifying how authorship, division of labour, and contribution are handled in real scientific settings. We argue, however, that Goddiksen's critique of both contributory and interactional expertise is ultimately ineffective because it rests on a misguided attempt to de-socialise these concepts. We conclude by stressing the importance of acquiring collective tacit knowledge through immersion as a critical step in becoming a full-blown contributory or interactional expert.
This paper describes the intense software filtering that has allowed the arXiv e-print repository to sort and process large numbers of submissions with minimal human intervention, making it one of the most important and influential open access repositories to date. The paper narrates arXiv's transformation from a small mailing list used by a few hundred researchers into a site that processes thousands of papers per month, using sophisticated sorting/filtering algorithms to decrease the human workload. However, there are significant negative consequences for authors who have been filtered out of arXiv's main categories. There is thus a continued need to check and balance arXiv's boundaries, grounded in the essential tension between stability and innovation.