A large literature addresses the processes, circumstances and motivations that have given rise to archives. These questions are increasingly being asked of digital archives, too. Here, we examine the complex interplay of institutional, intellectual, economic, technical, practical and social factors that have shaped decisions about which digitised newspapers are included in, and excluded from, online archives. We do so by conducting and analysing a series of semi-structured interviews with public and private providers of major newspaper digitisation programmes. Our findings contribute to emerging understandings of factors that are rarely foregrounded, yet fundamentally shape the depth and scope of digital cultural heritage archives and thus the questions that can be asked of them, now and in the future. Moreover, we draw attention to providers' emphasis on meeting the needs of their end users and how this is shaping the form and function of digital archives. The end user is not often emphasised in the wider literature on archival studies, and we thus highlight the potential merit of this vector in future studies of digital archives.
This dataset, part of the Scissors and Paste Project (https://osf.io/nm2rq), describes instances of reprinting and text reuse (scissors-and-paste journalism) in British newspapers between 1800 and 1837. It was derived from the 19th-Century British Library Newspapers, Part 1 digitised newspaper collection by using plagiarism detection software to identify instances of substantially similar text. It contains a series of manifests that describe a) instances of shared content, b) the likely directionality of copying, and c) which instances are evolutionary dead-ends with no known reprints. It comprises 1,824 TSV files, divided into four directories, with each file representing one month between January 1800 and December 1837.
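As a rough consistency check on the stated file count, the arithmetic can be sketched as follows. The sketch assumes one TSV file per calendar month in each of the four directories; that reading is an interpretation of the description above, and the variable names are illustrative rather than taken from the dataset.

```python
from datetime import date

# Assumption (not stated in these terms by the dataset description):
# each of the four directories holds one TSV file per calendar month.
START = date(1800, 1, 1)    # first month covered: January 1800
END = date(1837, 12, 1)     # last month covered: December 1837
DIRECTORIES = 4             # number of directories given in the description


def months_inclusive(start: date, end: date) -> int:
    """Count calendar months from start to end, including both endpoints."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


months = months_inclusive(START, END)
expected_files = months * DIRECTORIES

print(months, expected_files)  # prints: 456 1824
```

Under that assumption, 456 months across four directories gives 1,824 files, which matches the figure reported for the dataset.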
Newspaper digitisation has been hailed as a revolutionary change in how researchers can engage with the periodical press. 1 From immediate global access, to keyword searching, to large-scale text and image analysis, the ever-growing availability of electronic facsimiles, metadata, and machine-readable transcriptions has encouraged scholars to pursue large-scale analyses rather than rely on samplings and soundings from an unwieldy and fragmentary record, that is, to go beyond the case study and attempt the "comprehensive history" of the press that seemed so elusive forty years ago. 2 Yet, after a decade of access to digital newspaper corpora, much of what has been attempted remains fundamentally conservative in approach. 3 In British Settler Emigration in Print (2016), Jude Piesse laudably provides URLs to the precise facsimiles she consulted and comments on the search parameters used to obtain her sample. However, her coverage was fragmentary, relying heavily upon select case studies rather than demonstrating general trends, and she admits that "[d]igital searches frequently generate thousands of hits, which can be difficult to navigate or to appraise in any detail." 4 She also subtly laments the loss of the immersive offline experience: "Despite the obvious benefits of focused digital searching, it is quite possible that it misses details that research in paper archives would bring to light," the ease of jumping straight to a keyword discouraging a deep contextual understanding of the materials. Online interfaces encourage this type of sampling, with simplified full-text and metadata searches returning a list of "relevant" hits based on often-hidden algorithms, constricting research in ways similar to using a publisher-created newspaper index.