2018
DOI: 10.1002/asi.24048

If these crawls could talk: Studying and documenting web archives provenance

Abstract: The increasing use and prominence of web archives raises the urgency of establishing mechanisms for transparency in the making of web archives to facilitate the process of evaluating a web archive's provenance, scoping, and absences. Some choices and process events are captured automatically, but their interactions are not currently well understood or documented. This study examines the decision space of web archives and its role in shaping what is and what is not captured in the web archiving process. By compa…

Cited by 30 publications (13 citation statements)
References 45 publications
“…Like other researchers who have studied web archivists and their crawling decisions (Maemura et al, 2018; Ogden et al, 2017), we find that individual human contributions played a role in the spread of this misinformation on platforms like Facebook and Twitter, as well as in its appearance in a number of Wayback collections seeded by Save Page Now. Social media analytics allows us to examine performance trends of URLs in the platform by comparing engagement metrics within the dataset to those of earlier or different versions of the URLs and their derivatives.…”
Section: Methods (supporting)
confidence: 72%
“…Access constraints, as well as current search interfaces, make it difficult to see the collections 'in the round' or from a vantage point that gives a sense of where the boundaries of the archive lie. As Maemura et al (2018, p. 1225) discuss, questions of provenance in web archival collections research 'broadly encompass what users need to know about how a collection was made' in order to be confident in their analysis. Here, auditing is driven by a desire to 'read against the archive' (Zeitlyn, 2012), to assess (in)completeness and contextualise the archive by characterising what can be known about collection practices and their effects on the nature of national web archives.…”
Section: Auditing (mentioning)
confidence: 99%
“…Whilst Vlassenroot et al connect some of the challenges of researcher use with the very nature of the archival process, little work has empirically and comparatively addressed how researcher engagement is intricately connected to the complex processes of web archival scoping, collection and curation, in practice. Issues associated with researchers' desires to evaluate the provenance of source materials (Maemura et al, 2018) and the complex sociotechnical assemblage of web archival infrastructures - including the diverse institutional access restrictions and protocols (such as those defined by legal deposit), limited tools and interfaces for engaging archived web materials at multiple scales (Lin et al, 2014), and the often obscured or invisible labour of human and automated curatorial interventions (Ben-David & Amram, 2018; Ogden et al, 2017) - all combine to constrain the ways researchers come to know web archives as sources for research.…”
Section: Introduction (mentioning)
confidence: 99%
“…The final main file type that scholars want are statistical breakdowns of what has been captured by web archives. This is increasingly important as the provenance of web archives is not documented in a standardized way, so it is very useful to know what has been collected (Maemura et al, 2018). This can help inform an understanding of the first two derivatives.…”
Section: How Do Scholars Use Web Archives? (mentioning)
confidence: 99%