2019
DOI: 10.48550/arxiv.1905.03836
Preprint

Collecting 16K archived web pages from 17 public web archives

Abstract: We document the creation of a data set of 16,627 archived web pages, or mementos, of 3,698 unique live web URIs (Uniform Resource Identifiers) from 17 public web archives. We used four different methods to collect the dataset. First, we used the Los Alamos National Laboratory (LANL) Memento Aggregator to collect mementos of an initial set of URIs obtained from four sources: (a) the Moz Top 500, (b) the dataset used in our previous study, (c) the HTTP Archive, and (d) the Web Archives for Historical Research gr…

Cited by 2 publications (3 citation statements)
References 7 publications
“…Our original study accessed 16,627 mementos from 17 public web archives 39 times over a period of 14 months (Nov 2017 - Jan 2019); the details of data selection are described elsewhere [10]. For each URI-R chosen, we used the LANL Memento Aggregator [13] in November 2017 to discover URI-Ms in different web archives.…”
Section: Methods (mentioning)
confidence: 99%
“…We wanted to study the fixity of archived web pages, so we gathered a diverse set of mementos from 17 web archives distributed over 1996-2017. Our longitudinal experiment involved replaying the same mementos over the course of 14 months [7,8,9,10]. During our study, we noticed that we were no longer able to access any mementos from four web archives (Library and Archives Canada, the National Library of Ireland, the Public Record Office of Northern Ireland, and Perma.cc) at certain points, and there was no machine-readable redirection to the new URIs.…”
Section: Introduction (mentioning)
confidence: 98%
“…In November 2017, we collected a dataset of 16,627 mementos of 3,698 unique URI-Rs (original resources) from the 17 public web archives shown in Table 1. Our technical report [98] describes in detail the methods we used to create this dataset. We provide a summary of the process here.…”
Section: Step 1: Collect a Dataset Of Mementos (mentioning)
confidence: 99%