2021
DOI: 10.1007/s42803-020-00029-6

From archive to analysis: accessing web archives at scale through a cloud-based interface

Abstract: This paper introduces the Archives Unleashed Cloud, a web-based interface for working with web archives at scale. Current access paradigms, largely driven by the scope and scale of web archives, generally involve using the command line and writing code. This access gap means that subject-matter experts, as opposed to developers and programmers, have few options to directly work with web archives beyond the page-by-page paradigm of the Wayback Machine. Drawing on first-hand research and analysis of how scholars…

References: 18 publications
Cited by 9 publications (5 citation statements)

“…The reasons for these differences need further investigation and research. As stated by Ruest et al (2021), who conducted research specifically on the web archiving tool named Heritrix, advanced knowledge of how its user needs the processing chain works. In addition, comprehension of the method to debug odd web behaviour during crawling process encounters by Heritrix is also important.…”
Section: Discussion; citation type: mentioning; confidence: 99%

“…While Archive-It collections can be accessed openly on the web using an interface similar to the Internet Archive Wayback Machine, AU provides a data analysis toolset compatible with Archive-It collections. Specifically, the Archives Unleashed Cloud service was in development at the time of my study, tailored for Archive-It users and organizations (Ruest et al, 2021). In November 2018, I acted as a participant-observer engaging with three teams during the two-day “datathon” event hosted in Vancouver.…”
Section: Methods; citation type: mentioning; confidence: 99%

“…To this end, the project aspires to make petabytes of historical internet content accessible to scholars and others interested in researching the recent past. Between 2017 and 2020, the project focused on developing the "Archives Unleashed Cloud," a web-based interface for working with web archives at scale using the Archives Unleashed Toolkit and Apache Spark [12]. This work built on the project's long-standing interests in building exploratory search interfaces for web archive collections [8].…”
Section: Related Work and Project Context; citation type: mentioning; confidence: 99%

“…The Archives Unleashed project aims to address this problem [13] by being for web archive analysis as Archive-It is for web archive capture: powerful, scalable, and above all, accessible and intuitive for users. The Archives Unleashed Cloud (2017-2020) provided user access to the features of the Archives Unleashed Toolkit in a cloud-hosted environment [12]. The Cloud worked with Archive-It collections, using APIs to transfer data from the Internet Archive to Compute Canada cloud-hosted infrastructure.…”
Section: Related Work and Project Context; citation type: mentioning; confidence: 99%