Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries 2007
DOI: 10.1145/1255175.1255182

Factors affecting website reconstruction from the web infrastructure

Abstract: When a website is suddenly lost without a backup, it may be reconstituted by probing web archives and search engine caches for the missing content. In this paper, we describe an experiment in which we crawled and reconstructed 300 randomly selected websites on a weekly basis for 14 weeks. The reconstructions were performed using our web-repository crawler, Warrick, which recovers missing resources from the Web Infrastructure (WI), the collective preservation effort of web archives and search engine caches. We exam…
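The reconstruction the abstract describes (crawl a lost site by asking web archives and search engine caches for each missing URL, then follow links found in the recovered pages) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not Warrick's actual implementation: the in-memory dictionaries standing in for WI repositories, the fixed repository priority order, the naive `href` regex, and the `reconstruct` function are all hypothetical.

```python
import re
from collections import deque
from typing import Dict, List, Set, Tuple
from urllib.parse import urljoin, urlparse

# Hypothetical stand-in for one WI repository (a web archive or a search
# engine cache): a mapping from URL to the stored copy of that resource.
Repository = Dict[str, str]

# Naive link extraction, for illustration only (not a real HTML parser).
HREF = re.compile(r'href="([^"]+)"')


def reconstruct(seed: str, repositories: List[Tuple[str, Repository]]):
    """Crawl a lost site through the repositories, starting from `seed`.

    Each URL is looked up in the repositories in the given priority order and
    the first stored copy wins (a simplifying assumption). Links in recovered
    pages that stay on the seed's host are queued for recovery in turn.
    """
    host = urlparse(seed).netloc
    frontier = deque([seed])
    seen: Set[str] = {seed}
    recovered: Dict[str, str] = {}  # url -> name of the repository that supplied it
    missing: List[str] = []         # urls no repository could supply
    while frontier:
        url = frontier.popleft()
        for name, store in repositories:
            if url in store:
                recovered[url] = name
                for link in HREF.findall(store[url]):
                    absolute = urljoin(url, link)
                    if urlparse(absolute).netloc == host and absolute not in seen:
                        seen.add(absolute)
                        frontier.append(absolute)
                break
        else:
            # The loop finished without `break`: no repository held this URL.
            missing.append(url)
    return recovered, missing


# Hypothetical example: the home page survives only in an archive,
# one linked page survives in a cache, and the other is lost.
archive = {"http://example.com/": '<a href="/a.html">A</a> <a href="/b.html">B</a>'}
cache = {"http://example.com/a.html": "page A"}
recovered, missing = reconstruct("http://example.com/", [("cache", cache), ("archive", archive)])
# recovered == {"http://example.com/": "archive", "http://example.com/a.html": "cache"}
# missing == ["http://example.com/b.html"]
```

Putting the lookup inside the crawl loop matters: a page can only be discovered if some page linking to it was itself recovered, so gaps in the repositories also hide resources that might otherwise have been found.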

Cited by 13 publications (17 citation statements)
References: 26 publications
“…Directory page. Although, some solutions based on searching for similar content on the Web have been already proposed [6], [14], the problem of locating moved pages or their content parts is still unresolved.…”
Section: Discussion (mentioning)
“…McCown et al [14] measured the availability of page copies inside the repositories of major search engines and the Internet Archive. Their research was motivated by the need to provide efficient methods for reproducing the latest versions of Web sites.…”
Section: Related Work (mentioning)
“…For example, Google's cache provided 95% of the resources for recovering website 2 but only 22% for website 1. More extensive experiments reconstructing 300 randomly selected websites over a period of three months have shown that on average 61% of a website's resources (77% textual, 42% images and 32% other) could be recovered if the website were lost and immediately reconstructed [7].…”
Section: Background on Lazy Preservation (mentioning)
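The recovery figures quoted above are ratios of recovered resources to the resources on the original site, overall and per category (textual, images, other). The excerpt does not say whether "61% on average" is the mean of per-site percentages or a single ratio pooled over all sites, and the two can differ. The sketch below computes both from made-up tallies; the category names, counts, and function names are hypothetical and are not data from [7].

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-site tally: category -> (resources recovered, resources on the live site).
SiteTally = Dict[str, Tuple[int, int]]


def site_recovery(tally: SiteTally) -> float:
    """Percentage of one site's resources recovered, across all categories."""
    got = sum(r for r, _ in tally.values())
    total = sum(t for _, t in tally.values())
    return 100.0 * got / total


def mean_recovery(sites: List[SiteTally]) -> float:
    """Unweighted mean of per-site percentages: every site counts equally."""
    return sum(site_recovery(s) for s in sites) / len(sites)


def pooled_recovery(sites: List[SiteTally], category: Optional[str] = None) -> float:
    """Recovered / total summed over all sites, optionally for one category only."""
    got = total = 0
    for s in sites:
        for cat, (r, t) in s.items():
            if category is None or cat == category:
                got += r
                total += t
    return 100.0 * got / total


# Made-up example: two small sites of different sizes.
site_a = {"text": (8, 10), "image": (2, 10)}  # 10 of 20 recovered -> 50.0%
site_b = {"text": (3, 4), "other": (0, 4)}    # 3 of 8 recovered -> 37.5%
sites = [site_a, site_b]

print(mean_recovery(sites))             # 43.75
print(pooled_recovery(sites))           # about 46.43 (13 of 28)
print(pooled_recovery(sites, "image"))  # 20.0
```

The pooled ratio weights large sites more heavily, while the per-site mean treats a 5-page site and a 500-page site alike, which is why the two summaries of the same tallies disagree here.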