Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries 2018
DOI: 10.1145/3197026.3197056

Scraping SERPs for Archival Seeds

Abstract: Event-based collections are often started with a web search, but the search results you find on Day 1 may not be the same as those you find on Day 7. In this paper, we consider collections that originate from extracting URIs (Uniform Resource Identifiers) from Search Engine Result Pages (SERPs). Specifically, we seek to provide insight about the retrievability of URIs of news stories found on Google, and to answer two main questions: first, can one "refind" the same URI of a news story (for the same query) …
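
The "refind" question in the abstract can be illustrated with a minimal sketch. It assumes the URIs extracted from the SERP for one query on Day 1 and on a later day are already available as plain lists; the function name `refind_rate` and the example.com URIs are hypothetical and do not come from the paper, and the paper's own measurement procedure may differ.

```python
from typing import List


def refind_rate(day1_uris: List[str], later_uris: List[str]) -> float:
    """Fraction of Day 1 URIs that reappear in a later SERP for the same query.

    URIs are compared as exact strings; no normalization is applied here,
    which is a simplifying assumption of this sketch.
    """
    baseline = set(day1_uris)
    if not baseline:
        return 0.0
    refound = baseline & set(later_uris)
    return len(refound) / len(baseline)


# Hypothetical SERP extractions for the same query on two different days.
day1 = [
    "https://example.com/news/a",
    "https://example.com/news/b",
    "https://example.com/news/c",
    "https://example.com/news/d",
]
day7 = [
    "https://example.com/news/b",
    "https://example.com/news/d",
    "https://example.com/news/e",
]

print(f"Refind rate after one week: {refind_rate(day1, day7):.2f}")
```

A rate near 1.0 would mean that most Day 1 stories are still returned for the query, while a falling rate over successive days would suggest that seeds should be captured early.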

Cited by 10 publications (4 citation statements)
References 28 publications (26 reference statements)

“…In addition, the most commonly used methods for seed URL selection are manual selection [7][8][9], selection from open-source directories such as DMOZ and curlie.org [10,11], and selection from URLs shared by users on social media such as Twitter [12,13]. Beyond these, particularly for focused crawlers, there are also studies that take the URLs returned by searches on search engines such as Google and Yahoo and use them as seed URLs [14][15][16][17].…”
Section: Seed URL Selection (unclassified)
“…Ogden et al [34] studied web archivists themselves to better understand the ways in which they "shape and maintain the preserved Web." Nwala et al [33] analyzed how to leverage search engine results to populate web archive collections. Nwala et al [32] also used Archive-It collections to compare human-made vs. automatically or semi-automatically generated collections.…”
Section: Related Work (mentioning)
confidence: 99%
“…Can we build event collections from web archives? Even if resources about events remain on the live Web, Nwala et al (Nwala et al, 2018b) detailed how they become more challenging to discover via search engine results as we get farther from the event. Topical focused crawling provides resources whose terms closely match the terms of a desired topic, such as an event, and these crawlers stop when the matching score for new content is too low.…”
Section: Age and Availability Of Resources (mentioning)
confidence: 99%