Scraping SERPs for Archival Seeds

Nwala, Alexander C.; Weigle, Michele C.; Nelson, Michael L.

doi:10.1145/3197026.3197056

Cited by 10 publications

(4 citation statements)

References 28 publications

(26 reference statements)


“…In addition, the most commonly used methods for seed URL selection are manual selection [7][8][9], selection from open-source directories such as DMOZ and curlie.org [10,11], and selection from URLs shared by users on social media such as Twitter [12,13]. Beyond these, there are also studies, particularly on focused crawlers, that take the URLs returned by searches on engines such as Google and Yahoo as seed URLs [14][15][16][17].…”

Section: Seed URL Selection (unclassified)

Web Tarayıcılarında Tohum URL Seçimi ve Performans Analizi: Kapsamlı Bir İnceleme [Seed URL Selection and Performance Analysis in Web Crawlers: A Comprehensive Review]

ALANOĞLU

Akçayol

2023

Düzce Üniversitesi Bilim Ve Teknoloji Dergisi


The Web is a data repository containing various kinds of information published on the Internet. The structures that hold this information and are connected to one another by hyperlinks are called web pages. Web crawlers are programs that traverse the Web and download pages by following the hyperlinks on web pages. The performance of a search engine also depends on the performance of its web crawler. The performance metrics, coverage, and seed URL selection methods of web crawlers are the most important factors affecting performance. This study presents a comprehensive review and analysis of the performance, coverage, and seed URL usage methods of web crawlers, which we classify into six categories: general, focused, incremental, hidden, mobile, and distributed. In addition, the performance measurements reported for each crawler in various studies are compared.


“…Ogden et al [34] studied web archivists themselves to better understand the ways in which they "shape and maintain the preserved Web." Nwala et al [33] analyzed how to leverage search engine results to populate web archive collections. Nwala et al [32] also used Archive-It collections to compare human-made vs. automatically or semi-automatically generated collections.…”

Section: Related Work (mentioning)

confidence: 99%

Creating Structure in Web Archives with Collections: Different Concepts from Web Archivists

Jones

Klein

et al. 2022

Linking Theory and Practice of Digital Libraries

Self Cite


As web archives' holdings grow, archivists subdivide them into collections so they are easier to understand and manage. In this work, we review the collection structures of eight web archive platforms: Archive-It, Conifer, the Croatian Web Archive (HAW), the Internet Archive's user account web archives, Library of Congress (LC), PANDORA, Trove, and the UK Web Archive (UKWA). We note a plethora of different approaches to web archive collection structures. Some web archive collections support sub-collections and some permit embargoes. Curatorial decisions may be attributed to a single organization or many. Archived web pages are known by many names: mementos, copies, captures, or snapshots. Some platforms restrict a memento to a single collection and others allow mementos to cross collections. Knowledge of collection structures has implications for many different applications and users. Visitors will need to understand how to navigate collections. Future archivists will need to understand what options are available for designing collections. Platform designers need it to know what possibilities exist. The developers of tools that consume collections need to understand collection structures so they can meet the needs of their users. CCS CONCEPTS: Information systems → Web searching and information discovery; Digital libraries and archives.


“…Can we build event collections from web archives? Even if resources about events remain on the live Web, Nwala et al (Nwala et al, 2018b) detailed how they become more challenging to discover via search engine results as we get farther from the event. Topical focused crawling provides resources whose terms closely match the terms of a desired topic, such as an event, and these crawlers stop when the matching score for new content is too low.…”

Section: Age and Availability of Resources (mentioning)

confidence: 99%

The Past Web

2021


Table of Contents

0.1 Dedication; 0.2 Foreword; 0.3 Preface

Part 1: The era of information abundance and memory scarcity. Chapter 1.0 Part introduction; Chapter 1.1 The problem of web ephemera; Chapter 1.2 Web archives preserve our digital collective memory

Part 2: Collecting before it vanishes. Chapter 2.0 Part introduction; Chapter 2.1 National web archiving in Australia - representing the comprehensive; Chapter 2.2 Web Archiving Singapore: The Realities of National Web Archiving; Chapter 2.3 Archiving the social media - the Twitter case; Chapter 2.4 Creating Event-Centric Collections from Web Archives

Part 3: Access methods to analyse the Past web. Chapter 3.0 Part introduction; Chapter 3.1 Full-text and URL search; Chapter 3.2 A holistic view on Web archives; Chapter 3.3 Interoperability for Accessing Versions of Web Resources with the Memento Protocol; Chapter 3.4 Linking Twitter archives with TV archives; Chapter 3.5 Image analytics in web archives

Part 4: Researching the past Web. Chapter 4.0 Part introduction; Chapter 4.1 Digital archaeology in the web of links: reconstructing a late-90s web sphere; Chapter 4.2 Quantitative approaches to the Danish web archive; Chapter 4.3 Critical Web Archive Research; Chapter 4.4 Exploring Online Diasporas: London's French and Latin American Communities in the UK Web Archive; Chapter 4.5 Platform and app histories: Assessing source availability in web archives and app repositories

Part 5: Web archives as infrastructures to develop innovative tools. Chapter 5.0 Part introduction; Chapter 5.1 The need for infrastructures for the study of web-archived material; Chapter 5.2 Automatic generation of timelines for past events; Chapter 5.3 Political opinions of the past Web; Chapter 5.4 Framing web archives with browsers contemporary to a website's creation; Chapter 5.5 Big Data analytics over past web data

Part 6: The Past Web: a look into the future.

This book is dedicated to Vitalino Gomes who taught me that the value of a Man is in his Integrity.
