Bots, Seeds and People

Summers, Ed; Punzalan, Ricardo L.

doi:10.1145/2998181.2998345

Cited by 26 publications

(9 citation statements)

References 49 publications


“…Further, by aligning with approaches like critical data studies, future work in web archives provenance aim for a deeper understanding of the sociotechnical system involved in production, the limitations and constraints imposed, and the workarounds and invisible labor required to sustain web archives systems. These concerns are central to the recent work of Summers and Punzalan () and Ogden et al ().…”

Section: The Need For New Perspectives On Web Archives Provenance (mentioning)

confidence: 97%

“…Emerging research focuses on the situated practice of web archiving and the activities that are involved in web crawling and constructing a collection. Summers and Punzalan () explore the interactions between the individuals creating web archives and the systems or automated agents used. Their work highlights that the process of selection and scoping is collaborative work between human and machine actors and requires a sociotechnical perspective.…”

Section: Introduction (mentioning)

confidence: 99%


If these crawls could talk: Studying and documenting web archives provenance

Maemura, Worby, Milligan, et al. 2018

Asso for Info Science & Tech


The increasing use and prominence of web archives raises the urgency of establishing mechanisms for transparency in the making of web archives to facilitate the process of evaluating a web archive's provenance, scoping, and absences. Some choices and process events are captured automatically, but their interactions are not currently well understood or documented. This study examines the decision space of web archives and its role in shaping what is and what is not captured in the web archiving process. By comparing how three different web archives collections were created and documented, we investigate how curatorial decisions interact with technical and external factors and we compare commonalities and differences. The findings reveal the need to understand both the social and technical context that shapes those decisions and the ways in which these individual decisions interact. Based on the study, we propose a framework for documenting key dimensions of a collection that addresses the situated nature of the organizational context, technical specificities, and unique characteristics of web materials that are the focus of a collection. The framework enables future researchers to undertake empirical work studying the process of creating web archives collections in different contexts.



“…This research is therefore motivated by the observation that despite their positioning as critical resources for a range of scholarly Internet research agendas (Rogers, 2013) and their widespread use as tools for evidence-based accountability online, WAs remain relatively understudied. As such, recent scholarship has framed the need for further research into the practices of web archiving, arguing the inherent connections between the ways the Web is archived and our future understanding of the Web's past (Ogden et al, 2017; Summers and Punzalan, 2017).…”

Section: Background and Methods (mentioning)

confidence: 99%

Web Archiving as Culture: Tumblr and the Cultural Construction of the Archived Web

Ogden 2020

SPIR


Web archives - broadly conceived as any attempt to capture and preserve the Web for future use - are evermore central to discussions of digital access in the public sphere, as they provide tools for accessing parts of the Web that have been subject to neglect, removal or state and platform-based forms of content moderation and censorship. In this paper I discuss the cultural significance of web archiving through the example of Tumblr’s 2018 efforts to remove so-called ‘Not Safe for Work’ (NSFW) posts from the platform. The paper examines the archiving of Tumblr NSFW by Archive Team, a self-described ‘loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage’. Findings are presented through the concept of culture which provides a dual lens through which to understand web archiving practices as contingent upon the cultural worlds which they create and operate within. Here, web archiving as culture reveals the ways that practices shape (and are shaped by) online community membership, the nature of how and why the Web is archived and the reflexive significance participants place on their own web archival activities. The paper contributes to broader discussions of online community formation and raises further questions about the ethics and role of power in the production of web archives, as well as their positioning as historical representations of online cultures.


“…Data-centered analysis requires significant technical capacity to support web archives' large scale of data, but also requires an understanding of legal and ethical constraints, and tacit organizational practices that shape collection and use of archived web data. As a result, web archives are increasingly understood as sociotechnical infrastructures, aligning with the dimensions outlined by Bowker and Star (2000): Summers and Punzalan (2017) address the role of breakdown and repair in collecting practices; Ben-David and Amram (2018) consider how to reveal the "black boxed" processes of collecting; Ogden (2021) identifies how archiving decisions are influenced by organizational norms and membership; and, Hegarty (2022) analyzes the metaphor of "publication" for web materials relating to institutional archives built on the "installed base" of libraries.…”

Section: Background: Studying Web Archives As Data (mentioning)

confidence: 99%

All WARC and no playback: The materialities of data-centered web archives research

Maemura 2023

Big Data & Society


This paper examines the Web ARChive (WARC) file format, revealing how the format has come to play a central role in the development and standardization of interoperable tools and methods for the international web archiving community. In the context of emerging big data approaches, I consider the sociotechnical relationships between material construction of data and information infrastructures for collecting and research. Analysis is inspired by Star and Griesemer's historical case of the Museum of Vertebrate Zoology which reveals how boundary objects and methods standardization are used to enroll actors in the work of collecting for natural history. I extend these concepts by pairing them with frameworks for studying digital materiality and the representational qualities of data artifacts. Through examples drawn from fieldwork observations studying two data-centered research projects, I consider how the materiality of the WARC format influences research methods and approaches to data extraction, selection, and transformation. Findings identify three modalities researchers use to configure WARC data for researcher needs: using indexes to support search queries, constructing derivative formats designed for certain types of analysis, and generating custom-designed datasets tailored for specific research purposes. Findings additionally reveal similarities in how these distinct methods approach automated data extraction by relying upon the WARC's standardized metadata elements. By interrogating whose information needs are being met and taken into account in the design of the WARC's underlying information representation, I reveal effects on the emerging field of web history, and consider alternative approaches to knowledge production with archived web data.

