Large media platforms are now in the habit of providing facts in their products and representing knowledge to various publics. For example, Google’s Knowledge Graph is a database of facts that Google uses to provide quick answers to the publics who use its products, while the Wikimedia Foundation hosts Wikidata, a sister project to Wikipedia that similarly stores facts about the world in structured data formats from which various apps can retrieve them. Microsoft, Amazon, and IBM use similar fact-storage and retrieval techniques in their products. This panel introduces papers that take a political economy perspective on such platformized versions of fact production and examines the underlying infrastructures, histories, and modeling techniques used in these knowledge representation systems.
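To make the retrieval side of this concrete, the short sketch below (not drawn from the panel papers) queries Wikidata’s public SPARQL endpoint for a single stored fact. It assumes Python with the requests library and uses Q42 (Douglas Adams) and P569 (date of birth), identifiers that appear in Wikidata’s own documentation.

```python
# Minimal sketch: retrieving one fact from Wikidata's public SPARQL endpoint.
# Assumes the `requests` library is installed; not taken from the panel papers.
import requests

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

# Ask for the date of birth (property P569) of Douglas Adams (item Q42),
# a canonical example entity in the Wikidata documentation.
query = """
SELECT ?dateOfBirth WHERE {
  wd:Q42 wdt:P569 ?dateOfBirth .
}
"""

response = requests.get(
    WIKIDATA_SPARQL,
    params={"query": query, "format": "json"},
    headers={"User-Agent": "fact-retrieval-example/0.1"},
)
response.raise_for_status()

# The endpoint returns SPARQL JSON results; print each bound value.
for row in response.json()["results"]["bindings"]:
    print(row["dateOfBirth"]["value"])
```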
Knowledge representation, long a central topic in archiving work within library and information science, is a key feature of platforms and is practiced more broadly by internet companies. Much of this work has historically centered on metadata models that seek to organize and describe information in standardized ways. One of the main facilitators in extending this data organizing and labeling work to the wider web was the “Semantic Web” project proposed by Tim Berners-Lee and the World Wide Web Consortium (W3C). Today, many of the principles, technologies, and standards proposed in those early metadata modeling projects by groups like the W3C are found at companies like Google and Facebook, organizations like Wikipedia, government portals, and beyond.
These platform metadata models are typically produced by industry professionals (e.g., taxonomists, ontologists, knowledge engineers) who help structure information for algorithmic processing on platforms and their recommender systems. Such structured information is meant to add a layer of contextual expressivity to web data that would otherwise be more difficult to parse, though the question of who controls that context is far from unproblematic when it comes to statements of fact. In many of these automated systems, metadata models help articulate ready-made facts that then travel through these systems and eventually reach the products engaged by everyday web users. This panel connects scholars working in information science, media studies, and science and technology studies to discuss these semantic technologies.
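As an illustration of what such a “ready-made fact” looks like in a metadata model, the sketch below (a hypothetical example, not taken from any of the papers) uses the rdflib Python library (version 6 or later assumed) to express a statement about an invented organization with schema.org vocabulary, the kind of shared vocabulary that platforms and their recommender systems parse algorithmically.

```python
# Minimal sketch (assumes rdflib >= 6 is installed) of a fact expressed as
# machine-readable statements: a subject, predicates drawn from a shared
# vocabulary (here schema.org), and objects.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SCHEMA = Namespace("https://schema.org/")

graph = Graph()
graph.bind("schema", SCHEMA)

# A hypothetical organization and its founding date, typed and labeled with
# schema.org terms; the entity URI below is invented for illustration.
org = URIRef("https://example.org/entity/acme")
graph.add((org, RDF.type, SCHEMA.Organization))
graph.add((org, SCHEMA.name, Literal("Acme Corporation")))
graph.add((org, SCHEMA.foundingDate, Literal("1999-01-01")))

# Serialize to Turtle, one of the standard RDF syntaxes promoted by the W3C.
print(graph.serialize(format="turtle"))
```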
The first paper draws on interviews with semantic web practitioners who build or have built metadata models at large internet and platform companies. It reports results from a qualitative study of these platform data management professionals (collectively referred to as “metadata modelers”), based on unstructured interviews (n=10) and archival research. The paper sketches the image of a metadata ecology alongside selected work-related contestations in which interview subjects describe difficulties and intractable problems in metadata modeling work. It then discusses the political economy of platform semantics through an examination of critical semantic web literature and closes with some policy concerns.
The second paper translates the method of tracing “traveling facts” from science studies to the context of online knowledge about evolving, historic events. The goal is to understand the socio-political impact of the semantic web as implemented by monopolistic digital platforms, and how such practices play out in the context of Wikipedia, from which the majority of knowledge graph entities are sourced. The paper describes how the adoption (and domination) of linked data by platform companies has catalyzed a reshaping of web content to accord with question-and-answer linked data formats, weakening the power of open content licenses to support local knowledge and consolidating the power of algorithmic knowledge systems that favor knowledge monopolies.
The third paper discusses building a semantic foundation for machine learning and examines how information infrastructures that convey meaning are intimately tied to colonial labor relations. It traces the practice of building a digital infrastructure that enables machines to learn from human language. The paper draws on examples from an ethnographic study of semantic computing and its infrastructuring practices to show how such techniques are materially and discursively performative in their co-emergence with techno-epistemic discourses and politico-economic structures. It examines the sociomaterial processes in which classifications, standards, metadata, and methods co-emerge with processes of signification that reconstitute and/or shift hegemonic ecologies of knowledge.
The fourth paper examines the ethics of “free” (CC0) data in Wikidata by evaluating the sources and uses of data from and within Wikidata. From knowledge graphs to AI training, Wikidata is the semantic web platform being used across the Internet to power new platforms. By considering the ways in which Wikidata extracts Wikipedia’s “share alike” knowledge through metadata scraping, together with the significant donations from and partnerships with large technology firms (Google in particular), the paper addresses ethical concerns within the largest semantic web platform, shows how these transformations of knowledge alienate donated volunteer labor, and offers some ways in which these issues might be mitigated.