This essay addresses two open challenges in the domain of digital scholarly editing: (1) formally defining the meaning of markup, and (2) enabling the reuse and exchange of textual data through a distributed editorial workflow in which texts can be edited from multiple, diverging yet co-existing perspectives. We argue that successfully addressing these issues would promote the distribution and exchange of scholarly knowledge, on a technical as well as a theoretical level. The essay introduces ongoing work on a new data model for text called ‘TAG’ (Text-as-Graph) and its reference implementation ‘Alexandria’. It outlines how TAG, based on a hypergraph for text, can improve the modeling of complex literary texts, and how Alexandria supports the exchange of markup files in a way that sustains scholarly discourse. We discuss three components of TAG: first, the markup technology stack allows for the formal definition of the meaning of markup (‘markup semantics’); second, users can add multiple layers of markup that each represent an alternative perspective on the text; and third, the editorial workflow is set up in a git-like distributed version management system. As a result, the TAG model provides for the synthesis of dispersed scholarly practices and the advancement of academic discourse.
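To make the idea of a hypergraph for text with multiple markup layers more concrete, the following is a minimal illustrative sketch. It is not the TAG data model or the Alexandria API: the class `TextHypergraph`, its methods, the layer names, and the placeholder tokens are all assumptions introduced here for illustration. The sketch treats text tokens as nodes and each markup element as a hyperedge over an arbitrary set of tokens, so that overlapping and discontinuous markup can co-exist.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Optional, Tuple


@dataclass
class TextHypergraph:
    """Toy model (hypothetical): tokens are nodes, markup elements are hyperedges."""

    tokens: List[str]
    # Each hyperedge is (layer name, tag name, set of token indices it covers).
    markup: List[Tuple[str, str, FrozenSet[int]]] = field(default_factory=list)

    def add_markup(self, layer: str, tag: str, indices: Iterable[int]) -> None:
        """Attach a markup element to any set of tokens, contiguous or not."""
        self.markup.append((layer, tag, frozenset(indices)))

    def tags_on(self, index: int, layer: Optional[str] = None) -> List[str]:
        """Return the tags covering one token, optionally restricted to a layer."""
        return [
            tag
            for lyr, tag, idx in self.markup
            if index in idx and (layer is None or lyr == layer)
        ]

    def text_of(self, layer: str, tag: str) -> List[str]:
        """Return the text covered by each element with this tag in this layer."""
        return [
            " ".join(self.tokens[i] for i in sorted(idx))
            for lyr, t, idx in self.markup
            if lyr == layer and t == tag
        ]


# Placeholder tokens, not a quotation from any edited text.
graph = TextHypergraph(tokens=["one", "two", "three", "four", "five"])

# Three hypothetical perspectives on the same tokens.
graph.add_markup("linguistic", "sentence", range(0, 5))
graph.add_markup("material", "line", range(0, 3))
graph.add_markup("material", "line", range(3, 5))
graph.add_markup("genetic", "addition", [1, 4])  # discontinuous span

print(graph.tags_on(1))                      # all layers
print(graph.tags_on(4, layer="material"))    # one perspective only
print(graph.text_of("material", "line"))
```

Because a hyperedge can connect any set of nodes, the sentence that crosses a line break and the discontinuous addition need no workaround in this sketch, whereas a single tree would have to break one of them into fragments.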
The article discusses how micro-level textual variation can be expressed in an idiomatic manner using markup, and how the markup information is subsequently used by a digital collation tool for a more refined analysis of the textual variation. We take as a case study the manuscript materials of Virginia Woolf's To the Lighthouse (1927), which bear the traces of the author's struggles in the form of deletions, additions, and rewrites. These in-text revisions typically constitute non-linear, discontinuous, or multi-hierarchical information structures. While digital technology has been instrumental in supporting manuscript research, current data models for text provide only limited support for co-existing hierarchies or non-linear text features. The hypergraph data model of TAG is specifically designed to support and facilitate the study of complex manuscript text by way of its syntax TAGML and the collation tool HyperCollate. The article demonstrates how the study of textual variation can be augmented by computer-assisted collation that takes into account in-text, micro-level revisions.