In the problem space of long-term preservation of digital objects, the disciplined use of XML affords a reasonable solution to many of the issues associated with ensuring the interpretability and renderability of at least some digital artifacts. This paper describes the experience of Portico, a digital preservation service that preserves scholarly literature in electronic form. It describes some of the challenges and practices entailed in processing and producing XML for the archive, including issues of syntax, semantics, linking, versioning, and prospective issues of scale, variety of formats, and the larger infrastructure of tools and practices required for the use of XML for the long haul.
The Digital Curation Manual is licensed under a Creative Commons Attribution-Non-Commercial-Share-Alike 2.0 License. © in the collective work - Digital Curation Centre (which in the context of these notices shall mean one or more of the University of Edinburgh, the University of Glasgow, the University of Bath, the Council for the Central Laboratory of the Research Councils and the staff and agents of these parties involved in the work of the Digital Curation Centre), 2005. © in the individual instalments - the author of the instalment or their employer where relevant (as indicated in catalogue entry below). The Digital Curation Centre confirms that the owners of copyright in the individual instalments have given permission for their work to be licensed under the Creative Commons license.
Catalogue Entry
Description
The goal of digital curation is to ensure the appropriate usability of managed digital assets over time. Format is a fundamental characteristic of a digital asset that governs its ability to be used effectively. Without strong format typing a digital asset is merely an undifferentiated string of bits. The information content encoded into an asset's bits can only be interpreted properly and rendered in human-sensible form if that asset's format is known. While it is possible for bits to be preserved indefinitely without consideration of format, it is only through the careful management of format that the meaning of those bits remains accessible over time. This instalment investigates aspects of format description, validation, and characterisation that may assist with long-term curation and usability of data.
Publisher
Citation Guidelines
Stephen Abrams (October 2007), "File Formats", DCC Digital Curation Manual, S. Ross, M. Day (eds), Retrieved
Institutions such as Portico that are engaged in ensuring that the digital record of our time is accessible, usable, discoverable, and verifiable for the very long term continually face the challenge of processing and managing content at very large scales, often with minimal, and sometimes diminishing, resources to accomplish the task. A key resource in meeting the challenge of preserving born-digital and digitized scholarly literature has been the NLM and JATS standards, and the community of practice centered on those standards. We will be talking about our shared experience in developing those standards: what motivated our participation, what benefits we have seen, and what challenges we still face.