Abstract: Purpose - Preservation environments such as repositories need scalable and context-aware preservation planning and monitoring capabilities to ensure continued accessibility of content over time. This article identifies a number of gaps in the systems and mechanisms currently available, and presents a new, innovative architecture for scalable decision making and control in such environments. Design/methodology/approach - The paper illustrates the state of the art in preservation planning and monitoring, highlights the key challenges faced by repositories to provide scalable decision making and monitoring facilities, and presents the contributions of the SCAPE Planning and Watch suite to provide such capabilities. Findings - The presented architecture makes preservation planning and monitoring context-aware through a semantic representation of key organizational factors, and integrates this with a business intelligence system that collects and reasons upon preservation-relevant information. Research limitations/implications - The architecture has been implemented in the SCAPE Planning and Watch suite. Integration with repositories and external information sources provides powerful preservation capabilities that can be freely integrated with virtually any repository. Practical implications - The open nature of the software suite enables stewardship organizations to integrate the components with their own preservation environments and to contribute to the ongoing improvement of the systems. Originality/value - The paper reports on innovative research and development to provide preservation capabilities. The results enable proactive, continuous preservation management through a context-aware planning and monitoring cycle integrated with operational systems.
Structured Abstract: Purpose - Scalable decision support and business intelligence capabilities are required to effectively secure content over time. This article evaluates a new architecture for scalable decision making and control in preservation environments for its ability to address five key goals: (1) Scalable content profiling, (2) Monitoring of compliance, risks and opportunities, (3) Efficient creation of trustworthy plans, (4) Context awareness, and (5) Loosely-coupled preservation ecosystems. Design/methodology/approach - We conduct a systematic evaluation of the contributions of the SCAPE Planning and Watch suite to provide effective and scalable decision support capabilities. We discuss the quantitative and qualitative evaluation of advancing the state of the art and report on a case study with a national library. Findings - The system provides substantial capabilities for semi-automated, scalable decision making and control of preservation functions in repositories. Well-defined interfaces allow a flexible integration with diverse institutional environments. The free and open nature of the tool suite further encourages global take-up in the repository communities. Research limitations/implications - The article discusses a number of bottlenecks and factors limiting the real-world scalability of preservation environments. These include data-intensive processing of large volumes of information, automated quality assurance for preservation actions, and the element of human decision making. We outline open issues and future work. Practical implications - The open nature of the software suite enables stewardship organizations to integrate the components with their own preservation environments and to contribute to the ongoing improvement of the systems. Originality/value - The paper reports on innovative research and development to provide preservation capabilities. The results of the assessment demonstrate how the system advances the control of digital preservation operations from ad-hoc decision making to proactive, continuous preservation management, through a context-aware planning and monitoring cycle integrated with operational systems. Keywords: Repositories, preservation planning, preservation watch, monitoring, scalability, digital libraries.
Scalable Decision Support for Digital Preservation: An Assessment
This article presents a systematic assessment and evaluation of the SCAPE decision support environment comprising PLATO, SCOUT and c3po. We discuss the improvements and identified limitations of the presented system. We furthermore discuss the quantitative and qualitative evaluation of advancing the state of the art and report on a case study with a national library. Finally, we summarize the contributions and provide an outlook on future work.
Successful preservation of content requires sophisticated mechanisms for collecting, tracking and analyzing information about a multitude of relevant aspects. This covers not only the content itself, but also available software, other organizations' content, usage statistics and trends, format risks, systems operations and many more. Such tracking requires a flexible system that supports evolution over time and provides an extensible platform for scalability. This article presents a novel approach towards automated monitoring of preservation-related information. We discuss the challenges and information sources that need to be covered and outline the key design features of a novel preservation watch system. We discuss how this system addresses critical information needs for informed preservation management and outline the next steps.
Text extraction plays an important role in data processing workflows in digital libraries. For example, it is a crucial prerequisite for evaluating the quality of migrated textual documents. Complex file formats make the extraction process error-prone and have made it very challenging to verify the correctness of extraction components. Based on digital preservation and information retrieval scenarios, three quality requirements in terms of effectiveness of text extraction tools are identified: 1) is a certain text snippet correctly extracted from a document, 2) does the extracted text appear in the right order relative to other elements, and 3) is the structure of the text preserved. A number of text extraction tools are available, fulfilling these three quality requirements to various degrees. However, systematic benchmarks to evaluate those tools are still missing, mainly due to the lack of datasets with accompanying ground truth. The contribution of this paper is twofold. First, we describe a dataset generation method based on model-driven engineering principles and use it to synthesize a dataset and its ground truth directly from a model. Second, we define a benchmark for text extraction tools and complete an experiment to calculate performance measures for several tools that cover the three quality requirements. The results demonstrate the benefits of the approach in terms of scalability and effectiveness in generating ground truth for content and structure of text elements.
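The first two quality requirements above (snippet correctness and relative order) can be illustrated with a minimal Python sketch. It is not the benchmark defined in the paper: the function names `snippet_recall` and `order_score`, the exact-substring matching, and the sample strings are all assumptions made for illustration. The third requirement (structure preservation) is left out because it depends on a structural model that the abstract does not describe.

```python
def find_positions(snippets, extracted):
    """First character offset of each ground-truth snippet in the extracted text (-1 if absent)."""
    return [extracted.find(s) for s in snippets]


def snippet_recall(snippets, extracted):
    """Requirement 1 (hypothetical measure): share of ground-truth snippets found verbatim."""
    if not snippets:
        raise ValueError("ground truth must contain at least one snippet")
    positions = find_positions(snippets, extracted)
    return sum(p >= 0 for p in positions) / len(snippets)


def order_score(snippets, extracted):
    """Requirement 2 (hypothetical measure): share of consecutive found snippets in the expected order."""
    found = [p for p in find_positions(snippets, extracted) if p >= 0]
    pairs = list(zip(found, found[1:]))
    if not pairs:
        # Fewer than two snippets were found, so no ordering can be violated.
        return 1.0
    return sum(a < b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Ground truth lists the snippets in their intended reading order (invented sample data).
    ground_truth = ["Title", "First paragraph.", "Second paragraph."]
    extracted = "Title\nSecond paragraph.\nFirst paragraph."
    print("recall:", snippet_recall(ground_truth, extracted))
    print("order:", order_score(ground_truth, extracted))
```

Exact substring matching is the simplest possible choice here; a real benchmark would likely need to normalize whitespace and hyphenation before comparing.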