Digital methods and collaborative research in virtual research environments are gaining in importance for the arts and humanities. The EU-funded project DARIAH aims to enhance and support digitally-enabled research across these disciplines.The most basic but nevertheless fundamental task of DARIAH is to provide sustainable storage for research data. Information contained in data like images, texts or music needs to be secured and to remain accessible even if the original information carrier becomes lost or corrupted. The heterogeneity of the humanistic data and the need for distributed, performant access are the main challenges in designing an archiving system for the arts and humanities.Using the "Virtual Scriptorium", a digitisation project in Trier, Germany, this paper exemplary identifies the humanistic researchers' storage needs and derives requirements for an infrastructure. As a solution, a generic architecture for a federated data zone based on the iRODS technologies is proposed. The system implemented in Trier and Karlsruhe is described and will be extended to other locations as the researchers benefit from the initial set-up.
Digital methods, tools and algorithms are gaining in importance for the analysis of digitized manuscript collections in the arts and humanities. One example is the BMBF-funded research project "eCodicology" which aims to design, evaluate and optimize algorithms for the automatic identification of macro-and micro-structural layout features of medieval manuscripts. The main goal of this research project is to provide better insights into high-dimensional datasets of medieval manuscripts for humanities scholars. The heterogeneous nature and size of the humanities data and the need to create a database of automatically extracted reproducible features for better statistical and visual analysis are the main challenges in designing a workflow for the arts and humanities.This paper presents a concept of a workflow for the automatic tagging of medieval manuscripts. As a starting point, the workflow uses medieval manuscripts digitized within the scope of the project "Virtual Scriptorium St. Matthias". Firstly, these digitized manuscripts are ingested into a data repository. Secondly, specific algorithms are adapted or designed for the identification of macro-and micro-structural layout elements like page size, writing space, number of lines etc. And lastly, a statistical analysis and scientific evaluation of the manuscripts groups are performed. The workflow is designed generically to process large amounts of data automatically with any desired algorithm for feature extraction. As a result, a database of objectified and reproducible features is created which helps to analyze and visualize hidden relationships of around 170,000 pages. The workflow shows the potential of automatic image analysis by enabling the processing of a single page in less than a minute. Furthermore, the accuracy tests of the workflow on a small set of manuscripts with respect to features like page size and text areas show that automatic and manual analysis are comparable. The usage of a computer cluster will allow the highly performant processing of large amounts of data. The software framework itself will be integrated as a service into the DARIAH infrastructure to make it adaptable for wider range of communities.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.