Purpose
An overview of the current use of handwritten text recognition (HTR) on archival manuscript material, as provided by the EU H2020 funded Transkribus platform. The paper explains HTR, demonstrates Transkribus, gives examples of use cases, highlights the effect HTR may have on scholarship, and evidences a turning point in the advanced use of digitised heritage content.
Design/methodology/approach
This paper adopts a case study approach, examining the development and delivery of the only openly available HTR platform for manuscript material.
Findings
Transkribus has demonstrated that HTR is now a usable technology that can be employed in conjunction with mass digitisation to generate accurate transcripts of archival material. Use cases are demonstrated, and a cooperative model is suggested as a way to ensure the sustainability and scaling of the platform. However, funding and resourcing issues are identified.
Research limitations/implications
The paper presents results from projects; further user studies involving interviews, surveys and similar methods could be undertaken.
Practical implications
Only HTR provided via Transkribus is covered; however, at the time of writing this is the only publicly available platform for HTR on individual collections of historical documents, and it represents the current state of the art in the field.
Social implications
The increased access to information contained within historical texts has the potential to be transformational for both institutions and individuals.
Originality/value
This is the first published overview of how HTR is used by a wide archival studies community, reporting and showcasing the current application of handwritten text recognition technology in the cultural heritage sector.
Intangible Cultural Heritage (ICH), composed of a variety of immaterial manifestations, bears witness to human creativity and wisdom accumulated over long histories. The rapid development of digital technologies has accelerated the recording of ICH, generating a vast amount of heterogeneous but fragmented data. To address this, existing studies mainly adopt knowledge graphs (KGs), which provide rich knowledge representation. However, most KGs are text-based and text-derived, so they can neither supply related images nor empower downstream multimodal tasks; this also makes it difficult for the public, especially those without prior ICH knowledge, to form a visual perception of ICH and comprehend it fully. Hence, taking the Chinese national-level ICH list as an example, we propose to construct a large-scale and comprehensive Multimodal Knowledge Graph (CICHMKG) that combines text and image entities from multiple data sources, and we give a practical construction framework. Additionally, to select representative images for ICH entities, we propose a method composed of a denoising algorithm (CNIFA) and a series of selection criteria, utilizing global and local visual features of images and textual features of captions. Extensive empirical experiments demonstrate its effectiveness. Lastly, we construct the CICHMKG, consisting of 1,774,005 triples, and visualize it to facilitate interaction and help the public engage deeply with ICH.
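To make the multimodal KG idea concrete, the following minimal sketch shows how text and image entities might be stored as (subject, predicate, object) triples and how candidate images could be ranked by combining visual and caption features. All entity names, predicates, and weights here are hypothetical assumptions for illustration; the paper's CNIFA algorithm and selection criteria are not reproduced.

```python
# Illustrative sketch only: ICH facts and linked images stored as triples,
# plus a placeholder relevance score for picking a representative image.
# Entity URIs, predicates, and weights are invented for this example.

from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    object: str

triples = [
    # Text-derived facts about one ICH item.
    Triple("ich:KunquOpera", "rdf:type", "ich:PerformingArt"),
    Triple("ich:KunquOpera", "ich:region", "place:Jiangsu"),
    # Image entities linked to the same item make the KG multimodal.
    Triple("ich:KunquOpera", "ich:hasImage", "img:kunqu_001.jpg"),
]

def image_score(global_sim: float, local_sim: float, caption_sim: float) -> float:
    """Combine global/local visual similarity and caption-text similarity
    into one relevance score; the weights are placeholder assumptions."""
    return 0.4 * global_sim + 0.3 * local_sim + 0.3 * caption_sim

# Rank hypothetical candidate images by the combined score.
candidates = {
    "img:kunqu_001.jpg": (0.9, 0.8, 0.7),
    "img:kunqu_002.jpg": (0.5, 0.6, 0.4),
}
best = max(candidates, key=lambda k: image_score(*candidates[k]))
print(best)  # img:kunqu_001.jpg
```

Keeping images as first-class entities in the triple store, rather than as mere attributes, is what allows the graph to serve downstream multimodal tasks as described above.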
Existing text recognition engines make it possible to train general models that recognize not just one specific hand but a multitude of historical hands within a particular script, across a rather large time period (more than 100 years). This paper compares different text recognition engines and their performance on a test set independent of the training and validation sets. We argue that both the test set and the ground truth should be made available by researchers as part of a shared task, to allow engines to be compared; this will give insight into the range of options available to institutions in need of recognition models. As a test set, we provide a data set consisting of 2,426 lines randomly selected from meeting minutes of the Swiss Federal Council from 1848 to 1903. To our knowledge, neither these text lines, which we take as ground truth, nor the multitude of different hands within this corpus have ever been used to train handwritten text recognition models. In addition, the data set's variability and the time frame it spans make it well suited to comparisons involving recognition engines and large training sets. Consequently, this paper argues that both tested engines, HTR+ and PyLaia, can handle large training sets: the resulting models yielded very good results on a test set consisting of unknown but stylistically similar hands.
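Engine comparisons of this kind are conventionally scored with character error rate (CER): total edit distance between recognized and ground-truth lines divided by total ground-truth length. The sketch below is a minimal, self-contained illustration of that metric; the example lines and the plain-Python Levenshtein implementation are assumptions, not code from the paper.

```python
# Minimal sketch: scoring HTR output against ground-truth lines with
# character error rate (CER), the usual metric when comparing engines
# such as HTR+ and PyLaia. Example strings below are hypothetical.

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(hypotheses: list[str], references: list[str]) -> float:
    """Total edit distance over all lines, divided by total reference length."""
    edits = sum(levenshtein(h, r) for h, r in zip(hypotheses, references))
    chars = sum(len(r) for r in references)
    return edits / chars

# Hypothetical usage: one recognized line vs. its ground-truth transcription.
hyp = ["Der Bundesrath beschliesst:"]
ref = ["Der Bundesrat beschliesst:"]
print(f"CER: {cer(hyp, ref):.3f}")  # one extra character over 26 -> ~0.038
```

Publishing the test set and ground truth alongside such a script is what would let other institutions reproduce the comparison on their own models.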