In this paper we propose a novel approach for manuscript dating based on shape statistics. Our goal was to develop a strategy well suited to a large-scale dating effort in which heterogeneous collections of thousands of manuscripts can be processed automatically. The proposed method takes a grayscale image as input, then uses the stroke width transform and a statistical model of the gradient image to find ink boundaries. Finally, a distribution over common shapes, quantified using shape context descriptors, is produced for each manuscript image. The proposed method is binarization-free, rotation-invariant and requires minimal segmentation. We evaluate our work on "Svenskt diplomatariums huvudkartotek", a collection of more than 10,000 manuscripts consisting of charters from the medieval period of today's Sweden. The images, originally intended for web viewing, were of low quality and had compression artifacts. Thanks to unsupervised feature learning and regression, the collection could be dated with a median absolute error below 19 years even though we used only 5% of the labels to train the estimator.
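As a rough illustration of what "a distribution over common shapes" can look like, the sketch below assigns each local descriptor to its nearest entry in a codebook of common shapes and normalizes the counts. It is a minimal sketch under stated assumptions, not the paper's implementation. The descriptors are assumed to be plain numeric vectors, the codebook is assumed to have been learned beforehand (for example by clustering), and the function names `nearest_codeword` and `shape_distribution` are hypothetical.

```python
def nearest_codeword(descriptor, codebook):
    """Return the index of the codebook entry closest in squared Euclidean distance."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def shape_distribution(descriptors, codebook):
    """Histogram of codeword assignments, normalized to sum to 1.

    Assumption: `codebook` is a non-empty list of vectors with the same
    length as each descriptor. An image with no descriptors yields an
    all-zero histogram.
    """
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]


# Toy data: two "common shapes" and four descriptors from one image.
codebook = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8], [0.2, 0.1]]
distribution = shape_distribution(descriptors, codebook)
```

A per-image histogram of this kind could then serve as the feature vector for a regression model that predicts the production year.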
Finding the production date of a pre-modern manuscript is commonly a long process in historical research, requiring days of work from highly specialised experts. In this paper, we present an automatic dating method based on modelling both the language and the image data. By creating a statistical model over the changes in the pen strokes and short character sequences in the transcribed text, a combination of multiple estimators gives a distribution over the timeline for each manuscript. We have evaluated our estimation scheme on the medieval charter collection "Svenskt Diplomatariums huvudkartotek" (SDHK), including more than 5300 transcribed charters from the period 1135-1509. Our system achieves a median absolute error of 12 years, where the only human input is a transcription of the charter text. Since reading and transcribing the text is a skill that many researchers and students have, in contrast to the more specialised skill of dating medieval manuscripts based on palaeographical expertise, we find our novel approach suitable for helping individual researchers date collections of manuscript pages. For larger collections, transcriptions could also be collected through crowdsourcing.
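The abstract does not say how the estimators are combined, so the following is only one plausible sketch. It assumes each estimator outputs a discrete probability distribution over the same year bins and fuses them with a bin-wise product followed by renormalization. The function names, the year bins, and the toy numbers are all hypothetical. The median absolute error is the evaluation metric named in the abstract.

```python
import math
from statistics import median


def combine(distributions, floor=1e-12):
    """Fuse per-estimator distributions by a bin-wise product, then renormalize.

    Assumption: the estimators are treated as independent. The product is
    computed in log space, and `floor` keeps a zero bin from vetoing a year.
    """
    n_bins = len(distributions[0])
    log_scores = [0.0] * n_bins
    for dist in distributions:
        if len(dist) != n_bins:
            raise ValueError("all distributions must share the same bins")
        for i, p in enumerate(dist):
            log_scores[i] += math.log(max(p, floor))
    peak = max(log_scores)
    weights = [math.exp(score - peak) for score in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


def most_likely_year(years, dist):
    """Return the year bin with the highest combined probability."""
    best = max(range(len(dist)), key=lambda i: dist[i])
    return years[best]


def median_absolute_error(true_years, predicted_years):
    """Median of |true - predicted| over all manuscripts."""
    return median(abs(t - p) for t, p in zip(true_years, predicted_years))


# Toy example: an image-based and a text-based estimator over three year bins.
years = [1300, 1350, 1400]
image_dist = [0.2, 0.5, 0.3]
text_dist = [0.1, 0.6, 0.3]
combined = combine([image_dist, text_dist])
estimate = most_likely_year(years, combined)
```

Other fusion rules, such as a weighted average of the distributions, would fit the same interface. The product rule is used here only because it is simple to state.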
In this paper, we propose a novel pipeline for automated scribal attribution based on the Quill feature: 1) We compensate the Quill feature histogram for pen changes and page warping. 2) We add curvature as a third dimension in the feature histogram, to better separate characteristics like loops and lines. 3) We also investigate the use of several dissimilarity measures between the feature histograms. 4) We propose and evaluate semi-supervised learning for classification, to reduce the need for labeled samples. Our evaluation is performed on 1104 pages from a 15th-century Swedish manuscript. It was chosen because it represents a significant part of Swedish manuscripts of that period. Our results show that only a few percent of the material needs labeling to reach average precisions above 95%. Our novel curvature and registration extensions, together with semi-supervised learning, outperformed the current Quill feature.
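The abstract does not name the dissimilarity measures that were compared. As an illustration only, the sketch below implements three measures commonly used on normalized histograms: Manhattan, chi-square, and Hellinger distance. The choice of these three is an assumption, and the function names are hypothetical. A multi-dimensional feature histogram is assumed to be flattened into a one-dimensional list of bins first.

```python
import math


def normalize(hist):
    """Scale a non-negative histogram so that its bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive mass")
    return [h / total for h in hist]


def manhattan(p, q):
    """Sum of absolute bin differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def chi_square(p, q):
    """Symmetric chi-square distance; bins that are empty in both are skipped."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def hellinger(p, q):
    """Hellinger distance, which lies in [0, 1] for normalized histograms."""
    return math.sqrt(
        0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    )


# Toy example: two small histograms, flattened to one dimension.
page_a = normalize([2, 2])
page_b = normalize([4, 0])
distances = {
    "manhattan": manhattan(page_a, page_b),
    "chi_square": chi_square(page_a, page_b),
    "hellinger": hellinger(page_a, page_b),
}
```

In an attribution setting, a page could be assigned to the scribe whose labeled pages have the smallest dissimilarity to it. Which measure works best is an empirical question.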