Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage 2017
DOI: 10.1145/3078081.3078098

Case Study of a highly automated Layout Analysis and OCR of an incunabulum

Abstract: This paper provides the first thorough documentation of a high-quality digitization process applied to an early printed book from the incunabulum period (1450-1500). The entire OCR-related workflow, including preprocessing, layout analysis and text recognition, is illustrated in detail using the example of 'Der Heiligen Leben', printed in Nuremberg in 1488. For each step the required time expenditure was recorded. The character recognition yielded excellent results both on character (97.57%) and word (92.19%) le…
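The abstract reports character and word accuracy but does not say here how they were computed. A common definition is one minus the edit (Levenshtein) distance between OCR output and ground truth, divided by the ground-truth length, counted either in characters or in whitespace-separated words. The sketch below assumes that definition; the function names and the sample line are hypothetical, and the evaluation tooling used in the paper may normalize text or treat whitespace differently.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Works on any sequences: strings (characters) or lists of words.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth: str, ocr: str) -> float:
    """1 - (character edit distance / number of ground-truth characters)."""
    return 1.0 - levenshtein(ground_truth, ocr) / len(ground_truth)


def word_accuracy(ground_truth: str, ocr: str) -> float:
    """Same measure over whitespace-separated tokens instead of characters."""
    gt_words = ground_truth.split()
    return 1.0 - levenshtein(gt_words, ocr.split()) / len(gt_words)


if __name__ == "__main__":
    # Hypothetical line with a single misrecognized character.
    gt, ocr = "Der Heiligen Leben", "Der Heiligen Lebcn"
    print(f"{character_accuracy(gt, ocr):.4f}")  # 0.9444
    print(f"{word_accuracy(gt, ocr):.4f}")       # 0.6667
```

One wrong character costs about 5.6 points of character accuracy on this 18-character line but a full third of word accuracy, which illustrates why word-level figures are typically lower than character-level ones. Note that under this definition accuracy can drop below zero when the OCR output contains many spurious insertions.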

Cited by 11 publications (13 citation statements)
References: 7 publications

“…A number of research groups have invested significant efforts in the creation and maintenance of annotated, publicly available historical manuscript image datasets [1]-[4], [11]-[13]. Other collections contain character-level and word-level spatial annotations for South-East Asian palm-leaf manuscripts [5], [10], [14].…”
Section: Related Work (mentioning)
confidence: 99%
“…The collection and analysis of historical document images is a key component in the preservation of culture and heritage. Given its importance, a number of active research efforts exist across the world [1]-[6]. In this paper, we focus on palm-leaf and early paper documents from the Indian subcontinent.…”
Section: Introduction (mentioning)
confidence: 99%
“…The experiment was conducted by following the workflow and example described in section III. Table IV shows the CER achieved on a fixed evaluation set of previously unseen lines from the held-out data of each individual best model (1)(2)(3)(4)(5) and the combined results without (ISRI Voting) and with (Confidence Voting) confidence information. Furthermore, the relative improvement of the combined result with respect to the best/average/worst model is indicated.…”
Section: B. Default Application (5 Folds, 150 Lines) (mentioning)
confidence: 99%
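The statement above contrasts combining several models' outputs without confidence information (ISRI Voting) and with it (Confidence Voting), and reports the relative improvement over the best, average, and worst single model. A heavily simplified sketch of both ideas follows, assuming the model outputs are already aligned character by character; real ISRI voting and the cited confidence voting must first align sequences of differing lengths, which this sketch skips. All function names, sample strings, confidences, and CER values are hypothetical.

```python
from collections import defaultdict


def vote(hypotheses, use_confidence):
    """Per-position vote over pre-aligned OCR outputs.

    hypotheses: list of (text, confidences), all of equal length (assumption).
    Without confidences each model casts one vote per character; with them,
    each vote is weighted by the model's confidence for that character.
    """
    length = len(hypotheses[0][0])
    if any(len(t) != length or len(c) != length for t, c in hypotheses):
        raise ValueError("this sketch requires pre-aligned, equal-length outputs")
    result = []
    for pos in range(length):
        scores = defaultdict(float)
        for text, conf in hypotheses:
            scores[text[pos]] += conf[pos] if use_confidence else 1.0
        # Highest score wins; ties go to the character seen first.
        result.append(max(scores, key=scores.get))
    return "".join(result)


def relative_improvement(cer_model, cer_combined):
    """Fraction by which the combined CER undercuts a reference model's CER."""
    return (cer_model - cer_combined) / cer_model


def improvements(model_cers, cer_combined):
    """Relative improvement w.r.t. the best, average and worst single model."""
    average = sum(model_cers) / len(model_cers)
    return {
        "best": relative_improvement(min(model_cers), cer_combined),
        "average": relative_improvement(average, cer_combined),
        "worst": relative_improvement(max(model_cers), cer_combined),
    }


if __name__ == "__main__":
    hyps = [
        ("Leben", [0.9, 0.95, 0.9, 0.99, 0.9]),
        ("Lebcn", [0.9, 0.90, 0.9, 0.30, 0.9]),  # two low-confidence errors
        ("Lebcn", [0.9, 0.90, 0.9, 0.35, 0.9]),
    ]
    print(vote(hyps, use_confidence=False))  # Lebcn
    print(vote(hyps, use_confidence=True))   # Leben
```

In this toy case two models agree on a wrong character, so a plain majority vote keeps the error, while weighting by confidence lets the single confident model win.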
“…analysis (Reul, Dittrich, and Gruner 2017). 1476 is part of the Early New High German Reference Corpus¹¹ and 1572 was digitized in order to be added to the AL-Corpus¹².…”
Section: Books (mentioning)
confidence: 99%
“…Starting from Breuel et al. (2013)'s groundbreaking paper the application of recurrent neural networks with LSTM architecture to the field of OCR of historical printings has made excellent progress (Springmann, Fink, and Schulz 2016; Springmann and Lüdeling 2017; Reul, Dittrich, and Gruner 2017), although it was previously considered nearly impossible for the case of incunabula¹ (Rydberg-Cox 2009). Character accuracy rates (CERs) in the high nineties are now routinely possible for even the earliest printings.…”
Section: Introduction (mentioning)
confidence: 99%