2022
DOI: 10.21248/jlcl.35.2022.232

Optimizing the Training of Models for Automated Post-Correction of Arbitrary OCR-ed Historical Texts

Abstract: Systems for post-correction of OCR-results for historical texts are based on statistical correction models obtained by supervised learning. For training, suitable collections of ground truth materials are needed. In this paper we investigate the dependency of the power of automated OCR post-correction on the form of ground truth data and other training settings used for the computation of a post-correction model. The post-correction system A-PoCoTo considered here is based on a profiler service that computes a…

Citations: cited by 1 publication (1 citation statement)
References: 13 publications
“…Evaluation Metrics and Benchmarks [34]: Establishing appropriate evaluation metrics and benchmarks for multilingual OCR systems [35] is vital for assessing performance, identifying areas for improvement, and facilitating model comparison. These metrics should include character- and word-level recognition rates, language identification accuracy, and domain-specific evaluation measures.…”
mentioning
confidence: 99%