2022
DOI: 10.2352/issn.2168-3204.2022.19.1.4

Handwritten and Printed Text Identification in Historical Archival Documents

Abstract: Historical archival records present many challenges for OCR systems to correctly encode their content, due to visual complexity, e.g. mixed printed text and handwritten annotations, paper degradation and faded ink. This paper addresses the problem of automatic identification and separation of handwritten and printed text in historical archival documents, including the creation of an artificial pixel-level annotated dataset and the presentation of a new FCN-based model trained on historical data. Initial test r…

Cited by 3 publications (12 citation statements)
References 15 publications

“…When microfilming, original records are scaled down and decolourized, which causes information loss and may influence the performance of the text separation system. In [1], we addressed the problem of automatic identification of handwritten and printed text in historical archival documents, in the context of the "Pilotprojekt zur Transformation der Wiedergutmachung" [Pilot Project for Transformation of Reparations]. This project is focused on accessing knowledge hidden in historical records related to the claims of compensation and compensation proceedings that were submitted after 1945, in the state of Baden-Württemberg, Germany.…”
Section: Motivation and Problem Statement
Citation type: mentioning
confidence: 99%
“…The records hold crucial information pertaining to one of the darkest periods in German history, offering first-hand accounts of atrocities committed against individuals who were persecuted and discriminated against on the basis of their ethnicity, religion, political beliefs, and sexual orientation during the National Socialist regime. Our contributions in [1] included creation of WGM-SYN, an artificial pixel-level ground-truth dataset made from historical data, and WGM-MOD, a model trained on this dataset. The data synthesis pipeline in [1] is initiated by denoising and binarising homogenous text regions extracted from original archival documents.…”
Section: Motivation and Problem Statement
Citation type: mentioning
confidence: 99%