2020
DOI: 10.48550/arxiv.2012.02544
Preprint

Boosting offline handwritten text recognition in historical documents with few labeled lines

José Carlos Aradillas,
Juan José Murillo-Fuentes,
Pablo M. Olmos

Abstract: In this paper, we address the problem of offline handwritten text recognition (HTR) in historical documents when few labeled samples are available and some of those in the training set contain errors. Three main contributions are developed. First, we analyze how to perform transfer learning (TL) from a massive database to a smaller historical database, identifying which layers of the model need fine-tuning. Second, we analyze methods to efficiently combine TL and data augmentation (DA). Finally, an algorithm to…

Citations: cited by 1 publication (1 citation statement)
References: references 31 publications (62 reference statements)
“…With only 4 annotated pages (3 for training and 1 for validation), we reach an average CER of 5.11%. Then, the more we annotate data of a target dataset, the less improvement we get by new annotated lines which confirms the results obtained in [2]. The next challenge to address will be to minimize the number of new lines needed to benefit from writer specialization.…”
Section: Results Of the Optical Model
Citation type: supporting
confidence: 71%