2020
DOI: 10.3390/jimaging6100109

One Step Is Not Enough: A Multi-Step Procedure for Building the Training Set of a Query by String Keyword Spotting System to Assist the Transcription of Historical Document

Abstract: Digital libraries offer access to a large number of handwritten historical documents. These documents are available as raw images and therefore their content is not searchable. A fully manual transcription is time-consuming and expensive while a fully automatic transcription is cheaper but not comparable in terms of accuracy. The performance of automatic transcription systems is strictly related to the composition of the training set. We propose a multi-step procedure that exploits a Keyword Spotting system an…

Cited by 5 publications (2 citation statements)
References 46 publications
“…However, the digital transcript is not always linked to the manuscript image, making it difficult to locate the parts of the image that correspond to a particular part of the transcript. Thus, tools for transcript alignment, i.e., the automatic linking of the digital transcription with the parts of the digital image of the manuscript where it appears, would be of great help to scholars and historians who have to work with different versions of documents and/or writing styles, as well as for engineering a semi-automatic procedure for building the training set with minimal annotation, as suggested, for instance, in [ 8 ].…”
Section: Introduction (mentioning)
confidence: 99%
“…Due to the rapid digitisation of archives, many projects have investigated the best practices to develop HTR models capable of automatically transcribing large collections of historical manuscripts (such as [17][18][19][20][21][22]). Most of these models prove to be very performative, having been trained on large corpora of coherent documents.…”
mentioning
confidence: 99%