2021
DOI: 10.1007/978-3-030-86331-9_32

Page Layout Analysis System for Unconstrained Historic Documents

Cited by 14 publications
(4 citation statements)
References 22 publications
“…(Gupta et al, 2015). All of this makes optical character recognition (OCR) difficult and motivates research into the pre-processing of images to eliminate noise (Neji et al, 2024), or the training of character recognition models specialized in Gothic and round letters using machine learning based on neural networks (Lacasta et al, 2022;Kodym et al, 2021).…”
Section: State Of the Art (mentioning)
confidence: 99%
“…The documents contain 9 languages and 10 font families. We randomly sampled 1.3M text line images from the original collection with ParseNet [18] and we translated their transcriptions into additional nine synthetic TS. Thus, each text line has ten transcription versions where one is the original one.…”
Section: Synthetic Transcription Styles Experiments (mentioning)
confidence: 99%
“…We represent a document by a single TSI, as we assume that each document was transcribed by a single organization using a consistent transcription style, but we have no information on how or where the documents were transcribed. We randomly sampled 1.08M text lines from 2445 randomly picked DTA documents with ParseNet [18]. The number of lines per document ranges from 200 to 3029.…”
Section: Real Transcription Styles Experiments (mentioning)
confidence: 99%
“…Several research explorations were carried out on complex documents [2,11], whereas some others focused on more complex historical documents, which may implicate challenging handwritten manuscripts in different languages [12][13][14]. Moreover, DLA can be an even more challenging task when applied to historical handwritten documents with highly unconstrained structure and complex page layouts [15,16], as in ancient/historical Arabic manuscripts [17][18][19].…”
Section: Introduction (mentioning)
confidence: 99%