2019 International Conference on Document Analysis and Recognition (ICDAR) 2019
DOI: 10.1109/icdar.2019.00194

Labeling, Cutting, Grouping: An Efficient Text Line Segmentation Method for Medieval Manuscripts

Abstract: This paper introduces a new way for text-line extraction by integrating deep-learning based pre-classification and state-of-the-art segmentation methods. Text-line extraction in complex handwritten documents poses a significant challenge, even to the most modern computer vision algorithms. Historical manuscripts are a particularly hard class of documents as they present several forms of noise, such as degradation, bleedthrough, interlinear glosses, and elaborated scripts. In this work, we propose a novel metho…

Cited by 30 publications (34 citation statements)
References 26 publications

“…The authors demonstrated its genericity by successfully solving five semantic segmentation tasks on historical documents: page extraction, text line extraction, structure detection, decoration detection and photo detection. Albertini et al [3] have used the DeepDIVA framework [2] to obtain high quality semantic segmentation before extracting text-lines. Alaasam et al [1] have used siamese networks at the patch level for semantic segmentation of challenging historical Arabic manuscripts.…”
Section: Neural Network-based Strategies
Citation type: mentioning
confidence: 99%
“…After semantic segmentation, we further segment the pixels classified as main text into individual text columns using seam carving, a well-established technique for text line segmentation in historical document images [13]. In this work, we use the recently introduced seam carving method proposed by Alberti et al [7], which has achieved a strong performance on several medieval manuscript datasets. The result of text column segmentation are tight polygons around the foreground pixels of the individual text columns.…”
Section: Text Column Segmentation
Citation type: mentioning
confidence: 99%
“…Finally, the CCs are clustered with respect to the number of seams to their right in order to form text columns and tight polygons are computed around all main text pixels. For more details on the seam carving method, we refer to [7].…”
Section: Text Column Segmentation
Citation type: mentioning
confidence: 99%
“…We can now process a document with both machine-printed text and handwritten text and then recognize them separately [4,5]. Similar applications can be found in the archiving and processing of historical documents [6,7]. In the field of education, related technologies for examination paper autoscoring have emerged, which greatly reduce burden for teachers and students.…”
Section: Introduction
Citation type: mentioning
confidence: 99%