2022
DOI: 10.48550/arxiv.2202.08125
Preprint

Processing the structure of documents: Logical Layout Analysis of historical newspapers in French

Abstract: Background. In recent years, libraries and archives have led major digitisation campaigns that opened access to vast collections of historical documents. While such documents are often available as XML ALTO files, they lack information about their logical structure. In this paper, we address the problem of Logical Layout Analysis applied to historical documents in French. We propose a rule-based method, which we evaluate and compare with two Machine-Learning models, namely RIPPER and Gradient Boosting. O…
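The abstract notes that the documents are available as XML ALTO files, which record page layout but not logical structure. As a minimal sketch, assuming Python's standard `xml.etree.ElementTree` and ALTO's `TextLine` and `String` elements with their `HPOS`, `VPOS`, `WIDTH`, `HEIGHT` and `CONTENT` attributes, the per-line geometry and text that a rule-based method or a classifier could work from might be extracted like this. The `Line` class, the `read_alto_lines` function and the sample page are hypothetical illustrations, not code from the paper.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical container for one ALTO TextLine."""
    text: str
    hpos: float
    vpos: float
    width: float
    height: float


def local(tag: str) -> str:
    # Drop the "{namespace}" prefix so the same code handles ALTO v2, v3 and v4
    return tag.rsplit("}", 1)[-1]


def read_alto_lines(xml_text: str) -> list[Line]:
    """Collect every TextLine with its geometry and the words of its String children."""
    root = ET.fromstring(xml_text)
    lines = []
    for el in root.iter():
        if local(el.tag) != "TextLine":
            continue
        # Direct children of a TextLine are String, SP (space) or HYP elements
        words = [child.get("CONTENT", "") for child in el if local(child.tag) == "String"]
        lines.append(Line(
            text=" ".join(words),
            hpos=float(el.get("HPOS", 0)),
            vpos=float(el.get("VPOS", 0)),
            width=float(el.get("WIDTH", 0)),
            height=float(el.get("HEIGHT", 0)),
        ))
    return lines


# Hypothetical one-line page, for illustration only
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page WIDTH="2000" HEIGHT="3000"><PrintSpace>
    <TextBlock ID="b1">
      <TextLine HPOS="100" VPOS="50" WIDTH="800" HEIGHT="30">
        <String CONTENT="LE"/><SP/><String CONTENT="JOURNAL"/>
      </TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for line in read_alto_lines(SAMPLE):
        print(line)
```

Matching on the local tag name rather than a fixed namespace URI is a design choice that keeps the sketch usable across ALTO schema versions, at the cost of ignoring namespace checks entirely.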

Cited by 1 publication (1 citation statement)
References 12 publications (12 reference statements)

“…A rule-based approach for instance-level page segmentation was presented by [27]. In this system, a set of handcrafted rules is applied to classify text lines and text regions into different semantic classes such as title, first line, header, text, or other on a custom French newspaper dataset.…”
Section: Instance-level Segmentation For Page Layout Analysis (mentioning, confidence: 99%)
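The citing statement describes handcrafted rules that label text lines as title, first line, header, text, or other. A minimal sketch of what such a rule cascade could look like follows. The features (vertical position on the page, and line height as a proxy for font size), the thresholds `header_zone` and `title_ratio`, and the `classify_lines` helper are all hypothetical assumptions, not the rules of the original system.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Line:
    """Hypothetical line record: text plus two layout features."""
    text: str
    vpos: float    # top edge, in ALTO page units
    height: float  # line height, used here as a proxy for font size


def classify_lines(lines, page_height, header_zone=0.05, title_ratio=1.5):
    """Label lines (given in reading order) with hypothetical handcrafted rules.

    Labels follow the classes named in the citation statement:
    title, first line, header, text, other. Thresholds are illustrative only.
    """
    if not lines:
        return []
    typical = median(ln.height for ln in lines)  # typical body-text line height
    labels = []
    for ln in lines:
        if not ln.text.strip():
            label = "other"                       # nothing to classify
        elif ln.vpos < header_zone * page_height:
            label = "header"                      # running head at the top of the page
        elif ln.height >= title_ratio * typical:
            label = "title"                       # noticeably larger than body text
        elif labels and labels[-1] == "title":
            label = "first line"                  # first body line right after a title
        else:
            label = "text"
        labels.append(label)
    return labels


if __name__ == "__main__":
    page = [
        Line("JOURNAL DU SOIR", 40, 40),
        Line("La séance d'hier", 300, 60),
        Line("Le conseil s'est réuni", 380, 30),
        Line("hier soir à huit heures", 415, 30),
        Line("", 450, 30),
    ]
    print(classify_lines(page, page_height=3000))
    # ['header', 'title', 'first line', 'text', 'other']
```

Rule order matters in a cascade like this one: an earlier rule shadows the later ones, so the header test runs before the title test and a large masthead at the top of the page is labelled as a header rather than an article title.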