Proceedings of the 2012 ACM Symposium on Document Engineering 2012
DOI: 10.1145/2361354.2361383

Logical segmentation for article extraction in digitized old newspapers

Abstract: Newspapers are documents made of news items and informative articles. They are not meant to be read iteratively: the reader can pick his items in any order he fancies. Ignoring this structural property, most digitized newspaper archives only offer access to their content by issue or, at best, by page. We have built a digitization workflow that automatically extracts newspaper articles from images, which allows indexing and retrieval of information at the article level. Our back-end system extracts the logical stru…

Cited by 20 publications (2 citation statements)
References 6 publications
“…However, there are many deep learning models that address aspects of document analysis and recognition tasks that we can leverage in our research. There exist deep learning models for detecting tables in documents [19]-[21], mathematical formula detection and recognition [22]-[24], document structure detection systems [25]-[28], and more. The technical details of our PDF remediation method are presented in Chapter 4 and the evaluation of our methods in Chapter 5.1.…”
Section: Deep Learning for PDF Remediation (mentioning)
confidence: 99%
“…Palfray et al. [4] focus on the challenge of digitizing antique newspapers. Their approach not only performs segmentation but also extracts the reading order.…”
Section: Related Work (mentioning)
confidence: 99%