2019 International Conference on Document Analysis and Recognition (ICDAR) 2019
DOI: 10.1109/icdar.2019.00166

PubLayNet: Largest Dataset Ever for Document Layout Analysis

Cited by 369 publications (277 citation statements)
References 13 publications
“…More on this in sections "Investment documents datasets" and "Title hierarchization" 7. an exhaustive study reporting on the usage of prospectuses confirms this: https://morecarrot.com/wp-content/uploads/2019/10/MC_Prospectus_StudyReportFinal_23oct19.pdf with MS Word used in 92% of the cases 8. for prospectuses, see https://www.amf-france.org/en_US/Formulaires-et-declarations/OPCVM-et-fonds-d-investissement/OPCVM/Plan-type-du-prospectus 0.…”
mentioning
confidence: 78%
“…More recently, [8] released the largest dataset to benchmark layout structure extraction algorithms, called PubLayNet. With over 1 million scientific articles from the ArXiv.org website, the authors train a neural network on a page segmentation task to detect text, titles, lists, tables and figures from images.…”
Section: Text and Layout Extraction From Images
mentioning
confidence: 99%
“…Page segmentation. Also called document layout analysis, page segmentation is an active research area with numerous competitions [1]-[4] and datasets [5]-[8]. They usually consider many semantic categories (e.g., caption, paragraph, title) and split text regions at paragraph level.…”
Section: Related Work
mentioning
confidence: 99%
“…Training deep networks for both page and text line segmentation requires large amounts of data. For modern documents, Yang et al [7] and Zhong et al [8] proposed synthetic document generation engines based on modern formats (respectively Latex and PDF) yielding to large-scale and heterogeneous document datasets. However, these documents are too simple to train a model that perform well on historical documents.…”
Section: Related Work
mentioning
confidence: 99%