2014
DOI: 10.1045/november14-tkaczyk

GROTOAP2 — The Methodology of Creating a Large Ground Truth Dataset of Scientific Articles

Cited by 20 publications (14 citation statements)
References 0 publications
“…Existing region-labeled datasets of scientific articles proved too noisy for our visual region detection method. Training the model with the full GROTOAP2 dataset (Tkaczyk et al, 2014), for example, yielded a best overall detection performance of 5.1% mean average precision (mAP) over all 22 labels. The key problems were that many of the regions were far too granular (e.g.…”
Section: Novel Labeled Dataset (mentioning)
confidence: 99%
“…Table 3 provides a comparison of the DocBank to the previous document layout analysis datasets, including Article Regions (Soto and Yoo, 2019), GROTOAP2 (Tkaczyk et al, 2014), PubLayNet (Zhong et al, 2019), and TableBank (Li et al, 2019)…”
Section: Dataset Statistics (mentioning)
confidence: 99%
“…There are few exceptions to this rule, for example, Arabic script, and such cases would not be handled properly by the algorithm. This observation is reflected in the distances counted for all zone pairs: the distance is calculated using the angle of the slope of the vector connecting zones.…”
[Figure 4 caption: An example page from a scientific publication. The image shows the zones and their reading order.]
Section: Reading Order Resolving (mentioning)
confidence: 99%
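The reading-order statement above mentions a distance based on the angle of the slope of the vector connecting two zones. The excerpt is truncated and does not say which points are connected or how the angle becomes a distance, so the following is only a minimal sketch of the angle computation. The `Zone` type, the use of zone centers, and the function names are assumptions made for illustration, not the cited authors' implementation.

```python
import math
from typing import NamedTuple, Tuple


class Zone(NamedTuple):
    """Hypothetical axis-aligned zone: top-left corner plus size, in page units."""
    x: float
    y: float
    width: float
    height: float


def center(zone: Zone) -> Tuple[float, float]:
    """Return the center point of a zone (assumed anchor for the connecting vector)."""
    return (zone.x + zone.width / 2.0, zone.y + zone.height / 2.0)


def slope_angle(a: Zone, b: Zone) -> float:
    """Angle in degrees of the vector from the center of `a` to the center of `b`.

    Page coordinates are assumed to grow rightwards (x) and downwards (y),
    so 0 means `b` lies directly to the right of `a` and 90 directly below it.
    """
    ax, ay = center(a)
    bx, by = center(b)
    return math.degrees(math.atan2(by - ay, bx - ax))
```

A distance for reading-order purposes could then be derived from this angle, for example by penalizing zone pairs whose connecting vector points up or to the left; that weighting is not given in the excerpt and would have to be taken from the cited paper.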
“…As a result, PMC could not be directly used for training and evaluation of the individual steps, such as page segmentation and zone classification. For these tasks, we built GROTOAP [38] and GROTOAP2 [4] datasets.…”
Section: Datasets Preparation (mentioning)
confidence: 99%