2009 10th International Conference on Document Analysis and Recognition, 2009
DOI: 10.1109/icdar.2009.12

PDF-TREX: An Approach for Recognizing and Extracting Tables from PDF Documents

Abstract: This paper presents PDF-TREX, an

Cited by 79 publications (33 citation statements)
References 8 publications
“…In particular, the problem of table understanding was split up into three tasks: 1) Many approaches to table understanding, e.g. [2], [3], have been designed to work on object-based documents as input and therefore cannot be evaluated using datasets consisting solely of page images. By choosing born-digital PDF as the format for the competition dataset, we have made it possible for such approaches to participate in the competition, as well as those based on raster images and plain text documents.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Some heuristics on content elements can also be built in the same way they could be perceived by a human reader. Thus, the content elements can be aligned and grouped in a bottom-up way to exploit spatial relationship among them and build lines, blocks, rows, and the final cell grid [38,39].…”
Section: Structural Methods For Table Localization (citation type: mentioning)
confidence: 99%
“…The use of heuristics can produce a whole algorithm for the grid construction, like in the PDF-TREX approach [39]. First, rows are built using the horizontal alignment of content elements.…”
Section: Using Geometric Information (citation type: mentioning)
confidence: 99%
“…PDF-TREX is a heuristic approach for table recognition and extraction from PDF documents [39]. The heuristic aligns and groups, in a bottom-up way, content elements by exploiting only the relationships existing among them.…”
Section: Consolidated Systems and Software (citation type: mentioning)
confidence: 99%
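
The citation statements above describe a bottom-up heuristic in which rows are first built from the alignment of content elements. As a rough illustration of that general idea only, the sketch below groups text elements into rows when their vertical extents overlap. It is not the PDF-TREX algorithm: the `Element` type, the `group_into_rows` function, and the `min_overlap` threshold are all hypothetical names and choices made for this example, and coordinates are assumed to grow downward (`y0` is the top edge).

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A hypothetical content element: text plus its bounding box."""
    text: str
    x0: float  # left edge
    y0: float  # top edge (assumed: y grows downward)
    x1: float  # right edge
    y1: float  # bottom edge


def group_into_rows(elements, min_overlap=0.5):
    """Group elements into rows by vertical overlap (illustrative only).

    An element joins the first existing row whose vertical span overlaps
    its own by at least `min_overlap` of the smaller of the two heights;
    otherwise it starts a new row.
    """
    rows = []
    # Visit elements top to bottom, then left to right.
    for el in sorted(elements, key=lambda e: (e.y0, e.x0)):
        for row in rows:
            top = min(e.y0 for e in row)
            bottom = max(e.y1 for e in row)
            overlap = min(bottom, el.y1) - max(top, el.y0)
            height = min(bottom - top, el.y1 - el.y0)
            if height > 0 and overlap / height >= min_overlap:
                row.append(el)
                break
        else:
            # No existing row overlaps enough, so start a new one.
            rows.append([el])
    # Order the elements of each row from left to right.
    return [sorted(row, key=lambda e: e.x0) for row in rows]


sample = [
    Element("Qty", 80, 101, 100, 111),
    Element("Name", 10, 100, 40, 110),
    Element("3", 80, 121, 85, 131),
    Element("Apple", 10, 120, 45, 130),
]
rows = group_into_rows(sample)
row_texts = [[e.text for e in row] for row in rows]
```

With the sample input, `row_texts` contains two rows, one for the header words and one for the data words. A fuller system would go on to derive columns and the cell grid from these rows, which this sketch does not attempt.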