Proceedings of the 18th Conference on Computational Linguistics - 2000
DOI: 10.3115/990820.990845

Mining tables from large scale HTML texts

Abstract: Tables are a very common presentation scheme, but few papers touch on table extraction in text data mining. This paper focuses on mining tables from large-scale HTML texts. Table filtering, recognition, interpretation, and presentation are discussed. Heuristic rules and cell similarities are employed to identify tables. The F-measure of table recognition is 86.50%. We also propose an algorithm to capture attribute-value relationships among table cells. Finally, more structured data is extracted and presented.
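
To make the filtering and scoring steps named in the abstract concrete, here is a minimal sketch in Python. It is not the paper's method: the class `TableCollector`, the function `looks_like_data_table`, and its thresholds (`min_rows`, `min_cols`, and the 50% non-empty-cell ratio) are hypothetical stand-ins for the kind of heuristic rule the abstract mentions, and `f_measure` assumes the balanced (F1) form, which the abstract does not specify.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the cell text of every top-level <table> as a list of rows.

    Nested tables are skipped on purpose to keep the sketch short.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, each a list of rows
        self._depth = 0    # current nesting depth of <table> tags
        self._rows = None  # rows of the table being read
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth == 1:
                self._rows = []
        elif self._depth == 1:
            if tag == "tr":
                self._row = []
            elif tag in ("td", "th") and self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            if self._depth == 1:
                self.tables.append(self._rows)
                self._rows = None
            self._depth = max(0, self._depth - 1)
        elif self._depth == 1:
            if tag in ("td", "th"):
                if self._cell is not None and self._row is not None:
                    # Normalize whitespace inside the cell text.
                    self._row.append(" ".join("".join(self._cell).split()))
                self._cell = None
            elif tag == "tr" and self._row is not None:
                self._rows.append(self._row)
                self._row = None
                self._cell = None

    def handle_data(self, data):
        if self._cell is not None and self._depth == 1:
            self._cell.append(data)


def looks_like_data_table(rows, min_rows=2, min_cols=2):
    """Hypothetical filter: require enough rows, columns, and non-empty cells."""
    if len(rows) < min_rows:
        return False
    if max(len(row) for row in rows) < min_cols:
        return False
    cells = [cell for row in rows for cell in row]
    return sum(1 for cell in cells if cell) / len(cells) >= 0.5


def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean); assumes equal weighting."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A typical use would be to feed a page to `TableCollector`, keep only the tables for which `looks_like_data_table` returns `True`, and then compare the kept tables against a hand-labeled set to obtain the precision and recall passed to `f_measure`.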

Citations: cited by 126 publications (99 citation statements)
References: 4 publications

“…For the initial two sibling pages, we tested (1) whether TISP was able to recognize HTML data tables and discard HTML tables used only for layout, (2) whether it was able to pair all sibling tables correctly, and (3) whether it was able to recognize the correct pattern template or pattern combination. For the rest of sibling pages from the same web site, we tested (1) whether TISP was able to interpret tables using the recognized structure patterns, (2) whether it correctly detected the need for dynamic adjustment, and (3) whether it recognized new structure patterns correctly.…”
Section: Results (mentioning)
confidence: 99%
“…Other table interpretation systems work based on some simple assumptions and heuristics (e.g. [2,6]). These simple assumptions (labels are either the first row or the first column) are easily broken in complex tables.…”
Section: Introduction (mentioning)
confidence: 99%
“…Recently due to the popularity of web pages, detection and analysis of tables in HTML documents get a lot of attention (Wang & Hu, 2002; Chen, Tsai, & Tsai, 2000). HTML provides table tags which often help detect and segment tables, but offers little help on semantic analysis.…”
Section: Table Analysis (mentioning)
confidence: 99%
“…Alaaeldin Hafez, Jitender Deogun, and Vijay V. Raghavan [7] propose the Item-Set Tree: A Data Structure for Data Mining. Chen, et al tried to extract tables from ASCII text [2]. Penn, et al attempted to reformat existing web information for handheld devices [4].…”
Section: Literature Review (mentioning)
confidence: 99%
“…Gatterbauer, et al attempted to discover tabular structure without the HTML table tag, through cues such as onscreen data placement [3]. Chen, et al tried to extract tables from ASCII text [2]. Penn, et al attempted to reformat existing web information for handheld devices [4].…”
Section: Literature Review (mentioning)
confidence: 99%