Proceedings of the Eighth International Conference on Information and Knowledge Management 1999
DOI: 10.1145/319950.320052

An automated approach for retrieving hierarchical data from HTML tables

Abstract: Among the HTML elements, HTML tables [RHJ98] encapsulate hierarchically structured data (hierarchical data for short) in a tabular structure. HTML tables do not come with a rigid schema, and almost any form of two-dimensional table is acceptable according to the HTML grammar. This relaxation complicates the process of retrieving hierarchical data from HTML tables. In this paper, we propose an automated approach for retrieving hierarchical data from HTML tables. The proposed approach constructs the content tree…

Cited by 39 publications (25 citation statements)
References: 6 publications

Citation statements:
“…Appropriate markup can be used to assign logical structure arrangement to table cells [13], while navigation can be improved by additional markup annotation to add context to existing tables [14]. Other suggestions include automated approaches for retrieval of hierarchical data from HTML tables [15]. Smart browsers are used to access critical information for use in reading tables as well as linearization techniques are employed for transforming tables into a more easily readable form by screen readers [16].…”
Section: Problem Formulation (mentioning; confidence: 99%)
“…6 We recognize that [LN99a] had an earlier solution for a much smaller subclass of HTML tables. We also recognize that there is a larger class of HTML tables and an even larger class of tables in general [LN99b].…”
Section: Form Attribute-value Pairs (mentioning; confidence: 99%)
“…However, most of the previous works achieve adaptation only under some special conditions due to the lack of structural information. Some works tried to extract semantic structural information from HTML tag either manually [6] - [9] or automatically [10] [11]. But these approaches lack an overview of the whole website.…”
Section: Qiu Fengwu (mentioning; confidence: 99%)