2005
DOI: 10.1016/j.datak.2004.10.004

Automating the extraction of data from HTML tables with unknown structure

Abstract: Data on the Web in HTML tables is mostly structured, but we usually do not know the structure in advance. Thus, we cannot directly query for data of interest. We propose a solution to this problem based on document-independent extraction ontologies. Our solution entails elements of table understanding, data integration, and wrapper creation. Table understanding allows us to find tables of interest within a Web page, recognize attributes and values within the table, pair attributes with values, and form records…
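The last steps the abstract names (recognizing attributes and values, pairing them, and forming records) can be illustrated with a minimal sketch. This is not the authors' method: the paper relies on extraction ontologies precisely because the table structure is unknown, whereas this sketch makes the simplifying assumption that the first row holds the attribute names and every later row holds one record. The names `TableRows`, `table_to_records`, and the sample car table are hypothetical and used only for illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in the document as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed in this row


def table_to_records(html):
    """Pair attributes with values, ASSUMING the first row is the header row."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    attributes, *value_rows = parser.rows
    return [dict(zip(attributes, values)) for values in value_rows]


# Hypothetical sample table, not taken from the paper.
sample = """
<table>
  <tr><th>Make</th><th>Year</th><th>Price</th></tr>
  <tr><td>Honda</td><td>1999</td><td>$4,500</td></tr>
  <tr><td>Ford</td><td>2001</td><td>$6,200</td></tr>
</table>
"""
records = table_to_records(sample)
print(records)
```

A table that lists attributes down the first column instead of across the first row would defeat this sketch, which is the kind of case the paper's ontology-based approach is meant to handle.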

Cited by 58 publications (45 citation statements)
References 13 publications
“…Other researchers have also integrated web tables whose data structure is unknown, specifically column-wise and row-wise tables [7]. They define several operators to handle structural problems arising from schema differences, namely merged attribute, attribute as value, and attribute subset.…”
Section: Table III Example of Mixed-cell
Citation type: unclassified
“…It is possible, however, that such a knowledge base can be assembled from studying a large collection of diverse but related tables. This, in fact, is one of our long term research objectives [9][10][11].…”
Section: External Information
Citation type: mentioning
confidence: 99%
“…We compiled a comprehensive survey of table processing for IJDAR in 2006 [55]. Input tables were matched with known conceptualizations in an attempt to interpret them in [56]. Information extraction from sibling tables with identical headers was demonstrated in [57].…”
Section: Our Earlier Work
Citation type: mentioning
confidence: 99%