Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2003
DOI: 10.1145/956750.956826

Mining data records in Web pages

Abstract: A large amount of information on the Web is contained in regularly structured objects, which we call data records. Such data records are important because they often present the essential information of their host pages, e.g., lists of products or services. It is useful to mine such data records in order to extract information from them to provide value-added services. Existing automatic techniques are not satisfactory because of their poor accuracies. In this paper, we propose a more effective technique to pe…

Citations: cited by 319 publications (209 citation statements)
References: 11 publications

“…They focus on two things those are, Data records recognition from the query page and next is arrange these extracted data in a table. Robert Grossman, YanhongZhai, Bing Liu [4], mainly focused on the data record which contains the large amount of information on the web. Data records also contain the information regarding their host pages for example list of product or services.…”
Section: Literature Review; citation type: mentioning
confidence: 99%
“…In Record Extraction phase, firstly it identifies the data region which contains the number of query result records and then it does the segmentation of records [4]. Record alignment steps properly align the extracted data in a structured manner means it arrange the all the extracted QRR's in a table.…”
Section: System Overview; citation type: mentioning
confidence: 99%
“…At present, many issues in the field of deep Web data integration, such as interface integration [2] [3] and Web data extraction [4,5], have been widely studied. However, as a necessary step, identifying the duplicate entities(records) from multiple Web databases has not received due attention yet.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…A similar method is proposed in [11]. [7] and [15] propose some algorithms to identify data records, which do not extract data items from the data records, and do not handle nested data records. Our previous system DEPTA [13] is able to align and extract data items from data records, but does not handle nested data records.…”
Section: Introduction; citation type: mentioning
confidence: 99%