2010 IEEE International Conference on Data Mining
DOI: 10.1109/icdm.2010.17

Stratified Sampling for Data Mining on the Deep Web

Abstract: In recent years, one mode of data dissemination has become extremely popular: the deep web. As with any other data source, data mining on the deep web can produce important insights or summaries of results. However, data mining on the deep web is challenging because the databases cannot be accessed directly, and therefore data mining must be performed on samples of the datasets. The samples, in turn, can only be obtained by querying the deep web databases with specific inputs. In this paper, we targ…
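To illustrate the general idea named in the title, here is a minimal Python sketch of a stratified estimator of a population mean. It assumes, hypothetically, that each stratum corresponds to one value of a query input (for example a value of a categorical form field), that each stratum's size is known, and that the sample budget is split by proportional allocation. The records are simulated in memory so the estimator can be checked; `allocate_proportional` and `stratified_mean` are illustrative names, not the paper's method, whose details are cut off in the abstract.

```python
import random
from typing import Dict, Sequence


def allocate_proportional(sizes: Dict[str, int], budget: int) -> Dict[str, int]:
    """Split a sample budget across strata in proportion to stratum size (at least 1 each)."""
    total = sum(sizes.values())
    return {s: max(1, round(budget * n / total)) for s, n in sizes.items()}


def stratified_mean(strata: Dict[str, Sequence[float]], budget: int, seed: int = 0) -> float:
    """Estimate the population mean as sum_h (N_h / N) * ybar_h.

    `strata` maps a hypothetical query input to the records that input would
    return. On a real deep web source the records are only reachable through
    queries; here they are simulated so the result can be verified.
    """
    rng = random.Random(seed)
    sizes = {s: len(records) for s, records in strata.items()}
    total = sum(sizes.values())
    alloc = allocate_proportional(sizes, budget)
    estimate = 0.0
    for s, records in strata.items():
        if not records:  # an empty stratum has weight 0 and contributes nothing
            continue
        k = min(alloc[s], len(records))
        sample = rng.sample(list(records), k)  # simple random sample within the stratum
        estimate += (sizes[s] / total) * (sum(sample) / k)
    return estimate


if __name__ == "__main__":
    # Hypothetical strata keyed by a "body type" input; values are prices.
    strata = {"sedan": [20.0] * 50, "suv": [35.0] * 30, "truck": [40.0] * 20}
    print(stratified_mean(strata, budget=10))
```

Because the strata in the usage example are internally homogeneous, a budget of 10 queries recovers the population mean of 28.5 exactly. This is the intuition for stratifying: variance comes only from within-stratum spread. Proportional allocation is the simplest choice; an allocation that gives more samples to high-variance strata (Neyman allocation) would usually reduce variance further.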

Cited by 19 publications (12 citation statements); references 18 publications.
“…[7][8][9] describe efficient techniques to obtain random samples from hidden web databases that can then be utilized to perform aggregate estimation. Recent works such as [16,24] propose more sophisticated sampling techniques that reduce variance of aggregate estimation. Skyline Computation: Skyline operator was first described in [4] and number of subsequent work have studied it from diverse contexts.…”
Section: Related Work (mentioning, confidence 99%)
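The skyline operator mentioned in this statement returns the points that no other point dominates. The following is a naive quadratic Python sketch of that definition. It assumes every dimension is to be minimized, for example price and distance of hypothetical hotel listings. It is not the algorithm of [4] or of any citing work.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every dimension and strictly better in at least one.

    Assumption: smaller values are better in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any other point (O(n^2) pairwise check)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical hotels as (price, distance); lower is better for both.
    hotels = [(50, 8), (80, 2), (60, 5), (90, 9), (70, 5)]
    print(skyline(hotels))
```

In the usage example, (90, 9) is dominated by (50, 8) and (70, 5) by (60, 5). The remaining three points are the skyline: each trades a higher price for a shorter distance or vice versa.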
“…However, these researches focus on how to build an interactive query system or a vertical search system using data integration technologies [16–18]. Recently, with the development of sampling and crawling over the deep web [19–21], mining deep web has attracted more attention than before [22–24]. Moreover, outlier detection research has always been a hot topic in machine learning and data mining.…”
Section: Related Work (mentioning, confidence 99%)
“…[1] proposed an adaptive sampling algorithm for answering aggregation queries over websites with hierarchical structure. Recent works such as [25,31] propose more sophisticated sampling techniques so as to reduce the variance of the aggregate estimation. For hidden databases with keyword interfaces, prior work have studied estimating the size of search engines [5,4,32], corpus [6] or document collection [29].…”
Section: Related Work; Information Integration and Extraction For Hidd… (mentioning, confidence 99%)