Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2011
DOI: 10.1145/2020408.2020500

Direct local pattern sampling by efficient two-step random procedures

Abstract: We present several exact and highly scalable local pattern sampling algorithms. They can be used as an alternative to exhaustive local pattern discovery methods (e.g., frequent set mining or optimistic-estimator-based subgroup discovery) and can substantially improve efficiency as well as controllability of pattern discovery processes. While previous sampling approaches mainly rely on the Markov chain Monte Carlo method, our procedures are direct, i.e., non-process-simulating, sampling algorithms. The advantages o…
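A direct (non-Markov-chain) sampler can be built from two random steps. The sketch below is an illustrative assumption, not necessarily the paper's exact procedure: it draws an itemset X with probability proportional to its support count supp(X) by first picking a transaction t with weight 2^|t| and then a uniformly random subset of t. Once t is fixed, each subset of t is returned with probability 2^-|t|, so the weights cancel and P(X) = supp(X) / sum over t of 2^|t|. The function name `sample_frequent_itemset` and the toy data are hypothetical.

```python
import random
from collections import Counter


def sample_frequent_itemset(transactions, rng=random):
    """Draw one itemset with probability proportional to its support count.

    Step 1: pick a transaction t with weight 2**len(t), the number of subsets of t.
    Step 2: return a uniformly random subset of t (each item kept with probability 1/2).
    Summing over all t that contain X gives P(X) = supp(X) / sum_t 2**len(t).
    The empty itemset is included, with support len(transactions).
    """
    weights = [2 ** len(t) for t in transactions]
    t = rng.choices(transactions, weights=weights, k=1)[0]
    return frozenset(item for item in sorted(t) if rng.random() < 0.5)


if __name__ == "__main__":
    data = [frozenset("abc"), frozenset("ab"), frozenset("b")]
    rng = random.Random(0)
    counts = Counter(sample_frequent_itemset(data, rng) for _ in range(20000))
    # supp({b}) = 3 and sum_t 2**len(t) = 8 + 4 + 2 = 14, so the target is 3/14.
    print(counts[frozenset("b")] / 20000, 3 / 14)
```

Each draw is independent and costs one weighted choice plus one pass over a single transaction, with no burn-in period.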

Cited by 59 publications (132 citation statements)
References 31 publications

“…In order to adopt this algorithm for a data stream, we use Tile to recalculate the top-k largest tiles from scratch whenever the window is sliding. sTile: a sampling based technique proposed in [2]. The sTile algorithm samples N itemsets from the window such that each itemset is sampled with probability proportional to the area of the itemset.…”
Section: Methods (mentioning)
confidence: 99%
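Area-proportional sampling, as described for sTile, can also be done with two random steps. The sketch below is a generic construction assumed for illustration, not the sTile implementation: the area of X is |X| · supp(X); a transaction t is picked with weight |t| · 2^(|t|-1) (the sum of |X| over all subsets X of t), and then a subset of t is picked with probability proportional to its size. The function name `sample_area_itemset` and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def sample_area_itemset(transactions, rng=random):
    """Draw one non-empty itemset X with probability proportional to |X| * supp(X).

    Step 1: pick a transaction t with weight len(t) * 2**(len(t) - 1), which equals
    the sum of |X| over all subsets X of t (empty transactions get weight 0).
    Step 2: pick a subset of t with probability proportional to its size: first a
    size k with weight k * C(n, k), then a uniformly random k-subset of t.
    """
    weights = [len(t) * 2 ** (len(t) - 1) if t else 0 for t in transactions]
    t = sorted(rng.choices(transactions, weights=weights, k=1)[0])
    n = len(t)
    sizes = range(1, n + 1)
    k = rng.choices(sizes, weights=[j * math.comb(n, j) for j in sizes], k=1)[0]
    return frozenset(rng.sample(t, k))


if __name__ == "__main__":
    data = [frozenset("abc"), frozenset("ab"), frozenset("b")]
    rng = random.Random(0)
    counts = Counter(sample_area_itemset(data, rng) for _ in range(20000))
    # area({a, b}) = 2 * 2 = 4 and the total area is 12 + 4 + 1 = 17.
    print(counts[frozenset("ab")] / 20000, 4 / 17)
```

Choosing the size first keeps step 2 exact: a k-subset is returned with probability k · C(n, k) / (n · 2^(n-1)) · 1 / C(n, k) = k / (n · 2^(n-1)), which is proportional to |X| as required.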
“…In general, we can use the result of standard frequent pattern mining [2,20] although this incurs a high computational cost. Instead, we can resort to pattern sampling techniques [12,4], yet then we have to choose the number of patterns to be sampled. Alternatively, we [22] proposed to mine such pattern sets by the Minimum Description Length principle [11].…”
Section: Related Work (mentioning)
confidence: 99%
“…Boley et al. [2,3] proposed to use Metropolis-Hastings sampling for the construction of data mining systems that do not require any user-specified threshold, i.e., minsup or minconf. However, all the algorithms generate approximate results and the completeness cannot be guaranteed.…”
Section: Related Work (mentioning)
confidence: 99%
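For contrast with direct sampling, a Metropolis-Hastings walk over itemsets can be sketched as follows. This is a generic illustration assumed here, not the chain of Boley et al.: it targets the support-proportional distribution, proposes toggling one uniformly chosen item (a symmetric proposal, so the acceptance ratio is supp(proposal) / supp(current)), and always rejects zero-support proposals. Its samples are correlated and follow the target only approximately after a burn-in. The names `support` and `mh_itemset_walk` are hypothetical.

```python
import random
from collections import Counter


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def mh_itemset_walk(transactions, steps, rng=random):
    """Metropolis-Hastings walk over itemsets whose stationary distribution is proportional to supp(X).

    Zero-support proposals are accepted with probability 0, so the walk stays on
    itemsets that occur in the data. Assumes at least one transaction.
    """
    items = sorted(set().union(*transactions))
    current = frozenset()  # the empty set is contained in every transaction
    current_supp = support(current, transactions)
    visited = []
    for _ in range(steps):
        proposal = current ^ {rng.choice(items)}  # add or remove one item
        proposal_supp = support(proposal, transactions)
        # Accept with probability min(1, proposal_supp / current_supp).
        if rng.random() < proposal_supp / current_supp:
            current, current_supp = proposal, proposal_supp
        visited.append(current)
    return visited


if __name__ == "__main__":
    data = [frozenset("abc"), frozenset("ab"), frozenset("b")]
    walk = mh_itemset_walk(data, 60000, random.Random(0))[1000:]  # drop burn-in
    counts = Counter(walk)
    print(counts[frozenset("b")] / len(walk), 3 / 14)
```

Every step recounts support over the whole dataset, and consecutive states are dependent, which is the cost a direct two-step sampler avoids.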