Efficient discovery of error-tolerant frequent itemsets in high dimensions

Yang, Cheng; Fayyad, Usama M.; Bradley, Paul S.

doi:10.1145/502512.502539

Cited by 113 publications

(88 citation statements)

References 20 publications


“…Yang et al [7] proposed two error-tolerant models, termed weak error-tolerant itemsets and strong error-tolerant itemsets. Jouni K. et al [8] proposed to mine the dense itemsets in the presence of noise where the dense itemsets are the itemsets with a sufficiently large sub-matrix that exceeds a given density threshold of attributes present.…”

Section: Approximate Frequent Itemsets (mentioning)

confidence: 99%

Mining Noise-Tolerant Frequent Closed Itemsets in Very Large Database

Chen

Zhang

Wang

et al. 2009

IEICE Trans. Inf. & Syst.


SUMMARY: Frequent Itemsets (FI) mining is a popular and important first step in analyzing datasets across a broad range of applications. There are two main problems with the traditional approach for finding frequent itemsets. First, it may often derive an undesirably huge set of frequent itemsets and association rules. Second, it is vulnerable to noise. Two approaches have been proposed to address these problems individually. The first problem is addressed by Frequent Closed Itemsets (FCI), which remove all redundant information from the result while ensuring there is no information loss. The second problem is addressed by Approximate Frequent Itemsets (AFI), which can identify and correct noise in the datasets. Each of these two concepts has its own limitations; however, the authors find that when FCI and AFI are combined, each helps the other overcome its limitations and amplifies its advantages.
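The closed-itemset idea mentioned in this summary can be illustrated with a minimal sketch. It uses the standard definition (an itemset is closed if no proper superset has the same support) on exact, noise-free data; it is not the citing paper's algorithm, and the names `support` and `is_closed` are hypothetical helpers chosen for illustration.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], itemset: FrozenSet[str]) -> int:
    """Exact support: number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def is_closed(transactions: List[Transaction], itemset: FrozenSet[str]) -> bool:
    """An itemset is closed if every one-item extension strictly lowers support.

    Checking single-item extensions is enough because exact support is
    anti-monotone: a larger superset with equal support would imply a
    one-item extension with equal support as well.
    """
    all_items = frozenset().union(*transactions)
    base = support(transactions, itemset)
    return all(
        support(transactions, itemset | {item}) < base
        for item in all_items - itemset
    )


# Toy database (assumed data, for illustration only).
db = [frozenset("ab"), frozenset("ab"), frozenset("abc")]
print(is_closed(db, frozenset("a")))   # {a} and {a, b} share the same support
print(is_closed(db, frozenset("ab")))  # adding c lowers the support
```

In this toy database `{a}` is redundant once `{a, b}` is reported, since both occur in the same transactions; that redundancy is what closed-itemset mining removes.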


“…Our approach combines two recent advances in fault-tolerant itemset mining and feature construction. The goal of fault-tolerant itemset mining [6] is to support the discovery of relevant frequent itemsets in noisy binary data (see, e.g., [7] for a recent survey). Among others, an extension to (frequent) closed set mining towards fault-tolerance has been studied in [8] that enables a bounded number (δ) of errors per item/attribute.…”

Section: Introduction (mentioning)

confidence: 99%

Application-Independent Feature Construction from Noisy Samples

Gay

Selmaoui-Folcher

Boulicaut

2009

Advances in Knowledge Discovery and Data Mining


Abstract. When training classifiers, the presence of noise can severely harm their performance. In this paper, we focus on "non-class" attribute noise and we consider how a frequent fault-tolerant (FFT) pattern mining task can be used to support noise-tolerant classification. Our method is based on an application-independent strategy for feature construction based on the so-called δ-free patterns. Our experiments on noisy training data show accuracy improvement when using the computed features instead of the original ones.


“…Motivated by such considerations, various methods [11,7,8,6,5,2] have been proposed recently to discover approximate frequent itemsets (often called error-tolerant itemsets (ETIs)) by allowing itemsets in which a specified fraction of the items can be missing. Please see figure 1 for a conceptual overview.…”

Section: Introduction (mentioning)

confidence: 99%

“…The most basic approach is to require only that a specified fraction of the items in a collection of items and transactions be present. However, such a 'weak' ETI [11] provides no guarantees on the distribution of the items within this 'block,' i.e., some rows or columns could be completely empty. To address this issue, a 'strong' ETI was defined [11], which required that each row must have at most a specified fraction of items missing.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, such a 'weak' ETI [11] provides no guarantees on the distribution of the items within this 'block,' i.e., some rows or columns could be completely empty. To address this issue, a 'strong' ETI was defined [11], which required that each row must have at most a specified fraction of items missing. The support of strong ETIs is simply the number of transactions that support the pattern, as in the traditional case, but support does not have the anti-monotone property, i.e., support can increase as the number of items increases.…”

Section: Introduction (mentioning)

confidence: 99%
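The weak/strong distinction and the loss of anti-monotonicity described in the statements above can be made concrete with a small sketch. It follows only the wording of the quoted statements (a block-level missing fraction for the weak case, a per-row missing fraction for the strong case); the function names, the threshold `eps`, and the toy data are assumptions for illustration, not the original paper's algorithm.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def missing_fraction(t: Transaction, itemset: FrozenSet[str]) -> float:
    """Fraction of the itemset's items that are absent from one transaction."""
    return len(itemset - t) / len(itemset)


def strong_support(rows: List[Transaction], itemset: FrozenSet[str], eps: float) -> int:
    """Count rows that miss at most a fraction eps of the items (per-row check)."""
    return sum(1 for t in rows if missing_fraction(t, itemset) <= eps)


def is_weak_block(rows: List[Transaction], itemset: FrozenSet[str], eps: float) -> bool:
    """Block-level check: at most a fraction eps of all cells in rows x itemset are missing."""
    missing_cells = sum(len(itemset - t) for t in rows)
    return missing_cells / (len(rows) * len(itemset)) <= eps


# A block that passes the weak test even though one row is completely empty.
abcd = frozenset("abcd")
block = [abcd, abcd, abcd, frozenset()]
print(is_weak_block(block, abcd, 0.25))   # 4 of 16 cells missing
print(strong_support(block, abcd, 0.25))  # the empty row fails the per-row test

# Strong support is not anti-monotone: it can grow when an item is added.
db = [frozenset("b"), frozenset("b"), frozenset("a")]
print(strong_support(db, frozenset("a"), 0.5))
print(strong_support(db, frozenset("ab"), 0.5))
```

With `eps = 0.5`, a transaction containing only `b` cannot support `{a}` (all items missing) but does support `{a, b}` (half missing), which is why support-based pruning from exact frequent itemset mining does not carry over directly.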


Quantitative evaluation of approximate frequent pattern mining algorithms

Gupta

Fang

Field

et al. 2008

Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining


Traditional association mining algorithms use a strict definition of support that requires every item in a frequent itemset to occur in each supporting transaction. In real-life datasets, this limits the recovery of frequent itemset patterns as they are fragmented due to random noise and other errors in the data. Hence, a number of methods have been proposed recently to discover approximate frequent itemsets in the presence of noise. These algorithms use a relaxed definition of support and additional parameters, such as row and column error thresholds to allow some degree of "error" in the discovered patterns. Though these algorithms have been shown to be successful in finding the approximate frequent itemsets, a systematic and quantitative approach to evaluate them has been lacking. In this paper, we propose a comprehensive evaluation framework to compare different approximate frequent pattern mining algorithms. The key idea is to select the optimal parameters for each algorithm on a given dataset and use the itemsets generated with these optimal parameters in order to compare different algorithms. We also propose simple variations of some of the existing algorithms by introducing an additional post-processing step. Subsequently, we have applied our proposed evaluation framework to a wide variety of synthetic datasets with varying amounts of noise and a real dataset to compare existing and our proposed variations of the approximate pattern mining algorithms. Source code and the datasets used in this study are made publicly available.
