Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2001
DOI: 10.1145/502512.502539

Efficient discovery of error-tolerant frequent itemsets in high dimensions

Abstract: We present a generalization of frequent itemsets allowing for the notion of errors in the itemset definition. We motivate the problem and present an efficient algorithm that identifies error-tolerant frequent clusters of items in transactional data (customer-purchase data, web browsing data, text, etc.). The algorithm exploits sparseness of the underlying data to find large groups of items that are correlated over database records (rows). The notion of transaction coverage allows us to extend the algorithm and v…
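The abstract states that the algorithm exploits sparseness but is cut off before describing how, so the sketch below is not the authors' algorithm. It only illustrates, under stated assumptions, one generic way sparseness can help: each item is stored as the set of row ids that contain it (a vertical "tidset" layout), and the rows containing at least a (1 - eps) fraction of a candidate itemset are found by reading only the nonzero entries. The function name `tolerant_rows` and the parameter `eps` (an assumed per-row fraction of items allowed to be missing) are hypothetical.

```python
from collections import Counter


def tolerant_rows(tidsets, itemset, eps):
    """Return ids of rows that contain at least a (1 - eps) fraction of itemset.

    Hypothetical helper. `tidsets` maps each item to the set of row ids that
    contain it, so only the nonzero entries of the data are ever read.
    """
    # With eps < 1 every qualifying row holds at least one item of the
    # itemset, so rows absent from every tidset can safely be ignored.
    if not 0 <= eps < 1:
        raise ValueError("eps must satisfy 0 <= eps < 1")
    hits = Counter()
    for item in itemset:
        for row in tidsets.get(item, ()):
            hits[row] += 1  # one more itemset item present in this row
    # A row may miss at most floor(eps * |itemset|) of the items.
    max_missing = int(eps * len(itemset))
    need = len(itemset) - max_missing
    return {row for row, count in hits.items() if count >= need}


tidsets = {"a": {0, 1, 2}, "b": {0, 2, 3}, "c": {0, 1, 3}}
print(sorted(tolerant_rows(tidsets, ["a", "b", "c"], eps=0.0)))   # [0]
print(sorted(tolerant_rows(tidsets, ["a", "b", "c"], eps=0.34)))  # [0, 1, 2, 3]
```

With `eps=0.0` the helper reduces to ordinary exact support; loosening it admits rows that miss one item, which is the kind of noise an error-tolerant definition is meant to absorb.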

Cited by 113 publications (88 citation statements)
References 20 publications
“…Yang et al [7] proposed two error-tolerant models, termed weak error-tolerant itemsets and strong error-tolerant itemsets. Jouni K. et al [8] proposed to mine the dense itemsets in the presence of noise where the dense itemsets are the itemsets with a sufficiently large sub-matrix that exceeds a given density threshold of attributes present.…”
Section: Approximate Frequent Itemsets (mentioning)
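The quoted description of dense itemsets can be illustrated with a small check. This is a simplified reading of the quote, not the exact definition or mining procedure of the cited work: given a candidate itemset, it asks whether some `min_rows` rows form a sub-matrix on those columns whose fraction of present entries reaches `threshold`. The helper name `is_dense` and its parameters are hypothetical.

```python
def is_dense(rows, itemset, min_rows, threshold):
    """Check whether some `min_rows` rows span a sub-matrix on `itemset`
    whose fraction of present entries is at least `threshold`.

    Hypothetical helper; `rows` is a list of item sets, one per transaction.
    """
    items = set(itemset)
    if not items or min_rows <= 0 or min_rows > len(rows):
        return False
    # How many of the itemset's columns are present in each row.
    counts = sorted((len(items & row) for row in rows), reverse=True)
    # With the columns fixed, the densest min_rows-row sub-matrix uses the
    # rows with the most ones, so inspecting the top rows is exact.
    ones = sum(counts[:min_rows])
    return ones / (min_rows * len(items)) >= threshold


rows = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"d"}]
print(is_dense(rows, ["a", "b", "c"], min_rows=3, threshold=0.75))  # True (7/9)
print(is_dense(rows, ["a", "b", "c"], min_rows=4, threshold=0.75))  # False (7/12)
```

Because the columns are fixed, the densest sub-matrix with k rows is obtained by taking the k rows with the most ones, so the check needs only a sort rather than a search over row subsets.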
“…Our approach combines two recent advances in fault-tolerant itemset mining and feature construction. The goal of fault-tolerant itemset mining [6] is to support the discovery of relevant frequent itemsets in noisy binary data (see, e.g., [7] for a recent survey). Among others, an extension to (frequent) closed set mining towards fault-tolerance has been studied in [8] that enables a bounded number (δ) of errors per item/attribute.…”
Section: Introduction (mentioning)
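The quoted extension of closed-set mining allows a bounded number δ of errors per item. A minimal check of that constraint for a given block of rows and items, assuming transactions are stored as Python sets, might look like the following; `within_item_tolerance` is a hypothetical name, and the sketch does not perform the closed-set mining itself.

```python
def within_item_tolerance(rows, row_ids, itemset, delta):
    """True if every item of `itemset` is absent from at most `delta`
    of the selected rows.

    Hypothetical helper: it only verifies the per-item error bound for a
    given block and does not search for (closed) itemsets.
    """
    for item in itemset:
        # Count the selected rows in which this item is missing.
        missing = sum(1 for r in row_ids if item not in rows[r])
        if missing > delta:
            return False
    return True


rows = [{"a", "b"}, {"a"}, {"a", "b"}, {"b"}]
print(within_item_tolerance(rows, [0, 1, 2, 3], ["a", "b"], delta=1))  # True
print(within_item_tolerance(rows, [0, 1, 3], ["a", "b"], delta=0))     # False
```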
“…Motivated by such considerations, various methods [11,7,8,6,5,2] have been proposed recently to discover approximate frequent itemsets (often called error-tolerant itemsets (ETIs)) by allowing itemsets in which a specified fraction of the items can be missing. Please see figure 1 for a conceptual overview.…”
Section: Introduction (mentioning)
“…The most basic approach is to require only that a specified fraction of the items in a collection of items and transactions be present. However, such a 'weak' ETI [11] provides no guarantees on the distribution of the items within this 'block,' i.e., some rows or columns could be completely empty. To address this issue, a 'strong' ETI was defined [11], which required that each row must have at most a specified fraction of items missing.…”
Section: Introduction (mentioning)
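The difference between weak and strong ETIs described in the quote above can be made concrete with two checks on a fixed block of rows and items. This is a minimal sketch following the quoted descriptions, with `eps` as the allowed missing fraction; the function names are hypothetical, and the code only tests a given block rather than searching for ETIs.

```python
def missing_per_row(rows, row_ids, itemset):
    """Number of itemset items absent from each selected row."""
    items = set(itemset)
    return [len(items - rows[r]) for r in row_ids]


def is_weak_eti(rows, row_ids, itemset, eps):
    """Weak ETI: at most a fraction `eps` of the block's cells are missing."""
    total_cells = len(row_ids) * len(itemset)
    return sum(missing_per_row(rows, row_ids, itemset)) <= eps * total_cells


def is_strong_eti(rows, row_ids, itemset, eps):
    """Strong ETI: every row misses at most a fraction `eps` of the items."""
    limit = eps * len(itemset)
    return all(m <= limit for m in missing_per_row(rows, row_ids, itemset))


rows = [{"a", "b", "c", "d"} for _ in range(4)] + [set()]
block = list(range(5))
items = ["a", "b", "c", "d"]
print(is_weak_eti(rows, block, items, eps=0.2))    # True: 4 of 20 cells missing
print(is_strong_eti(rows, block, items, eps=0.2))  # False: the last row is empty
```

The example reproduces the problem noted in the quote: the block passes the weak test even though one row contains none of the items, while the strong test rejects it.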