2010
DOI: 10.1007/s10618-010-0188-4

Using background knowledge to rank itemsets

Abstract: Assessing the quality of discovered results is an important open problem in data mining. Such assessment is particularly vital when mining itemsets, since commonly many of the discovered patterns can be easily explained by background knowledge. The simplest approach to screen uninteresting patterns is to compare the observed frequency against the independence model. Since the parameters for the independence model are the column margins, we can view such screening as a way of using the column margins as backgro…
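
The screening idea in the abstract, comparing an itemset's observed frequency with the frequency predicted by the independence model, can be illustrated with a small sketch. This is not the paper's method: the function names, the representation of the data as a list of sets of item indices, and the use of absolute deviation as the score are all assumptions made for illustration.

```python
from itertools import combinations


def column_margins(rows, n_items):
    """Fraction of rows containing each item (the independence model's parameters)."""
    n = len(rows)
    return [sum(1 for r in rows if i in r) / n for i in range(n_items)]


def observed_frequency(rows, itemset):
    """Fraction of rows that contain every item of the itemset."""
    return sum(1 for r in rows if itemset <= r) / len(rows)


def expected_frequency(margins, itemset):
    """Frequency predicted under independence: the product of the item margins."""
    p = 1.0
    for i in itemset:
        p *= margins[i]
    return p


def rank_itemsets(rows, n_items, size=2):
    """Rank all itemsets of the given size by |observed - expected| (assumed score)."""
    margins = column_margins(rows, n_items)
    scored = []
    for combo in combinations(range(n_items), size):
        itemset = frozenset(combo)
        obs = observed_frequency(rows, itemset)
        exp = expected_frequency(margins, itemset)
        scored.append((abs(obs - exp), combo, obs, exp))
    # Largest deviation first: these itemsets are least explained by the margins.
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored


# Hypothetical toy data: six transactions over three items.
rows = [{0, 1}, {0, 1}, {0, 1, 2}, {2}, {2}, set()]
ranked = rank_itemsets(rows, n_items=3)
```

In this toy data every item has margin 0.5, so each pair is expected to occur in a quarter of the rows. An itemset whose observed frequency is close to that prediction is already explained by the column margins and would be screened out as uninteresting.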

Cited by 24 publications (18 citation statements)
References 21 publications
“…Approaches that take account of background knowledge provide an important and closely related field of research [Jaroszewicz et al 2009;Tatti and Mampaey 2010;De Bie 2011].…”
Section: Other Related Approaches
Citation type: mentioning
confidence: 99%
“…Indeed, most of pattern-based classification techniques focus on the sequential behavior and omit to take contextual and external knowledge into account. Though, recently, a new trend in the data mining field tries to incorporate expert knowledge in the process to improve the result quality [21,22,23]. Our approach clearly comes within this scope and we experimentally show that adding as much as available non-sequential information would lead to increase the classification performances.…”
Section: Genericity Of Mspc
Citation type: mentioning
confidence: 75%
“…However, the number of mined frequent itemsets is typically very large, because it contains a lot of redundant or potentially irrelevant patterns. To generate a more compact set of frequent itemsets representing most significant yet non-redundant knowledge hidden in the analyzed data many research efforts have been made (e.g., [28], [29], [20], [30]). Given a minimum support threshold minsup and a maximum itemset model size K, we extract the top-K most interesting and non-redundant itemsets according to the entropy-based heuristics proposed in [20].…”
Section: B Entropy-based Itemset Mining
Citation type: mentioning
confidence: 99%