2009
DOI: 10.1007/s10618-009-0137-2

Identifying the components

Abstract: Most, if not all, databases are mixtures of samples from different distributions. Transactional data is no exception. For the prototypical example, supermarket basket analysis, one also expects a mixture of different buying patterns. Households of retired people buy different collections of items than households with young children. Models that take such underlying distributions into account are in general superior to those that do not. In this paper we introduce two MDL-based algorithms that follow orthogonal…

Citations: cited by 17 publications (6 citation statements)
References: 22 publications
“…Using the minimum description length (MDL) principle, 25 the Krimp algorithm mines patterns that summarize the data in a well-compressed representation. 26 This approach reduces the redundancy and generates patterns that are useful for classification, 26 component identification, 27 and change detection. 28 The Krimp algorithm, inspired by the gap handling of the SQS 29 algorithm, is developed further into the GoKrimp algorithm to both penalize gaps and handle interleaving patterns.…”
Section: T E E T E E T E E T E E T E
Citation type: mentioning
confidence: 99%
“…Faloutsos and Megalooikonomou [15] argue that Kolmogorov Complexity and Minimum Description Length [52,20] provide a powerful and well-founded approach to data mining. There exist many examples where MDL has been successfully employed in data mining, including, for example, for classification [50,38], clustering [31,6,39], discretization [16,30], defining parameter-free distance measures [28,29,11,66], feature selection [48], imputation [65], mining temporally surprising patterns [9], detecting change points in data streams [37], model order selection in matrix factorization [46], outlier detection [58,3], summarizing categorical data [43], transfer learning [54], discovering communities in matrices [8,47,63] and evolving graphs [60], finding sources of infection in large graphs [49], and for making sense of selected nodes in graphs [4].…”
Section: In Data Mining
Citation type: mentioning
confidence: 99%
“…An effective and efficient heuristic is to take an EM-like approach [14], starting with a random partitioning, and iteratively inducing models and re-assigning tuples to maximize compression, until convergence. Besides automatically determining the optimal number of components, this approach has been shown to find sound groupings [39].…”
Section: 31
Citation type: mentioning
confidence: 99%
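The EM-like loop described in the statement above (random partitioning, then alternately inducing models and re-assigning tuples to the model that compresses them best) can be sketched roughly as follows. This is an illustration under stated assumptions, not the cited paper's algorithm: the per-component model here is a simple independent-item code with Laplace smoothing, swapped in for Krimp code tables, the number of components `k` is fixed rather than determined automatically, and the cost of encoding the models themselves is ignored. The function names `fit_model`, `code_length`, and `partition_by_compression` are hypothetical.

```python
import math
import random


def fit_model(transactions, n_items):
    """Laplace-smoothed probability that each item occurs in a transaction.

    Assumption: a stand-in for a real compressor such as a Krimp code table.
    """
    counts = [0] * n_items
    for t in transactions:
        for item in t:
            counts[item] += 1
    n = len(transactions)
    # Smoothing keeps every probability strictly between 0 and 1.
    return [(c + 1) / (n + 2) for c in counts]


def code_length(transaction, model):
    """Bits needed to encode one transaction under the independent-item model."""
    bits = 0.0
    for item, p in enumerate(model):
        bits -= math.log2(p if item in transaction else 1.0 - p)
    return bits


def partition_by_compression(data, n_items, k, seed=0, max_iter=100):
    """Iteratively re-assign transactions to the component that encodes them shortest."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in data]  # random initial partitioning
    for _ in range(max_iter):
        # Induce one model per part from its current members.
        models = [
            fit_model([t for t, lab in zip(data, labels) if lab == c], n_items)
            for c in range(k)
        ]
        # Re-assign every transaction to the model giving the shortest code.
        new_labels = [
            min(range(k), key=lambda c: code_length(t, models[c])) for t in data
        ]
        if new_labels == labels:  # converged: no transaction moved
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Toy data: two buying patterns over six items (hypothetical example).
    data = [frozenset({0, 1, 2})] * 10 + [frozenset({3, 4, 5})] * 10
    print(partition_by_compression(data, n_items=6, k=2))
```

Because the result depends on the random start, a practical run would typically repeat the loop with several seeds and keep the partitioning with the smallest total encoded size.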
“…The resulting code tables have been shown to be of very high quality, while reducing the number of patterns up to 7 orders of magnitude. Subsequent work showed that natural clusterings of binary data can be discovered by partitioning the data such that the combined cost of the code tables per part are minimized (van Leeuwen et al 2009). Extensions of KRIMP to sequences and multitable settings have been proposed by Tatti and Vreeken (2012) and Siebes (2008, 2009), respectively.…”
Section: Pattern Set Mining
Citation type: mentioning
confidence: 99%