Frequent Pattern Mining 2014
DOI: 10.1007/978-3-319-07821-2_8

Mining and Using Sets of Patterns through Compression

Abstract: In this chapter we describe how to successfully apply the MDL principle to pattern mining. In particular, we discuss how pattern-based models can be designed and induced by means of compression, resulting in succinct and characteristic descriptions of the data. As motivation, we argue that traditional pattern mining asks the wrong question: instead of asking for all patterns satisfying some interestingness measure, one should ask for a small, non-redundant, and interesting set of patterns, which allows us to avo…

Citations: cited by 9 publications (9 citation statements)
References: 67 publications (103 reference statements)
“…Manual filtering of the results or tuning algorithm parameters are hardly effective solutions and certainly hard for domain experts. Top-k mining and particularly pattern set mining [28,33] are techniques that specifically address the redundancy problem; in both cases, the number of discovered patterns is limited.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…We formalize this problem using the minimum description length (MDL) principle [4], which, informally, states that the best model is the one that compresses the data best. The MDL principle perfectly fits our purposes because (1) it allows to select the simplest model that adequately explains the data, and (2) it has been previously shown to be very effective for the selection of pattern-based models (e.g., [7,11]).…”
Section: Introduction; citation type: mentioning
confidence: 77%
“…Let us now briefly introduce the basic notions of the minimum description length (MDL) principle [4] as it is commonly used in compression-based pattern mining [7]. Given a set of models ℳ and a dataset D, the best model M ∈ ℳ is the one that minimizes L(D, M) = L(M) + L(D|M), with L(M) the length, in bits, of the encoding of M, and L(D|M) the length, in bits, of the encoding of the data given M.…”
Section: Minimum Description Length (MDL); citation type: mentioning
confidence: 99%
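
The two-part objective quoted above, L(D, M) = L(M) + L(D|M), can be illustrated with a toy computation. What follows is a minimal sketch, not the encoding used in the chapter or in the citing paper. It assumes that transactions are sets of items, that a model is a list of itemsets, that each transaction is covered greedily (longest pattern first, leftovers as singletons), and that code lengths are Shannon-optimal, -log2(usage / total usage). The function names cover and description_length are hypothetical.

```python
from collections import Counter
from math import log2


def cover(transaction, patterns):
    """Greedily cover a transaction with disjoint patterns, then singletons."""
    remaining = set(transaction)
    used = []
    for p in patterns:  # patterns are expected longest-first
        if p <= remaining:
            used.append(p)
            remaining -= p
    used.extend(frozenset([item]) for item in remaining)
    return used


def description_length(data, patterns):
    """Return (L(M), L(D|M)) in bits under the toy encoding assumed above."""
    ordered = sorted((frozenset(p) for p in patterns),
                     key=lambda p: (-len(p), sorted(p)))
    usage = Counter()
    for t in data:
        usage.update(cover(t, ordered))
    total = sum(usage.values())
    # Shannon-optimal code length for every entry that is actually used.
    code_len = {p: -log2(u / total) for p, u in usage.items()}

    # L(D|M): every use of an entry costs that entry's code length.
    l_data = sum(usage[p] * code_len[p] for p in usage)

    # L(M): each used entry is spelled out with a baseline singleton code
    # (from raw item frequencies) and stored together with its own code.
    item_freq = Counter(item for t in data for item in t)
    n_items = sum(item_freq.values())
    base = {i: -log2(c / n_items) for i, c in item_freq.items()}
    l_model = sum(sum(base[i] for i in p) + code_len[p] for p in usage)
    return l_model, l_data


# Toy data: {a, b, c} co-occurs often, so a pattern for it should pay off.
data = [{"a", "b", "c"}] * 8 + [{"d"}] * 2
singletons_only = sum(description_length(data, []))
with_pattern = sum(description_length(data, [{"a", "b", "c"}]))
print(with_pattern < singletons_only)
```

On this toy data the model containing {a, b, c} gives a smaller total L(M) + L(D|M) than the singleton-only model, so under these assumptions MDL would prefer it.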
“…In contrast to existing pattern-based modeling approaches (e.g., [42,41]), we deal with a supervised setting in which the goal is to learn a mapping from instances to class labels. This implies that we are not looking for structure within instance data X, but for structure in X that helps to predict Y .…”
Section: For Multiclass Classification; citation type: mentioning
confidence: 99%