1998
DOI: 10.1109/18.720554

The minimum description length principle in coding and modeling

Abstract: We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon's basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms. We assess the performance of the minimum description length criterion …
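To make the normalized maximized likelihood (NML) coding concrete, here is a minimal sketch for the Bernoulli model class over binary strings: the stochastic complexity is the maximized log-likelihood plus the log of the Shtarkov normalizer. The function name and the brute-force normalizer are illustrative assumptions, not taken from the paper.

```python
import math

def nml_codelength(x):
    """Stochastic complexity (NML code length, in bits) of a binary
    sequence x under the Bernoulli model class."""
    n, k = len(x), sum(x)

    def max_lik(j):
        # Maximized likelihood (j/n)^j * (1 - j/n)^(n-j); Python's
        # 0**0 == 1 handles the boundary cases j = 0 and j = n.
        p = j / n
        return p**j * (1 - p) ** (n - j)

    # Shtarkov normalizer: maximized likelihoods summed over all
    # 2^n sequences, grouped by their count of ones.
    C_n = sum(math.comb(n, j) * max_lik(j) for j in range(n + 1))
    return -math.log2(max_lik(k)) + math.log2(C_n)

print(nml_codelength([0, 1, 1, 0, 1, 1, 1, 0]))
```

The log-normalizer term grows like (1/2) log₂ n, the familiar per-parameter MDL penalty.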

Cited by 831 publications (680 citation statements) · References 41 publications

Citation statements
“…Grünwald (1998, Chapter 5) first noted that in this form, by using Stirling's approximation, (7) is essentially equivalent to MAP classification based on the models p_{c,θ} as defined in Section 2. Of course, there exist more refined versions of MDL based on one-part rather than two-part codes (Barron, Rissanen, & Yu, 1998). To apply these to classification, one somehow has to map classifiers to probability distributions explicitly.…”
Section: Why Is the Two-part Code (7) The Appropriate Formula To Work
confidence: 99%
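As a hedged illustration of the equivalence noted in the quote above: a two-part code for a classifier that makes k errors on n examples spends some bits on the classifier itself and then log₂ C(n, k) bits identifying the misclassified examples; Stirling's approximation turns that count term into n·H(k/n), a Bernoulli log-likelihood, which is what makes the criterion look like MAP classification. The decomposition and names below are illustrative; equation (7) of the cited paper is not reproduced here.

```python
import math

def two_part_codelength(model_bits, n, k):
    """Two-part code length (bits): the classifier itself, then the
    data given the classifier -- ~log2(n+1) bits for the error count
    k plus log2 C(n,k) bits to say which examples are wrong."""
    return model_bits + math.log2(n + 1) + math.log2(math.comb(n, k))

def entropy_approx_bits(n, k):
    """Stirling's approximation: log2 C(n,k) ~= n * H(k/n)."""
    if k in (0, n):
        return 0.0
    p = k / n
    return n * (-p * math.log2(p) - (1 - p) * math.log2(1 - p))

# The exact combinatorial term and its entropy approximation agree
# to within lower-order terms:
print(math.log2(math.comb(1000, 50)), entropy_approx_bits(1000, 50))
```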
“…Two frequently used learning methods that in many cases 'automatically' protect against overfitting are Bayesian inference (Bernardo & Smith, 1994) and the Minimum Description Length (MDL) Principle (Rissanen, 1989; Barron, Rissanen, & Yu, 1998; Grünwald, 2005, 2007). We show that, when applied to classification problems, some of the standard variations of these two methods can be inconsistent in the sense that they asymptotically overfit: there exist scenarios where, no matter how much data is available, the generalization error of a classifier based on MDL or the full Bayesian posterior does not converge to the minimum achievable generalization error within the set of classifiers under consideration.…”
Section: Introduction
confidence: 99%
“…Let x be an arbitrary bit string. The shortest program that produces x on U is x* = argmin_{M ∈ M : U(M) = x} |M|, and the Kolmogorov complexity of x is C(x) = |x*|.…”
Section: Kolmogorov Complexity
confidence: 99%
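C(x) is uncomputable, but any real compressor gives an upper bound up to the additive constant of a fixed decompressor: a "print the decompression of this payload" program has length about the compressed size. A minimal sketch using zlib as a stand-in compressor (an illustrative choice, not from the quoted paper):

```python
import os
import zlib

def complexity_upper_bound_bits(x: bytes) -> int:
    """Upper bound on C(x) in bits, ignoring the constant-size
    decompressor that would accompany the payload."""
    return 8 * len(zlib.compress(x, 9))

print(complexity_upper_bound_bits(b"ab" * 1000))      # regular: compresses well
print(complexity_upper_bound_bits(os.urandom(2000)))  # random: ~2000 * 8 bits
```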
“…This principle, often referred to as Occam's razor (to cut off Plato's beard of ideas), has had a decisive influence in the history of science. In modern methodology of science this notion is studied under various guises: Occam's razor [14], the minimal description length (MDL) principle [5, 17], two-part-code optimization [29], learning as data compression [30], etc. All these approaches are indebted to the formulation of an algorithmic solution to the problem of induction by Solomonoff [28], Chaitin [6] and Kolmogorov [20], which is one of the greater achievements of science in the 20th century.…”
confidence: 99%
“…We comment on these more elaborate descriptions in Section V. Finally, recall that we will focus only on the case when σ² = 1. As noted earlier, σ² can be estimated from the HH₁ subband, at which point we can work with standardized data.…”
Section: A. The Laplacian Population Model
confidence: 99%
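A sketch of the standardization step the quote describes: estimate the scale from the finest diagonal (HH₁) subband of a 2-D wavelet transform and divide it out. The Haar wavelet and the robust median-based estimator below are common choices assumed for illustration; the cited paper's exact estimator may differ.

```python
import numpy as np
import pywt  # PyWavelets

def standardize_via_hh1(image):
    """Estimate sigma from the finest diagonal (HH1) subband and
    return the standardized image together with the estimate."""
    # One level of the 2-D DWT; cD holds the diagonal (HH1) details.
    _, (_, _, cD) = pywt.dwt2(np.asarray(image, dtype=float), "haar")
    # Robust scale estimate: median absolute deviation, with 0.6745
    # calibrating MAD to a Gaussian sigma (a conventional constant).
    sigma = np.median(np.abs(cD)) / 0.6745
    return image / sigma, sigma
```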