1998
DOI: 10.1109/18.720554

The minimum description length principle in coding and modeling

Abstract: We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon's basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms. We assess the performance of the minimum description length criterion …
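To make the normalized maximized likelihood (NML) coding concrete, here is a minimal sketch for the Bernoulli model class over binary strings: the stochastic complexity is the maximized log-likelihood plus the log of the Shtarkov normalizer. The function name and the brute-force normalizer are illustrative assumptions, not taken from the paper.

```python
import math

def nml_codelength(x):
    """Stochastic complexity (NML code length, in bits) of a binary
    sequence x under the Bernoulli model class."""
    n, k = len(x), sum(x)

    def max_lik(j):
        # Maximized likelihood (j/n)^j * (1 - j/n)^(n-j); Python's
        # 0**0 == 1 handles the boundary cases j = 0 and j = n.
        p = j / n
        return p**j * (1 - p) ** (n - j)

    # Shtarkov normalizer: maximized likelihoods summed over all
    # 2^n sequences, grouped by their count of ones.
    C_n = sum(math.comb(n, j) * max_lik(j) for j in range(n + 1))
    return -math.log2(max_lik(k)) + math.log2(C_n)

print(nml_codelength([0, 1, 1, 0, 1, 1, 1, 0]))
```

The log-normalizer term grows like (1/2) log₂ n, the familiar per-parameter MDL penalty.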

Cited by 831 publications (680 citation statements) · References 41 publications

Citation statements
“…Grünwald (1998, Chapter 5) first noted that in this form, by using Stirling's approximation, (7) is essentially equivalent to MAP classification based on the models p_{c,θ} as defined in Section 2. Of course, there exist more refined versions of MDL based on one-part rather than two-part codes (Barron, Rissanen, & Yu, 1998). To apply these to classification, one somehow has to map classifiers to probability distributions explicitly.…”
Section: Why Is the Two-part Code (7) The Appropriate Formula To Work
confidence: 99%
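As a hedged illustration of the equivalence noted in the quote above: a two-part code for a classifier that makes k errors on n examples spends some bits on the classifier itself and then log₂ C(n, k) bits identifying the misclassified examples; Stirling's approximation turns that count term into n·H(k/n), a Bernoulli log-likelihood, which is what makes the criterion look like MAP classification. The decomposition and names below are illustrative; equation (7) of the cited paper is not reproduced here.

```python
import math

def two_part_codelength(model_bits, n, k):
    """Two-part code length (bits): the classifier itself, then the
    data given the classifier -- ~log2(n+1) bits for the error count
    k plus log2 C(n,k) bits to say which examples are wrong."""
    return model_bits + math.log2(n + 1) + math.log2(math.comb(n, k))

def entropy_approx_bits(n, k):
    """Stirling's approximation: log2 C(n,k) ~= n * H(k/n)."""
    if k in (0, n):
        return 0.0
    p = k / n
    return n * (-p * math.log2(p) - (1 - p) * math.log2(1 - p))

# The exact combinatorial term and its entropy approximation agree
# to within lower-order terms:
print(math.log2(math.comb(1000, 50)), entropy_approx_bits(1000, 50))
```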
“…Two frequently used learning methods that in many cases 'automatically' protect against overfitting are Bayesian inference (Bernardo & Smith, 1994) and the Minimum Description Length (MDL) Principle (Rissanen, 1989; Barron, Rissanen, & Yu, 1998; Grünwald, 2005, 2007). We show that, when applied to classification problems, some of the standard variations of these two methods can be inconsistent in the sense that they asymptotically overfit: there exist scenarios where, no matter how much data is available, the generalization error of a classifier based on MDL or the full Bayesian posterior does not converge to the minimum achievable generalization error within the set of classifiers under consideration.…”
Section: Introduction
confidence: 99%
“…Let x be an arbitrary bit string. The shortest program that produces x on U is x* = argmin_{M ∈ M : U(M) = x} |M|, and the Kolmogorov complexity of x is C(x) = |x*|.…”
Section: Kolmogorov Complexity
confidence: 99%
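C(x) is uncomputable, but any real compressor gives an upper bound up to the additive constant of a fixed decompressor: a "print the decompression of this payload" program has length about the compressed size. A minimal sketch using zlib as a stand-in compressor (an illustrative choice, not from the quoted paper):

```python
import os
import zlib

def complexity_upper_bound_bits(x: bytes) -> int:
    """Upper bound on C(x) in bits, ignoring the constant-size
    decompressor that would accompany the payload."""
    return 8 * len(zlib.compress(x, 9))

print(complexity_upper_bound_bits(b"ab" * 1000))      # regular: compresses well
print(complexity_upper_bound_bits(os.urandom(2000)))  # random: ~2000 * 8 bits
```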
“…This principle, often referred to as Occam's razor (to cut off Plato's beard of ideas), has had a decisive influence in the history of science. In modern methodology of science this notion is studied under various guises: Occam's razor [14], the minimal description length (MDL) principle [5, 17], two-part-code optimization [29], learning as data compression [30], etc. All these approaches are indebted to the formulation of an algorithmic solution to the problem of induction by Solomonoff [28], Chaitin [6] and Kolmogorov [20], which is one of the greater achievements of science in the 20th century.…”
confidence: 99%
“…We comment on these more elaborate descriptions in Section V. Finally, recall that we will focus only on the case when σ² = 1. As noted earlier, σ² can be estimated from the HH₁ subband, at which point we can work with standardized data.…”
Section: A. The Laplacian Population Model
confidence: 99%
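A sketch of the standardization step the quote describes: estimate the scale from the finest diagonal (HH₁) subband of a 2-D wavelet transform and divide it out. The Haar wavelet and the robust median-based estimator below are common choices assumed for illustration; the cited paper's exact estimator may differ.

```python
import numpy as np
import pywt  # PyWavelets

def standardize_via_hh1(image):
    """Estimate sigma from the finest diagonal (HH1) subband and
    return the standardized image together with the estimate."""
    # One level of the 2-D DWT; cD holds the diagonal (HH1) details.
    _, (_, _, cD) = pywt.dwt2(np.asarray(image, dtype=float), "haar")
    # Robust scale estimate: median absolute deviation, with 0.6745
    # calibrating MAD to a Gaussian sigma (a conventional constant).
    sigma = np.median(np.abs(cD)) / 0.6745
    return image / sigma, sigma
```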