2004
DOI: 10.1109/tnn.2004.828762

Variational Learning and Bits-Back Coding: An Information-Theoretic View to Bayesian Learning

Abstract: The bits-back coding first introduced by Wallace in 1990 and later by Hinton and van Camp in 1993 provides an interesting link between Bayesian learning and information-theoretic minimum-description-length (MDL) learning approaches. The bits-back coding allows interpreting the cost function used in the variational Bayesian method called ensemble learning as a code length in addition to the Bayesian view of misfit of the posterior approximation and a lower bound of model evidence. Combining these two viewpoints…
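The two readings of the cost function mentioned in the abstract can be checked numerically on a toy problem. The sketch below is not taken from the paper: the three-state parameter, the prior, the likelihood values, and every function name are assumptions chosen only for illustration. It computes the variational cost of an approximation q in bits and compares it with (a) the posterior misfit minus the log evidence and (b) the net bits-back code length, that is, the bits spent encoding the parameter and the data minus the bits the receiver recovers from the choice of parameter.

```python
import math

def variational_cost(q, prior, lik):
    """Cost C(q) = sum_t q(t) * log2( q(t) / (p(t) * p(x|t)) ), in bits."""
    return sum(qi * math.log2(qi / (pi * li))
               for qi, pi, li in zip(q, prior, lik) if qi > 0)

def kl_divergence(q, p):
    """KL(q || p) in bits for two discrete distributions."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# Hypothetical toy model: a parameter with three possible values.
prior = [0.5, 0.3, 0.2]      # p(theta), assumed
lik = [0.10, 0.40, 0.70]     # p(x | theta) for the one observed x, assumed
q = [0.2, 0.4, 0.4]          # an arbitrary approximation to the posterior

evidence = sum(p * l for p, l in zip(prior, lik))
posterior = [p * l / evidence for p, l in zip(prior, lik)]

cost = variational_cost(q, prior, lik)

# Bayesian view: posterior misfit plus the negative log evidence.
bayes_view = kl_divergence(q, posterior) - math.log2(evidence)

# Coding view: bits to send theta (coded with the prior) and x given theta,
# minus the bits recovered because theta was drawn from q (entropy of q).
bits_for_theta = -sum(qi * math.log2(pi) for qi, pi in zip(q, prior))
bits_for_data = -sum(qi * math.log2(li) for qi, li in zip(q, lik))
bits_back = -sum(qi * math.log2(qi) for qi in q if qi > 0)
coding_view = bits_for_theta + bits_for_data - bits_back

print(f"cost = {cost:.4f} bits")
print(f"KL - log2 evidence = {bayes_view:.4f} bits")
print(f"net bits-back length = {coding_view:.4f} bits")
```

In this toy setting the three quantities coincide, and the cost is smallest (equal to the negative log evidence) when q is the exact posterior, which is the sense in which the cost bounds the model evidence.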

Cited by 55 publications (46 citation statements)
References 36 publications
“…For comparison we use the RBC via type-2 Student mixture model developed by Archambeau et al [5], in which the full Bayesian treatment allows for model selection, via the log evidence bound. As somewhat expected on the basis of known relationships between Bayesian and information-theoretic model selection [23], we see the two methods behave similarly indeed, and in most cases they both pick the true number of clusters. In the error-free case we can observe that the inaccuracy incurred by the speedup is negligible, while as the error level increases the advantage of our model's ability of taking into account these errors becomes apparent despite the speedup.…”
Section: E Assessment Of the Proposed Procedures For Determining The
Citation type: supporting
confidence: 75%
“…The optimal number can be automatically determined either by a Bayesian approach, such as in [5], [8], [43], [46] or based on information theory, such as minimum message length (MML) [50], minimum description length [39], Bayesian information criterion [40], Akaike information criterion [1], or by cross validation [19]. Among these methods, the Bayesian approach is currently most popular, and there are well known connections between the Bayesian approach and information theoretic ones [23].…”
Section: G Determining the Number Of Mixture Components
Citation type: mentioning
confidence: 99%
“…This provides an alternative justification for the variational method. Additionally, the alternative interpretation can provide more intuitive explanations on why some models provide higher marginal likelihoods than others [22]. For the remainder of this paper, the optimization criterion will be the cost function (6) that is to be minimized.…”
Section: B Variational Bayesian Learning
Citation type: mentioning
confidence: 99%
“…6.1, and a measure of the amount of independent innovation in the hidden nodes, the latter of which can be influenced by introducing the evidence nodes. More detailed discussion is presented in [45]. In addition to restricting the innovations, the incoming weights A of the hidden nodes are initialised to random values by evidence nodes with variance σ² = 10⁻² and life time of 40 iterations, when new nodes are added.…”
Section: Addition Of Hidden Nodes
Citation type: mentioning
confidence: 99%