2022
DOI: 10.48550/arxiv.2202.11678
Preprint
Bayesian Model Selection, the Marginal Likelihood, and Generalization

Abstract: How do we compare between hypotheses that are entirely consistent with observations? The marginal likelihood (aka Bayesian evidence), which represents the probability of generating our observations from a prior, provides a distinctive approach to this foundational question, automatically encoding Occam's razor. Although it has been observed that the marginal likelihood can overfit and is sensitive to prior assumptions, its limitations for hyperparameter learning and discrete model comparison have not been thor…
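For reference, the marginal likelihood discussed in the abstract is the standard quantity obtained by integrating the likelihood against the prior, and the Bayes factors that appear in the citing passages below are ratios of two such quantities:

```latex
p(\mathcal{D} \mid \mathcal{M}) \;=\; \int p(\mathcal{D} \mid \theta, \mathcal{M})\, p(\theta \mid \mathcal{M})\, \mathrm{d}\theta,
\qquad
\mathrm{BF}_{12} \;=\; \frac{p(\mathcal{D} \mid \mathcal{M}_1)}{p(\mathcal{D} \mid \mathcal{M}_2)}.
```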

Cited by 3 publications (4 citation statements) | References 27 publications
“…In principle, BFs could be computed after the hierarchical population inference [54] or between different population models [3], but we here show that they are unreliable without this step. Even then, it is not possible to evade the core problem of prior dependence when computing BFs, no matter how many levels of inference are applied: the BF computation based on the highest level of inference in a hierarchical model will still depend on the choice of priors on that level, reducing the problem once again to the choice of a prior distribution that is difficult to establish in a principled way.…”
Section: Discussion
confidence: 81%
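The prior dependence described in this passage can be made concrete with a small, self-contained sketch (a toy Gaussian example of my own, not taken from the cited works): the Bayes factor between a point-null model and an alternative whose extra parameter has prior mu ~ N(0, tau^2) swings with the prior width tau, even though the data and the posterior over mu are essentially unchanged.

```python
# Toy illustration (not from the cited papers) of Bayes-factor prior dependence.
# M0: y_i ~ N(0, 1).  M1: y_i ~ N(mu, 1) with prior mu ~ N(0, tau^2).
# Both marginal likelihoods are available in closed form, and the Bayes factor
# reduces to a ratio of two normal densities evaluated at the sample mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 20
y = rng.normal(loc=0.3, scale=1.0, size=n)   # data with a small true effect
ybar = y.mean()

def log_bf_alt_vs_null(ybar, n, tau):
    """log p(y | M1) - log p(y | M0) for the conjugate Gaussian toy model."""
    log_ml_alt = stats.norm.logpdf(ybar, loc=0.0, scale=np.sqrt(tau**2 + 1.0 / n))
    log_ml_null = stats.norm.logpdf(ybar, loc=0.0, scale=np.sqrt(1.0 / n))
    return log_ml_alt - log_ml_null

for tau in [0.5, 2.0, 10.0, 50.0]:
    print(f"prior width tau = {tau:5.1f}  ->  "
          f"log BF(M1 vs M0) = {log_bf_alt_vs_null(ybar, n, tau):+.2f}")
# As tau grows, M1 is penalized ever more heavily and the Bayes factor can flip
# toward M0, while the posterior over mu (and hence prediction) barely moves.
```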
“…Despite its intuitive appeal, the marginal likelihood (and thus BFs and PMPs) represents a well-known and widely appreciated source of intractability in Bayesian workflows, since it typically involves a multidimensional integral (Equation 7) over potentially unbounded parameter spaces (Gronau, Sarafoglou, et al, 2017;Lotfi et al, 2022). Furthermore, the marginal likelihood becomes doubly intractable when the likelihood function is itself not available (e.g., in simulation-based settings), thereby making the comparison of such models a challenging and sometimes, up to this point, hopeless endeavor.…”
Section: Bayesian Model Comparison
confidence: 99%
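As a concrete illustration of why this integral is awkward (a minimal sketch of my own, not the cited Equation 7 or an estimator used by the citing work): the naive Monte Carlo estimator that averages the likelihood over draws from the prior is unbiased, but it becomes noisy as the prior widens relative to the posterior, and the effect worsens rapidly with dimension.

```python
# Naive prior-sampling Monte Carlo estimate of the marginal likelihood,
#   p(D) ~ (1/S) * sum_s p(D | theta_s),  theta_s ~ p(theta),
# compared against the exact value in a 1-D conjugate Gaussian model.
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(1)
y = rng.normal(loc=1.0, scale=1.0, size=50)          # toy data, y_i ~ N(mu, 1)

def log_ml_exact(y, tau):
    """Closed-form log p(y) for y_i ~ N(mu, 1), mu ~ N(0, tau^2)."""
    n, ybar = len(y), y.mean()
    return (stats.norm.logpdf(y, loc=ybar, scale=1.0).sum()
            + 0.5 * np.log(2.0 * np.pi / n)
            + stats.norm.logpdf(ybar, loc=0.0, scale=np.sqrt(tau**2 + 1.0 / n)))

def log_ml_prior_mc(y, tau, n_samples=10_000):
    """log of the Monte Carlo average of the likelihood over prior draws."""
    mu = rng.normal(loc=0.0, scale=tau, size=n_samples)
    log_lik = stats.norm.logpdf(y[:, None], loc=mu[None, :], scale=1.0).sum(axis=0)
    return logsumexp(log_lik) - np.log(n_samples)

for tau in [1.0, 10.0]:
    estimates = np.array([log_ml_prior_mc(y, tau) for _ in range(5)])
    print(f"tau = {tau:4.1f}  exact log p(D) = {log_ml_exact(y, tau):7.2f}  "
          f"MC estimates = {np.round(estimates, 2)}")
# The spread of the repeated estimates grows with the prior width; in high
# dimensions the estimator is effectively useless, which is one face of the
# intractability (and of the double intractability when p(D | theta) itself is
# unavailable) noted above.
```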
“…We consider Bayesian model comparison (BMC) as a principled framework for comparing and ranking competing HMs via Occam's razor (Kass & Raftery, 1995;Lotfi et al, 2022;MacKay, 2003). However, standard BMC is analytically intractable for nontrivial HMs, as it requires marginalization over high-dimensional parameter spaces.…”
confidence: 99%
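For completeness, a minimal sketch (with made-up numbers) of the final step of BMC once approximate log marginal likelihoods for the candidate models are in hand: converting them into posterior model probabilities for ranking.

```python
# Posterior model probabilities p(M_k | D) proportional to p(D | M_k) p(M_k),
# computed in a numerically stable way from hypothetical log marginal likelihoods.
import numpy as np
from scipy.special import softmax

log_evidence = np.array([-812.4, -805.1, -809.7])    # hypothetical log p(D | M_k)
log_model_prior = np.log(np.full(3, 1.0 / 3.0))      # uniform prior over models

pmp = softmax(log_evidence + log_model_prior)         # normalizes via log-sum-exp
for k, p in enumerate(pmp):
    print(f"p(M_{k} | D) = {p:.3f}")
```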
“…That is, the MPA "biases" the predictive distribution p(y | x*, D) since it generally assumes that f* has lower uncertainty than it actually has. And since p(f* | x*, D) is usually induced by the LA or VB which are often underconfident on large networks [7,33,34], the bias of MPA towards overconfidence thus counterbalances the underconfidence of p(f* | x*, D). Therefore, in this case, the MPA can yield better-calibrated predictive distributions than MC integration [32,6].…”
Section: Analytic Alternatives To MC Integration Are Not the Definiti…
confidence: 99%
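To illustrate the mechanism being described, here is a minimal sketch under an interpretive assumption: I read "MPA" as the plug-in approximation that pushes the mean of p(f* | x*, D) through the link function instead of integrating over f*. With a Gaussian latent p(f* | x*, D) = N(m, s^2) and a sigmoid link, the plug-in predictive ignores the latent variance and is therefore more confident than the Monte Carlo predictive.

```python
# Hypothetical binary-classification example: compare a plug-in ("MPA"-style,
# under my reading of the acronym) predictive with MC integration over the
# Gaussian latent p(f* | x*, D) = N(m, s^2).
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predictive_plugin(m):
    """Plug-in predictive: sigmoid applied to the latent mean, variance ignored."""
    return sigmoid(m)

def predictive_mc(m, s, n_samples=200_000):
    """MC predictive: average the sigmoid over samples f* ~ N(m, s^2)."""
    return sigmoid(rng.normal(loc=m, scale=s, size=n_samples)).mean()

m, s = 2.0, 3.0   # hypothetical latent mean and (deliberately large) latent std
print(f"plug-in predictive      : {predictive_plugin(m):.3f}")
print(f"MC-integrated predictive: {predictive_mc(m, s):.3f}")
# The plug-in probability sits further from 0.5 (more confident). If the
# Gaussian over f* coming from LA/VB is itself too wide (underconfident), as the
# quoted passage states for large networks, the two errors partially cancel,
# which is how the plug-in route can end up better calibrated than MC here.
```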