2019
DOI: 10.48550/arxiv.1902.03932
Preprint

Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning

Abstract: The posteriors over neural network weights are high dimensional and multimodal. Each mode typically characterizes a meaningfully different representation of the data. We develop Cyclical Stochastic Gradient MCMC (SG-MCMC) to automatically explore such distributions. In particular, we propose a cyclical stepsize schedule, where larger steps discover new modes, and smaller steps characterize each mode. We prove that our proposed learning rate schedule provides faster convergence to samples from a stationary dist…
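The cyclical stepsize schedule described in the abstract can be illustrated with a short sketch. The snippet below implements a cosine-style schedule that restarts at a large stepsize at the start of each cycle (to jump between modes) and decays within the cycle (to characterize the current mode); the function name, arguments, and exact cosine form are illustrative assumptions, not the paper's reference implementation.

import math

def cyclical_stepsize(k, total_iters, num_cycles, alpha0):
    """Cosine-style cyclical stepsize (illustrative sketch, not the paper's code).

    Restarts at alpha0 at the start of each cycle so the sampler can escape
    the current mode, then decays toward 0 so the small-step phase can
    characterize the mode it has settled into.
    """
    iters_per_cycle = math.ceil(total_iters / num_cycles)
    pos = k % iters_per_cycle  # position within the current cycle
    return alpha0 / 2.0 * (math.cos(math.pi * pos / iters_per_cycle) + 1.0)

# Example: 10,000 SG-MCMC iterations split into 4 exploration/sampling cycles.
schedule = [cyclical_stepsize(k, 10_000, 4, alpha0=0.1) for k in range(10_000)]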



Cited by 39 publications (59 citation statements)
References 25 publications
“…For this case of model misspecification, a cold posterior performs better in terms of predictive capacity (MPL). Cold posteriors have been found to perform well also in other studies [146,148,149]. In Fig.…”
Section: Metric (supporting)
confidence: 74%
“…In this regard, a technique that is often used in practice is posterior tempering, i.e., sampling θ values from p(θ|D)^{1/τ} instead of the true posterior (τ = 1), where τ is called temperature. Specifically, it has been reported in the literature that "cold" posteriors, τ < 1, perform better [146,148,149], although using a more informed prior can potentially remove this effect [143]. Cold posteriors can be interpreted as over-counting the available data using 1/τ replications of it, thus, making the posterior more concentrated.…”
Section: A4 Posterior Tempering For Model Misspecification (mentioning)
confidence: 99%
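As a rough illustration of the tempering described in the statement above, the sketch below scales the unnormalized log-posterior by 1/τ, so that τ < 1 sharpens ("cools") the posterior as if the data were replicated 1/τ times; the function and the dummy values are hypothetical, and practical SG-MCMC implementations usually apply the temperature inside the stochastic-gradient energy instead.

def tempered_log_posterior(log_likelihood, log_prior, tau):
    """Unnormalized log of p(theta | D)^(1/tau) (illustrative sketch).

    tau = 1 recovers the true posterior; tau < 1 ("cold") concentrates it,
    which several of the studies cited above report to improve predictive metrics.
    """
    return (log_likelihood + log_prior) / tau

# Example with dummy scalars: full-data log-likelihood and log-prior.
cold_energy = tempered_log_posterior(log_likelihood=-1234.5, log_prior=-42.0, tau=0.2)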
“…In particular, we take the code and networks from Fortuin et al. (2021b,a) and mirror their experimental setup as closely as possible. This code combines a cyclical learning rate schedule (Zhang et al., 2019), a gradient-guided Monte Carlo (GG-MC) scheme (Garriga-Alonso & Fortuin, 2021), and the preconditioning and convergence diagnostics from Wenzel et al. (2020). Following Fortuin et al. (2021b), we ran 60 cycles with 45 epochs in each cycle.…”
Section: Bayesian Neural Network and The Cold Posterior Effect (mentioning)
confidence: 99%
“…Likewise, the Laplace inference for BNNs has improved in scalability using further GGN approximations [30,31,37,51,52] and sub-network inference [12,37]. Orthogonally, MCMC methods for BNNs have been improved [20,24,61,66], better BNN priors have been studied [19,21], and even deep ensembles [39] have been cast as approximate inference [9-11, 32, 49, 50, 62].…”
Section: Related Work (mentioning)
confidence: 99%