Control variates for stochastic gradient MCMC

Baker, Jack W.; Fearnhead, Paul; Fox, Emily B.; Nemeth, Christopher

doi:10.1007/s11222-018-9826-2

Cited by 55 publications

(95 citation statements)

References 18 publications


“…Increasing the minibatch size will reduce the variance of the gradient estimate, but increase the per iteration computational cost of the SGMCMC algorithm. Recently control variates (Ripley, 2009) have been used to reduce the variance in the gradient estimate of SGMCMC (Dubey et al, 2016;Nagapetyan et al, 2017;Baker et al, 2017). Using these improved gradient estimates have been shown to lead to improvements in the mean squared error (MSE) of the algorithm (Dubey et al, 2016), as well as its computational cost (Nagapetyan et al, 2017;Baker et al, 2017).…”

Section: Stochastic Gradient MCMC With Control Variates (mentioning)

confidence: 99%
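The variance trade-off described in the statement above can be illustrated with a minimal sketch. It assumes a logistic regression model with a standard normal prior, and every function name here is hypothetical: this is not the sgmcmc package's API, and it is only one common way to write a control variate gradient estimate centred at a fixed point `theta_hat` near the posterior mode. Both estimators are unbiased for the full gradient; the control variate version subsamples only the difference between gradients at `theta` and `theta_hat`, which is small when the two are close.

```python
import numpy as np


def grad_log_lik(theta, X, y):
    """Per-observation gradients of a logistic log likelihood, shape (n, d)."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return (y - p)[:, None] * X


def grad_log_prior(theta):
    """Gradient of a standard normal log prior (assumed for illustration)."""
    return -theta


def full_gradient(theta, X, y):
    """Exact gradient of the log posterior; costs O(N)."""
    return grad_log_prior(theta) + grad_log_lik(theta, X, y).sum(axis=0)


def standard_estimate(theta, X, y, idx):
    """Plain minibatch estimate: rescale the minibatch sum by N / n."""
    scale = X.shape[0] / len(idx)
    return grad_log_prior(theta) + scale * grad_log_lik(theta, X[idx], y[idx]).sum(axis=0)


def cv_estimate(theta, theta_hat, grad_hat, X, y, idx):
    """Control variate estimate centred at a fixed point theta_hat.

    grad_hat is the full gradient at theta_hat, computed once up front.
    """
    scale = X.shape[0] / len(idx)
    diff = grad_log_lik(theta, X[idx], y[idx]) - grad_log_lik(theta_hat, X[idx], y[idx])
    prior_diff = grad_log_prior(theta) - grad_log_prior(theta_hat)
    return grad_hat + prior_diff + scale * diff.sum(axis=0)


def compare_variances(seed=0, N=2000, d=3, n=20, reps=500):
    """Return the total variance of both estimators at a point near theta_hat."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(N, d))
    true_theta = rng.normal(size=d)
    y = (rng.random(N) < 1.0 / (1.0 + np.exp(-(X @ true_theta)))).astype(float)

    # Crude gradient ascent to get a centring value near the posterior mode.
    theta_hat = np.zeros(d)
    for _ in range(300):
        theta_hat = theta_hat + full_gradient(theta_hat, X, y) / N
    grad_hat = full_gradient(theta_hat, X, y)

    theta = theta_hat + 0.05 * rng.normal(size=d)  # a nearby "chain state"
    std, cv = [], []
    for _ in range(reps):
        idx = rng.choice(N, size=n, replace=False)
        std.append(standard_estimate(theta, X, y, idx))
        cv.append(cv_estimate(theta, theta_hat, grad_hat, X, y, idx))
    return np.var(std, axis=0).sum(), np.var(cv, axis=0).sum()


if __name__ == "__main__":
    var_std, var_cv = compare_variances()
    print(f"standard: {var_std:.1f}, control variate: {var_cv:.1f}")
```

In this sketch the one-off O(N) cost of computing `grad_hat` is paid before sampling starts, after which each iteration touches only the `n` minibatch points. The variance reduction weakens as `theta` moves away from `theta_hat`.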


sgmcmc: An R Package for Stochastic Gradient Markov Chain Monte Carlo

Baker, Fearnhead, Fox et al. 2019

J. Stat. Soft.

Self Cite


This paper introduces the R package sgmcmc, which can be used for Bayesian inference on problems with large datasets using stochastic gradient Markov chain Monte Carlo (SGMCMC). Traditional Markov chain Monte Carlo (MCMC) methods, such as Metropolis-Hastings, are known to run prohibitively slowly as the dataset size increases. SGMCMC solves this issue by only using a subset of data at each iteration. SGMCMC requires calculating gradients of the log likelihood and log priors, which can be time consuming and error prone to perform by hand. The sgmcmc package calculates these gradients itself using automatic differentiation, making the implementation of these methods much easier. To do this, the package uses the software library TensorFlow, which has a variety of statistical distributions and mathematical operations as standard, meaning a wide class of models can be built using this framework. SGMCMC has become widely adopted in the machine learning literature, but less so in the statistics community. We believe this may be partly due to lack of software; this package aims to bridge this gap.

One of the major advantages of sgmcmc is that gradients are calculated within the package using automatic differentiation (Griewank and Walther, 2008). This means that users need only specify the log likelihood function and log prior for their model. The package calculates the gradients using TensorFlow (TensorFlow Development Team, 2015), which has recently been made available for R (Allaire et al., 2016). TensorFlow is an efficient library for numerical computation which can take advantage of a wide variety of architectures; as such, sgmcmc keeps much of this efficiency. Both sgmcmc and TensorFlow are available on CRAN, so sgmcmc can be installed by using the standard install.packages function. After the TensorFlow package has been installed, however, the extra install_tensorflow() function needs to be run, which installs the required Python implementation of TensorFlow. The sgmcmc package also has a website with vignettes, tutorials and an API reference.

SGMCMC methods have become popular in the machine learning literature but less so in the statistics community. We partly attribute this to the lack of available software. To the best of our knowledge, there are currently no packages for SGMCMC available in R, probably the most popular programming language within the statistics community. The only package we are aware of which implements scalable MCMC is the Python package edward (Tran et al., 2016). This package implements both SGLD and SGHMC, but does not implement SGNHT or any of the control variate methods.

Section 2 introduces MCMC and discusses the software currently available for implementing MCMC algorithms; we also discuss the scenarios where sgmcmc is designed to be used. In Section 3 we review the methodology behind the SGMCMC methods implemented in sgmcmc. Section 4 provides a brief introduction to TensorFlow. Section 5 overviews the structure of the package, as well as details of how the algorithms are implemented. Section 6...



“…We implement the formulation of Baker et al (2017), who replace the gradient estimate ∇_θ log p(θ|x) with…”

Section: Stochastic Gradient MCMC With Control Variates (mentioning)

confidence: 99%

sgmcmc: An R Package for Stochastic Gradient Markov Chain Monte Carlo

Baker, Fearnhead, Fox et al. 2019

J. Stat. Soft.

Self Cite


“…However, although they often work well in practice it can be difficult to know just how accurate the results are for any given application. Furthermore, many of these algorithms still have a computational cost that increases linearly with data size (Bardenet et al., 2017; Nagapetyan et al., 2017; Baker et al., 2019).…”

Section: Introduction (mentioning)

confidence: 99%

Quasi-Stationary Monte Carlo and The Scale Algorithm

Pollock, Fearnhead, Johansen et al. 2020

Journal of the Royal Statistical Society Series B: Statistical Methodology

Self Cite


Summary: This paper introduces a class of Monte Carlo algorithms which are based on the simulation of a Markov process whose quasi‐stationary distribution coincides with a distribution of interest. This differs fundamentally from, say, current Markov chain Monte Carlo methods which simulate a Markov chain whose stationary distribution is the target. We show how to approximate distributions of interest by carefully combining sequential Monte Carlo methods with methodology for the exact simulation of diffusions. The methodology introduced here is particularly promising in that it is applicable to the same class of problems as gradient‐based Markov chain Monte Carlo algorithms but entirely circumvents the need to conduct Metropolis–Hastings type accept–reject steps while retaining exactness: the paper gives theoretical guarantees ensuring that the algorithm has the correct limiting target distribution. Furthermore, this methodology is highly amenable to ‘big data’ problems. By employing a modification to existing naive subsampling and control variate techniques it is possible to obtain an algorithm which is still exact but has sublinear iterative cost as a function of data size.


“…Both approaches show good performance when the subset posteriors are near Gaussian, which is expected for adequately large sample sizes for each subset, based on the Bayesian central limit theorem (Bernstein–von Mises theorem; see Van der Vaart [41], and Le Cam and Yang [19]). However, for non-Gaussian posteriors, the methods may have unreliable performance (Baker et al [3]; Neiswanger et al [28]; Miroshnikov et al [25]). The method of Neiswanger et al [28] also has limitations as the number of unknown model parameters increases, since kernel density estimation becomes infeasible in larger dimensions (Wang and Dunson [42]; Scott [37]).…”

Section: Introduction (mentioning)

confidence: 99%

Parallel Markov chain Monte Carlo for Bayesian hierarchical models with big data, in two stages

Wei, Conlon 2019

Journal of Applied Statistics


Due to the escalating growth of big data sets in recent years, new Bayesian Markov chain Monte Carlo (MCMC) parallel computing methods have been developed. These methods partition large data sets by observations into subsets. However, for Bayesian nested hierarchical models, typically only a few parameters are common for the full data set, with most parameters being group-specific. Thus, parallel Bayesian MCMC methods that take into account the structure of the model and split the full data set by groups rather than by observations are a more natural approach for analysis. Here, we adapt and extend a recently introduced two-stage Bayesian hierarchical modeling approach, and we partition complete data sets by groups. In stage 1, the group-specific parameters are estimated independently in parallel. The stage 1 posteriors are used as proposal distributions in stage 2, where the target distribution is the full model. Using three-level and four-level models, we show in both simulation and real data studies that results of our method agree closely with the full data analysis, with greatly increased MCMC efficiency and greatly reduced computation times. The advantages of our method versus existing parallel MCMC computing methods are also described.
