Two new implementations of the EM algorithm are proposed for maximum likelihood fitting of generalized linear mixed models. Both methods use random (independent and identically distributed) sampling to construct Monte Carlo approximations at the E-step. One approach involves generating random samples from the exact conditional distribution of the random effects (given the data) by rejection sampling, using the marginal distribution as a candidate. The second method uses a multivariate t importance sampling approximation. In many applications the two methods are complementary. Rejection sampling is more efficient when sample sizes are small, whereas importance sampling is better with larger sample sizes. Monte Carlo approximation using random samples allows the Monte Carlo error at each iteration to be assessed by using standard central limit theory combined with Taylor series methods. Specifically, we construct a sandwich variance estimate for the maximizer at each approximate E-step. This suggests a rule for automatically increasing the Monte Carlo sample size after iterations in which the true EM step is swamped by Monte Carlo error. In contrast, techniques for assessing Monte Carlo error have not been developed for use with alternative implementations of Monte Carlo EM algorithms utilizing Markov chain Monte Carlo E-step approximations. Three different data sets, including the infamous salamander data of McCullagh and Nelder, are used to illustrate the techniques and to compare them with the alternatives. The results show that the methods proposed can be considerably more efficient than those based on Markov chain Monte Carlo algorithms. However, the methods proposed may break down when the intractable integrals in the likelihood function are of high dimension.
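The importance-sampling E-step described above can be sketched for a toy case. The snippet below is a minimal illustration, not the paper's implementation: it assumes a hypothetical logistic random-intercept model y_j | u ~ Bernoulli(expit(beta + u)) with u ~ N(0, sigma^2), uses a heavy-tailed t proposal (the scalar analogue of the paper's multivariate t), and returns a self-normalized estimate of E[u | y] together with the effective sample size, which is one simple diagnostic of Monte Carlo error.

```python
import numpy as np
from scipy import stats

def is_estep(y, beta, sigma, n_mc=5000, df=4, rng=None):
    """Importance-sampling approximation to the E-step for a toy
    logistic random-intercept model (hypothetical setup):
      y_j | u ~ Bernoulli(expit(beta + u)),  u ~ N(0, sigma^2).
    Proposal: t distribution with `df` degrees of freedom, scaled by sigma."""
    rng = np.random.default_rng(rng)
    u = stats.t.rvs(df, scale=sigma, size=n_mc, random_state=rng)
    # Log conditional likelihood f(y | u) for each sampled u.
    eta = beta + u[:, None]                                  # (n_mc, n_obs)
    log_f_y_u = (y * eta - np.logaddexp(0.0, eta)).sum(axis=1)
    # Log importance weights: log f(y, u) - log g(u), stabilized by the max.
    log_joint = log_f_y_u + stats.norm.logpdf(u, scale=sigma)
    log_w = log_joint - stats.t.logpdf(u, df, scale=sigma)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    ess = 1.0 / np.sum(w**2)          # effective sample size diagnostic
    return float((w * u).sum()), float(ess)
```

In a full Monte Carlo EM iteration, the same weighted draws would be reused to maximize the weighted complete-data log-likelihood over (beta, sigma); a collapsing effective sample size would signal that the Monte Carlo sample size should be increased.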
Often, either from a lack of prior information or simply for convenience, variance components are modeled with improper priors in hierarchical linear mixed models. Although the posterior distributions for these models are rarely available in closed form, the usual conjugate structure of the prior specification allows for painless calculation of the Gibbs conditionals. Thus, the Gibbs sampler may be employed to explore the posterior distribution without ever having established propriety of the posterior. An example is given which shows that the output from a Gibbs chain corresponding to an improper posterior may appear perfectly reasonable. Thus, one cannot expect the Gibbs output to provide a "red flag" informing the user that the posterior is improper. The user must demonstrate propriety before a Markov chain Monte Carlo technique is used. A theorem is given which classifies improper priors according to the propriety of the resulting posteriors. Applications concerning Bayesian analysis of animal breeding data and the location of maxima of unwieldy (restricted) likelihood functions are discussed. Gibbs sampling with improper posteriors is then considered in more generality. The concept of functional compatibility of conditional densities is introduced and is used to construct an invariant measure for a class of Markov chains. These results are used to show that Gibbs chains corresponding to improper posteriors are, in theory, quite ill behaved.
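The point that the Gibbs conditionals remain "painless" regardless of posterior propriety can be made concrete with a sketch. The code below is a hypothetical toy version, not the paper's model: a one-way random-effects layout y_ij = theta_i + e_ij with a flat prior on mu and the improper prior p(sigma2, tau2) proportional to 1/(sigma2 * tau2). All four conditionals are standard normal/inverse-gamma draws, and the sampler runs and produces plausible-looking output whether or not the resulting posterior is proper, which is exactly the danger the abstract warns about.

```python
import numpy as np

def gibbs_oneway(y, n_iter=2000, rng=None):
    """Gibbs sampler for the toy model y[i, j] = theta_i + e_ij,
    e_ij ~ N(0, sigma2), theta_i ~ N(mu, tau2), flat prior on mu and
    improper prior p(sigma2, tau2) ~ 1/(sigma2 * tau2).
    The conditionals are closed-form even if the posterior is improper."""
    rng = np.random.default_rng(rng)
    k, n = y.shape
    ybar = y.mean(axis=1)
    theta = ybar.copy()
    mu, sigma2, tau2 = float(y.mean()), float(y.var()), float(y.var())
    draws = np.empty((n_iter, 3))
    for t in range(n_iter):
        # theta_i | rest: normal with shrinkage toward mu
        prec = n / sigma2 + 1.0 / tau2
        m = (n * ybar / sigma2 + mu / tau2) / prec
        theta = rng.normal(m, np.sqrt(1.0 / prec))
        # mu | rest: normal around the mean of the theta_i
        mu = rng.normal(theta.mean(), np.sqrt(tau2 / k))
        # sigma2, tau2 | rest: inverse-gamma draws (b / Gamma(a))
        sse = ((y - theta[:, None]) ** 2).sum()
        sigma2 = (sse / 2.0) / rng.gamma(k * n / 2.0)
        sst = ((theta - mu) ** 2).sum()
        tau2 = (sst / 2.0) / rng.gamma(k / 2.0)
        draws[t] = mu, sigma2, tau2
    return draws
```

Nothing in the trace from such a chain flags impropriety, which is why the abstract insists that propriety be verified analytically before the sampler is trusted.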
Two important questions that must be answered whenever a Markov chain Monte Carlo (MCMC) algorithm is used are (Q1) What is an appropriate burn-in? and (Q2) How long should the sampling continue after burn-in? Developing rigorous answers to these questions presently requires a detailed study of the convergence properties of the underlying Markov chain. Consequently, in most practical applications of MCMC, exact answers to (Q1) and (Q2) are not sought. The goal of this paper is to demystify the analysis that leads to honest answers to (Q1) and (Q2). The authors hope that this article will serve as a bridge between those developing Markov chain theory and practitioners using MCMC to solve practical problems. The ability to address (Q1) and (Q2) formally comes from establishing a drift condition and an associated minorization condition, which together imply that the underlying Markov chain is geometrically ergodic. In this article, we explain exactly what drift and minorization are as well as how and why these conditions can be used to form rigorous answers to (Q1) and (Q2). The basic ideas are as follows. The results of Rosenthal (1995) and Roberts and Tweedie (1999) allow one to use drift and minorization conditions to construct a formula giving an analytic upper bound on the distance to stationarity. A rigorous answer to (Q1) can be calculated using this formula. The desired characteristics of the target distribution are typically estimated using ergodic averages. Geometric ergodicity of the underlying Markov chain implies that there are central limit theorems available for ergodic averages (Chan and Geyer 1994). The regenerative simulation technique (Mykland, Tierney and Yu, 1995; Robert, 1995) can be used to get a consistent estimate of the variance of the asymptotic normal distribution. Hence, an asymptotic standard error can be calculated, which provides an answer to (Q2) in the sense that an appropriate time to stop sampling can be determined.
The methods are illustrated using a Gibbs sampler for a Bayesian version of the one-way random effects model and a data set concerning styrene exposure.
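The answer to (Q2) sketched above, stop once the asymptotic standard error of the ergodic average is small enough, can be illustrated in code. The paper uses regenerative simulation for the variance estimate; the sketch below substitutes the simpler batch-means estimator (a standard stand-in, named here plainly as a swap) and a hypothetical `run_until` driver, so it shows the stopping logic rather than the paper's exact method.

```python
import numpy as np

def batch_means_se(chain, n_batches=30):
    """Batch-means estimate of the Monte Carlo standard error of an
    ergodic average. A simpler alternative to the regenerative method;
    consistent as both the batch size and number of batches grow."""
    chain = np.asarray(chain, dtype=float)
    n = (len(chain) // n_batches) * n_batches
    batch_size = n // n_batches
    means = chain[:n].reshape(n_batches, batch_size).mean(axis=1)
    return np.sqrt(batch_size * means.var(ddof=1) / n)

def run_until(se_tol, sampler, block=5000, max_n=10**6):
    """Keep drawing blocks from `sampler` (a hypothetical function
    returning `block` further draws of the chain) until the estimated
    standard error drops below se_tol -- one concrete answer to (Q2)."""
    chain = []
    while len(chain) < max_n:
        chain.extend(sampler(block))
        if batch_means_se(chain) < se_tol:
            break
    return float(np.mean(chain)), float(batch_means_se(chain)), len(chain)
```

For a geometrically ergodic chain the CLT guarantees that this standard error is meaningful; without such a convergence result, the stopping rule has no rigorous justification.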
The unconditional mean squared error of prediction (UMSEP) is widely used as a measure of prediction variance for inferences concerning linear combinations of fixed and random effects in the classical normal theory mixed model. But the UMSEP is inappropriate for generalized linear mixed models where the conditional variance of the random effects depends on the data. When the random effects describe variation between independent small domains and domain-specific prediction is of interest, we propose a conditional mean squared error of prediction (CMSEP) as a general measure of prediction variance. The CMSEP is shown to be the sum of the conditional variance and a positive correction that accounts for the sampling variability of parameter estimates. We derive a second-order-correct estimate of the CMSEP that consists of three components: (a) a plug-in estimate of the conditional variance, (b) a plug-in estimate of a Taylor series approximation to the correction term, and (c) a bootstrap estimate of the bias incurred in (a). In the normal case our formulas based on the CMSEP provide a conditional alternative to the unconditional expansions of Fuller and Harter, Kackar and Harville, and Prasad and Rao. In addition, we show that the prediction variance formula obtained by Wolfinger and O'Connell and suggested by Breslow and Clayton is in fact Laplace's approximation to the CMSEP based on the assumption that the variance components are known and ignoring the bias-correction term. Thus this formula has a conditional interpretation in the small-domain setting and should not be interpreted unconditionally. Finally, although use of the CMSEP is motivated using entirely frequentist arguments, our second-order approximation to the CMSEP closely resembles a corresponding expansion for the Bayesian posterior variance.
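Two of the three components of the CMSEP estimate described above, the plug-in conditional variance (a) and the bootstrap bias correction (c), can be sketched in a toy setting. The code below assumes a hypothetical area-level model y_i = u_i + e_i with u_i ~ N(0, tau2) and known sampling variances d_i, together with a crude moment estimator of tau2; none of this is the paper's general construction, only an illustration of the plug-in-plus-bootstrap-bias idea.

```python
import numpy as np

def cond_var(tau2, d):
    # Conditional variance of u_i given y_i in the toy model:
    # tau2 * d_i / (tau2 + d_i).
    return tau2 * d / (tau2 + d)

def tau2_mom(y, d):
    # Crude moment estimator of tau2, truncated at zero (hypothetical).
    return max(float(np.mean(y**2 - d)), 0.0)

def cmsep_components(y, d, n_boot=300, rng=None):
    """Component (a): plug-in conditional variance at tau2_hat.
    Component (c): parametric-bootstrap estimate of the bias that
    plugging in tau2_hat induces, following the abstract's recipe."""
    rng = np.random.default_rng(rng)
    tau2_hat = tau2_mom(y, d)
    plug_in = cond_var(tau2_hat, d)                     # component (a)
    boot = np.zeros_like(plug_in)
    for _ in range(n_boot):
        # Resample data from the fitted model, re-estimate tau2,
        # and recompute the plug-in conditional variance.
        y_star = rng.normal(0.0, np.sqrt(tau2_hat + d))
        boot += cond_var(tau2_mom(y_star, d), d)
    bias = boot / n_boot - plug_in                      # component (c)
    return plug_in, bias
```

A bias-corrected estimate would subtract the bootstrap bias from the plug-in value; the Taylor-series correction term (b) for parameter-estimation variability is omitted from this sketch.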
The data augmentation (DA) algorithm is a widely used Markov chain Monte Carlo (MCMC) algorithm that is based on a Markov transition density of the form p(x′|x) = ∫ f_{X|Y}(x′|y) f_{Y|X}(y|x) dy, where f_{X|Y} and f_{Y|X} are conditional densities. The PX-DA and marginal augmentation algorithms of Liu and Wu [J. Amer. Statist. Assoc. 94 (1999) 1264-1274] and Meng and van Dyk [Biometrika 86 (1999) 301-320] are alternatives to DA that often converge much faster and are only slightly more computationally demanding. The transition densities of these alternative algorithms can be written in the form p_R(x′|x) = ∫∫ f_{X|Y}(x′|y′) R(y′|y) f_{Y|X}(y|x) dy dy′, where R is an auxiliary Markov transition density on the space of the augmented data. We prove that when R satisfies certain conditions, the MCMC algorithm driven by p_R is at least as good as that driven by p in terms of performance in the central limit theorem and in the operator norm sense. These results are brought to bear on a theoretical comparison of the DA, PX-DA and marginal augmentation algorithms. Our focus is on situations where the group structure exploited by Liu and Wu is available. We show that the PX-DA algorithm based on Haar measure is at least as good as any PX-DA algorithm constructed using a proper prior on the group.
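The two-step structure behind the DA transition density p(x′|x) = ∫ f_{X|Y}(x′|y) f_{Y|X}(y|x) dy is easy to see in a toy example. The sketch below assumes a hypothetical target X ~ N(0, 1) augmented so that (X, Y) is bivariate normal with correlation rho; each sweep draws Y given the current X and then a new X given Y, which is exactly a draw from that transition density.

```python
import numpy as np

def da_chain(n_iter, rho=0.8, x0=0.0, rng=None):
    """Data augmentation sampler for the toy target X ~ N(0, 1),
    augmented with Y so that (X, Y) is bivariate normal with
    correlation rho (hypothetical setup). Each sweep is one draw
    from p(x'|x) = integral of f_{X|Y}(x'|y) f_{Y|X}(y|x) dy."""
    rng = np.random.default_rng(rng)
    s = np.sqrt(1.0 - rho**2)
    x = x0
    out = np.empty(n_iter)
    for t in range(n_iter):
        y = rng.normal(rho * x, s)   # draw Y | X = x
        x = rng.normal(rho * y, s)   # draw X' | Y = y
        out[t] = x
    return out
```

In this toy case the x-chain is an AR(1) process with autocorrelation rho², which makes the convergence cost of strong dependence between X and Y visible; PX-DA and marginal augmentation modify the middle of the sweep (the role of R above) precisely to weaken that dependence.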