Markov chain Monte Carlo (MCMC) or the Metropolis–Hastings algorithm is a simulation algorithm that has made modern Bayesian statistical inference possible. Nevertheless, the efficiency of different Metropolis–Hastings proposal kernels has rarely been studied except for the Gaussian proposal. Here we propose a new class of Bactrian kernels, which avoid proposing values that are very close to the current value, and compare their efficiency with that of a number of proposals in simulating different target distributions, with efficiency measured by the asymptotic variance of a parameter estimate. The uniform kernel is found to be more efficient than the Gaussian kernel, and the Bactrian kernel is better still. When optimal scales are used for both, the Bactrian kernel is at least 50% more efficient than the Gaussian. Implementation in a Bayesian program for molecular clock dating confirms that our results apply to generic MCMC algorithms. Our results refute a previous claim that all proposals have nearly identical performance and should prompt further research into efficient MCMC proposals.

Bayesian inference | mixing | convergence rate

Markov chain Monte Carlo (MCMC) algorithms can be used to simulate a probability distribution π(x) that is known only up to a factor, that is, with only the ratio π(y)/π(x) known; they are especially important in Bayesian inference, where π is the posterior distribution. In a Metropolis–Hastings (MH) algorithm (1, 2), a proposal density q(y|x), with x, y ∈ χ, is used to generate a new state y given the current state x. The proposal is accepted with probability α(x, y). If the proposal is accepted, the new state becomes y; otherwise the chain stays at x.
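The accept–reject mechanism just described can be sketched in a few lines of code. The following Python fragment is a minimal illustration only: the standard-normal target (known here up to its normalizing constant), the symmetric Gaussian proposal, and its scale are our choices for the example, not prescribed by the text. Because the proposal is symmetric, q(x|y) = q(y|x) and the Hastings ratio reduces to π(y)/π(x).

```python
import math
import random

def metropolis_hastings(log_target, x0, scale, n):
    """Random-walk Metropolis sampler with a Gaussian proposal kernel.

    log_target: log of the (unnormalized) target density pi(x).
    scale: standard deviation of the Gaussian proposal q(y|x).
    Returns the list of sampled states x_1, ..., x_n.
    """
    x = x0
    sample = []
    for _ in range(n):
        y = x + random.gauss(0.0, scale)          # propose y ~ q(.|x)
        log_alpha = log_target(y) - log_target(x)  # log pi(y)/pi(x), symmetric q
        if log_alpha >= 0 or random.random() < math.exp(log_alpha):
            x = y                                  # accept: move to y
        sample.append(x)                           # reject: stay at x

    return sample

# Example: simulate a standard normal target known only up to a constant.
random.seed(1)
draws = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, scale=2.4, n=20000)
mean = sum(draws) / len(draws)   # time average approximating E[X] = 0
```

Note that rejected proposals still contribute a (repeated) sample point, which is what produces the lag-k autocorrelation discussed below.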
The algorithm generates a discrete-time Markov chain with state space χ and transition law P, with transition probability density

$$
p(x, y) =
\begin{cases}
q(y \mid x)\,\alpha(x, y), & y \neq x, \\[4pt]
1 - \int_{\chi} q(y \mid x)\,\alpha(x, y)\,\mathrm{d}y, & y = x.
\end{cases}
$$

The acceptance probability α is chosen so that the detailed balance condition is satisfied: π(x)p(x, y) = π(y)p(y, x) for all x, y ∈ χ. The MH choice of α is

$$
\alpha(x, y) = \min\left\{1,\ \frac{\pi(y)}{\pi(x)} \times \frac{q(x \mid y)}{q(y \mid x)}\right\}.
$$

The proposal kernel q(y|x) can be very general; as long as it specifies an irreducible aperiodic Markov chain, the algorithm generates a reversible Markov chain with stationary distribution π. Here we assume that χ is a subset of $\mathbb{R}^k$ and that both π(y) and q(y|x) are densities on χ.

Given a sample x_1, x_2, . . ., x_n simulated from P, the expectation $I = \mathrm{E}_{\pi}\{f(X)\}$ of any function f(x) over π can be approximated by the time average over the sample

$$
\tilde{I} = \frac{1}{n} \sum_{i=1}^{n} f(x_i).
$$

This is an unbiased estimate of I and converges to I according to the central limit theorem, independent of the initial state x_0 (ref. 3, p. 99). The asymptotic variance of $\tilde{I}$ is

$$
\frac{\nu}{n}, \quad \text{with } \nu = V_f \left(1 + 2 \sum_{k=1}^{\infty} \rho_k\right),
$$

where V_f = var_π{f(X)} is the variance of f(X) over π, ρ_k = corr{f(x_i), f(x_{i+k})} is the lag-k autocorrelation (ref. 4, pp. 87–92), and the variance ratio E = V_f/ν is the efficiency: an MCMC sample of size N is as informative about I as an independent sample of size NE. Thus, NE is known as the effective sample size. Given π an...
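The efficiency E = V_f/ν can be estimated from a simulated chain by plugging empirical lag-k autocorrelations into E = 1/(1 + 2Σρ_k). The sketch below is illustrative only: truncating the sum at the first negative autocorrelation estimate is a crude rule of our own choosing, not a method from the text.

```python
import random

def efficiency(sample, max_lag=200):
    """Estimate E = 1 / (1 + 2 * sum_k rho_k) from empirical autocorrelations.

    The infinite sum is truncated at max_lag or at the first lag whose
    estimated autocorrelation is negative (a crude stopping rule).
    """
    n = len(sample)
    mean = sum(sample) / n
    dev = [x - mean for x in sample]
    var = sum(d * d for d in dev) / n        # empirical V_f
    total = 0.0
    for k in range(1, max_lag + 1):
        # empirical lag-k autocorrelation rho_k
        rho = sum(dev[i] * dev[i + k] for i in range(n - k)) / ((n - k) * var)
        if rho < 0:
            break
        total += rho
    return 1.0 / (1.0 + 2.0 * total)

# For an independent sample, all rho_k are near 0, so E should be near 1.
random.seed(2)
iid = [random.gauss(0.0, 1.0) for _ in range(4000)]
e_iid = efficiency(iid)
```

An autocorrelated MCMC chain gives E < 1, so NE, the effective sample size, is smaller than the nominal sample size N.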