One of the most demanding calculations is to generate random samples from a specified probability distribution (usually with an unknown normalizing prefactor) in a high-dimensional configuration space. One often has to resort to using a Markov chain Monte Carlo method, which converges only in the limit to the prescribed distribution. Such methods typically inch through configuration space step by step, with acceptance of a step based on a Metropolis(-Hastings) criterion. An acceptance rate of 100% is possible in principle by embedding configuration space in a higher-dimensional phase space and using ordinary differential equations. In practice, numerical integrators must be used, lowering the acceptance rate. This is the essence of hybrid Monte Carlo methods. Presented is a general framework for constructing such methods under relaxed conditions: the only geometric property needed is (weakened) reversibility; volume preservation is not needed. The possibilities are illustrated by deriving a couple of explicit hybrid Monte Carlo methods, one based on barrier-lowering variable-metric dynamics and another based on isokinetic dynamics.
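The basic idea of hybrid Monte Carlo described above can be sketched in a few lines. This is a minimal illustration of the textbook variant only, which uses the volume-preserving leapfrog integrator with a unit mass matrix; it is not the relaxed framework, the variable-metric method, or the isokinetic method of the abstract. The target (a standard Gaussian), the function names, and the step-size settings are all assumptions chosen for illustration.

```python
import numpy as np

def leapfrog(q, p, grad_U, step, n_steps):
    """Integrate Hamiltonian dynamics with the leapfrog scheme (unit masses)."""
    q = q.copy()
    p = p.copy()
    p -= 0.5 * step * grad_U(q)          # initial half step in momentum
    for i in range(n_steps):
        q += step * p                    # full step in position
        if i < n_steps - 1:
            p -= step * grad_U(q)        # full step in momentum
    p -= 0.5 * step * grad_U(q)          # final half step in momentum
    return q, p

def hmc(U, grad_U, q0, n_samples, step=0.2, n_steps=10, seed=0):
    """Textbook hybrid Monte Carlo with a Metropolis accept/reject test."""
    rng = np.random.default_rng(seed)
    q = np.asarray(q0, dtype=float)
    samples = np.empty((n_samples, q.size))
    accepted = 0
    for k in range(n_samples):
        p = rng.standard_normal(q.size)              # resample momenta
        h_old = U(q) + 0.5 * p @ p                   # energy before the move
        q_new, p_new = leapfrog(q, p, grad_U, step, n_steps)
        h_new = U(q_new) + 0.5 * p_new @ p_new       # energy after the move
        # The integrator does not conserve energy exactly, so some
        # proposals are rejected; exact dynamics would accept every one.
        if rng.random() < np.exp(min(0.0, h_old - h_new)):
            q = q_new
            accepted += 1
        samples[k] = q
    return samples, accepted / n_samples

# Illustrative target: standard Gaussian in 3 dimensions, U(q) = |q|^2 / 2.
samples, rate = hmc(lambda q: 0.5 * q @ q, lambda q: q, np.zeros(3), 5000)
print("acceptance rate:", rate)
print("sample mean:", samples.mean(axis=0))
print("sample variance:", samples.var(axis=0))
```

Shrinking `step` pushes the acceptance rate toward 100% at the price of more gradient evaluations per proposal, which is the trade-off the abstract refers to.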
Multi-channel, high-throughput experimental methodologies for flow cytometry are transforming clinical immunology and hematology, and require the development of algorithms to analyze the high-dimensional, large-scale data. We describe the development of two combinatorial algorithms to identify rare cell populations in data from mice with acute promyelocytic leukemia. The flow cytometry data are clustered, and then samples from the leukemic, pre-leukemic, and Wild Type mice are compared to identify clusters belonging to the diseased state. We describe three metrics on the clustered data that help in identifying rare populations. We formulate a generalized edge cover approach in a bipartite graph model to directly compare clusters in two samples to identify clusters belonging to one but not the other sample. For detecting rare populations common to many diseased samples but not to the Wild Type, we describe a clique-based branch and bound algorithm. We provide statistical justification of the significance of the rare populations.
Markov chain Monte Carlo sampling propagators, including numerical integrators for stochastic dynamics, are central to the calculation of thermodynamic quantities and determination of structure for molecular systems. Efficiency is paramount, and to a great extent, this is determined by the integrated autocorrelation time (IAcT). This quantity varies depending on the observable that is being estimated. It is suggested that it is the maximum of the IAcT over all observables that is the relevant metric. Reviewed here is a method for estimating this quantity. For reversible propagators (which are those that satisfy detailed balance), the maximum IAcT is determined by the spectral gap in the forward transfer operator, but for irreversible propagators, the maximum IAcT can be far less than or greater than what might be inferred from the spectral gap. This is consistent with recent theoretical results (not to mention past practical experience) suggesting that irreversible propagators generally perform better if not much better than reversible ones. Typical irreversible propagators have a parameter controlling the mix of ballistic and diffusive movement. To gain insight into the effect of the damping parameter for Langevin dynamics, its optimal value is obtained here for a multidimensional quadratic potential energy function.
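The link between the spectral gap and the maximum IAcT for reversible propagators can be made concrete on a small finite-state chain. The sketch below assumes the common convention $\tau = 1 + 2\sum_{k\ge 1}\rho_k$ for the IAcT, under which an eigenfunction with eigenvalue $\lambda$ has $\tau = (1+\lambda)/(1-\lambda)$, so the maximum over observables is attained at the largest eigenvalue below 1. The 3-state transition matrix and the function name are hypothetical, chosen only for illustration; nothing here covers the irreversible case discussed in the abstract.

```python
import numpy as np

def max_iact_reversible(P, pi):
    """Maximum IAcT over observables for a reversible finite-state chain.

    Assumes the convention tau = 1 + 2 * sum_{k>=1} rho_k.
    """
    P = np.asarray(P, dtype=float)
    pi = np.asarray(pi, dtype=float)
    flux = pi[:, None] * P
    # Detailed balance: pi_i P_ij == pi_j P_ji for all i, j.
    assert np.allclose(flux, flux.T), "chain is not reversible"
    # Similarity transform to a symmetric matrix with the same spectrum.
    d = np.sqrt(pi)
    S = (d[:, None] * P) / d[None, :]
    eigvals = np.sort(np.linalg.eigvalsh(S))
    lam2 = eigvals[-2]                   # largest eigenvalue below 1
    return (1.0 + lam2) / (1.0 - lam2)

# Illustrative chain: random walk on 3 states that holds at the two ends.
P = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])
pi = np.full(3, 1.0 / 3.0)               # uniform stationary distribution
tau_max = max_iact_reversible(P, pi)
print("maximum IAcT:", tau_max)
```

For this matrix the eigenvalues are 1, 0.5, and -0.5, so the spectral gap is 0.5 and the maximum IAcT is 3 under the stated convention.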
The efficiency of a Markov chain Monte Carlo algorithm for estimating the mean of a function of interest might be measured by the cost of generating one independent sample, or equivalently, the total cost divided by the effective sample size, defined in terms of the integrated autocorrelation time. To ensure the reliability of such an estimate, it is suggested that there be an adequate sampling of state space, to the extent that this can be determined from the available samples. A sufficient condition for adequate sampling is derived in terms of the supremum of all possible integrated autocorrelation times, which leads to a more stringent condition for adequate sampling than that simply obtained from integrated autocorrelation times for functions of interest. A method for estimating the supremum of all integrated autocorrelation times, based on approximation in a finite-dimensional subspace, is derived and evaluated empirically.
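The effective sample size for a single function of interest can be estimated from a time series as the number of samples divided by an estimate of the integrated autocorrelation time. The sketch below uses a self-consistent truncation window (stop summing at the first lag $M$ with $M \ge c\,\tau$), which is a common heuristic and an assumption here, not the subspace method for the supremum that the abstract describes. The AR(1) test series, the function names, and the constant `c = 5` are likewise illustrative.

```python
import numpy as np

def autocorr(x, max_lag):
    """Normalized autocorrelation estimates rho_0 .. rho_max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    var = x @ x / n
    return np.array([(x[:n - k] @ x[k:]) / (n * var)
                     for k in range(max_lag + 1)])

def iact(x, c=5.0, max_lag=200):
    """IAcT estimate tau = 1 + 2 * sum rho_k with a self-consistent window."""
    rho = autocorr(x, max_lag)
    tau = 1.0
    for m in range(1, max_lag + 1):
        tau += 2.0 * rho[m]
        if m >= c * tau:                 # window is long enough; stop summing
            break
    return tau

# Illustrative series: AR(1) with coefficient phi, whose exact IAcT is
# (1 + phi) / (1 - phi) under the convention used above.
rng = np.random.default_rng(1)
phi, n = 0.5, 100_000
noise = rng.standard_normal(n) * np.sqrt(1.0 - phi**2)
x = np.empty(n)
x[0] = rng.standard_normal()
for t in range(1, n):
    x[t] = phi * x[t - 1] + noise[t]

tau = iact(x)
ess = n / tau                            # effective sample size
print("estimated IAcT:", tau)
print("effective sample size:", ess)
```

An estimate of this kind is only as good as the sampling behind it: if the chain has not visited all relevant regions of state space, the estimated autocorrelation time for one observable can look deceptively small, which is the motivation for the more stringent supremum-based condition.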