Scott C. Schmidler scite author profile

We present a novel method for predicting the secondary structure of a protein from its amino acid sequence. Most existing methods predict each position in turn based on a local window of residues, sliding this window along the length of the sequence. In contrast, we develop a probabilistic model of protein sequence/structure relationships in terms of structural segments, and formulate secondary structure prediction as a general Bayesian inference problem. A distinctive feature of our approach is the ability to develop explicit probabilistic models for α-helices, β-strands, and other classes of secondary structure, incorporating experimentally and empirically observed aspects of protein structure such as helical capping signals, side chain correlations, and segment length distributions. Our model is Markovian in the segments, permitting efficient exact calculation of the posterior probability distribution over all possible segmentations of the sequence using dynamic programming. The optimal segmentation is computed and compared to a predictor based on marginal posterior modes, and the latter is shown to provide significant improvement in predictive accuracy. The marginalization procedure provides exact secondary structure probabilities at each sequence position, which are shown to be reliable estimates of prediction uncertainty. We apply this model to a database of 452 nonhomologous structures, achieving accuracies as high as the best currently available methods. We conclude by discussing an extension of this framework to model nonlocal interactions in protein structures, providing a possible direction for future improvements in secondary structure prediction accuracy.
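The segment-based dynamic programming summarized in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy model in which residues are conditionally independent within a segment, and the names `segment_posteriors`, `emis`, `length_p`, `trans`, and `init` are hypothetical. It runs a forward-backward recursion over segment end points and returns the marginal posterior probability of each structural class at every position.

```python
from math import prod


def segment_posteriors(emis, length_p, trans, init):
    """Marginal posteriors under a toy segment (semi-Markov) model.

    emis[i][k]     : likelihood of residue i under segment type k (assumed given)
    length_p[k][l] : P(segment length = l | type k), for l = 1..max_len
    trans[j][k]    : P(next segment type = k | current type = j)
    init[k]        : P(first segment type = k)
    """
    n, K = len(emis), len(init)
    max_len = len(length_p[0]) - 1

    def seg(a, b, k):
        # Likelihood of residues a..b-1 forming one segment of type k.
        return length_p[k][b - a] * prod(emis[i][k] for i in range(a, b))

    def enter(a, k):
        # Probability mass arriving at a type-k segment that starts at a.
        if a == 0:
            return init[k]
        return sum(alpha[a][j] * trans[j][k] for j in range(K))

    # alpha[t][k]: first t residues explained, last segment has type k.
    alpha = [[0.0] * K for _ in range(n + 1)]
    for t in range(1, n + 1):
        for k in range(K):
            alpha[t][k] = sum(enter(t - l, k) * seg(t - l, t, k)
                              for l in range(1, min(max_len, t) + 1))

    # beta[t][k]: residues t..n-1 explained, given a type-k segment ended at t.
    beta = [[0.0] * K for _ in range(n + 1)]
    beta[n] = [1.0] * K
    for t in range(n - 1, 0, -1):
        for k in range(K):
            beta[t][k] = sum(trans[k][j] * seg(t, t + l, j) * beta[t + l][j]
                             for j in range(K)
                             for l in range(1, min(max_len, n - t) + 1))

    evidence = sum(alpha[n])  # total probability over all segmentations

    # Each candidate segment adds its posterior weight to the positions it covers.
    post = [[0.0] * K for _ in range(n)]
    for a in range(n):
        for k in range(K):
            mass_in = enter(a, k)
            for l in range(1, min(max_len, n - a) + 1):
                w = mass_in * seg(a, a + l, k) * beta[a + l][k] / evidence
                for i in range(a, a + l):
                    post[i][k] += w
    return post, evidence
```

Taking the most probable class in each row of `post` gives a predictor based on marginal posterior modes, the kind of predictor the abstract compares against the single optimal segmentation. The within-segment independence assumption is a simplification made here for brevity; the abstract describes richer segment models with capping signals and side chain correlations.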


Conditions for rapid mixing of parallel and simulated tempering on multimodal distributions

Woodard, Schmidler, Huber. 2009. Ann. Appl. Probab.

111


We give conditions under which a Markov chain constructed via parallel or simulated tempering is guaranteed to be rapidly mixing, which are applicable to a wide range of multimodal distributions arising in Bayesian statistical inference and statistical mechanics. We provide lower bounds on the spectral gaps of parallel and simulated tempering. These bounds imply a single set of sufficient conditions for rapid mixing of both techniques. A direct consequence of our results is rapid mixing of parallel and simulated tempering for several normal mixture models, and for the mean-field Ising model.
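For readers unfamiliar with the algorithm being analyzed, the following is a minimal sketch of parallel tempering on a two-component normal mixture. It illustrates the sampler only, not the spectral gap bounds of the paper. The function names, the mode separation `sep`, the proposal scale `step`, and the inverse-temperature ladder are all assumptions chosen for illustration.

```python
import math
import random


def log_target(x, sep=4.0):
    # Log density (up to a constant) of an equal-weight mixture
    # of N(-sep, 1) and N(+sep, 1), computed stably.
    a = -0.5 * (x + sep) ** 2
    b = -0.5 * (x - sep) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def accept(rng, log_ratio):
    # Metropolis acceptance test for a given log acceptance ratio.
    return rng.random() < math.exp(min(0.0, log_ratio))


def parallel_tempering(n_iter, betas, step=1.0, seed=0):
    """Return the states of the cold chain (betas[0], assumed equal to 1)."""
    rng = random.Random(seed)
    xs = [0.0] * len(betas)
    samples = []
    for _ in range(n_iter):
        # Random-walk Metropolis update within each tempered chain,
        # whose target is the original density raised to the power beta.
        for i, beta in enumerate(betas):
            prop = xs[i] + rng.gauss(0.0, step)
            if accept(rng, beta * (log_target(prop) - log_target(xs[i]))):
                xs[i] = prop
        # Propose swapping the states of a random adjacent pair of chains.
        i = rng.randrange(len(betas) - 1)
        log_r = (betas[i] - betas[i + 1]) * (
            log_target(xs[i + 1]) - log_target(xs[i]))
        if accept(rng, log_r):
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
        samples.append(xs[0])
    return samples
```

A call such as `parallel_tempering(20000, [1.0, 0.5, 0.25, 0.1])` returns draws from the cold chain. The flatter high-temperature chains cross between modes easily, and the swap moves pass those crossings down to the cold chain.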


Bayesian model search and multilevel inference for SNP association studies

Wilson, Iversen, Clyde et al. 2010. Ann. Appl. Stat.


Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA’s statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally “validated” in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.


Ligand Concentration Regulates the Pathways of Coupled Protein Folding and Binding

et al. 2014


Coupled ligand binding and conformational change plays a central role in biological regulation. Ligands often regulate protein function by modulating conformational dynamics, yet the order in which binding and conformational change occur is often hotly debated. Here we show that the “conformational selection versus induced fit” distinction on which this debate is based is a false dichotomy because the mechanism depends on ligand concentration. Using the binding of pyrophosphate (PPi) to B. subtilis RNase P protein as a model, we show that coupled reactions are best understood as a change in flux between competing pathways with distinct orders of binding and conformational change. The degree of partitioning through each pathway depends strongly on PPi concentration, with ligand binding redistributing the conformational ensemble toward the folded state by both increasing folding rates and decreasing unfolding rates. These results indicate that ligand binding induces marked and varied changes in protein conformational dynamics, and that the order of binding and conformational change is ligand concentration dependent.


Tree Topology Estimation

Estrada, Tomasi, Schmidler et al. 2015. IEEE Trans. Pattern Anal. Mach. Intell.


imaging techniques allow us to image trees. However, an image of a tree typically includes spurious branch crossings, and the original relationships of ancestry among edges may be lost. We present a methodology for estimating the most likely topology of a rooted, directed, three-dimensional tree given a single two-dimensional image of it. We regularize this inverse problem via a prior parametric tree-growth model that realistically captures the morphology of a wide variety of trees. We show that the problem of estimating the optimal tree has linear complexity if ancestry is known, but is NP-hard if it is lost. For the latter case, we present both a greedy approximation algorithm and a heuristic search algorithm that effectively explore the space of possible trees. Experimental results on retinal vessel, plant root, and synthetic tree datasets show that our methodology is both accurate and efficient.


