We describe a novel method for the structural optimization of molecular systems. Similar to genetic algorithms (GA), our approach involves an evolving population in which new members are formed by cutting and pasting operations on existing members. Unlike previous GA's, however, the population in each generation has a single parent only. This scheme has been used to optimize Si clusters with 13-23 atoms. We have found a number of new isomers that are lower in energy than any previously reported and have properties in much better agreement with experimental data.
Native proteins have been optimized by evolution simultaneously for structure and sequence. Structural databases reflect this interdependency. In this paper, we present a new statistical potential for a reduced backbone representation that has both structure and sequence characteristics as variables. We use information from structural data available in the Protein Coil Library, selected on the basis of resolution and refinement factor. In these structures, the nonlocal interactions are randomly distributed and, thus, average out in statistics, so structural propensities due to local backbone-based interactions can be studied separately. We collect data in the form of local sequence-specific phi-psi backbone dihedral pairs. From these data, we construct dihedral probability density functions (DPDFs) that quantify any adjacent phi-psi pair distribution in the context of all possible combinations of local residue types. We use a probabilistic analysis to deduce how the correlations encoded in the various DPDFs as well as in residue frequencies propagate along the sequence and can be cumulated in a statistical potential capable of efficiently scoring a loop by its backbone conformation and sequence only. Our potential is able to identify with high accuracy the native structure of a loop with a given sequence among possible alternative conformations from sets of well-constructed decoys. Conversely, the potential can also be used for sequence prediction problems and is shown to score the native sequence of a given loop structure among the most fit of the possible sequence combinations. Applications for both structure prediction and sequence design are discussed.
Accurately predicting loop structures is important for understanding functions of many proteins. In order to obtain loop models with high accuracy, efficiently sampling the loop conformation space to discover reasonable structures is a critical step. In loop conformation sampling, coarse-grain energy (scoring) functions coupling with reduced protein representations are often used to reduce the number of degrees of freedom as well as sampling computational time. However, due to implicitly considering many factors by reduced representations, the coarse-grain scoring functions may have potential insensitivity and inaccuracy, which can mislead the sampling process and consequently ignore important loop conformations. In this paper, we present a new computational sampling approach to obtain reasonable loop backbone models, so-called the Pareto Optimal Sampling (POS) method. The rationale of the POS method is to sample the function space of multiple, carefully-selected scoring functions to discover an ensemble of diversified structures yielding Pareto optimality to all sampled conformations. POS method can efficiently tolerate insensitivity and inaccuracy in individual scoring functions and thereby lead to significant accuracy improvement in loop structure prediction. We apply the POS method to a set of 4- to 12-residue loop targets using a function space composed of backbone-only Rosetta, DFIRE, and a triplet backbone dihedral potential developed in our lab. Our computational results show that in 501 out of 502 targets, the model sets generated by POS contain structure models are within subangstrom resolution. Moreover, the top-ranked models have Root Mean Square Deviation (RMSD) less than 1A in 96.8%, 84.1%, and 72.2% of the short (4~6 residues), medium (7~9 residues), and long (10~12) targets, respectively, when the all-atom models are generated by local optimization from the backbone models and are ranked by our recently developed Pareto Optimal Consensus (POC) method. Similar sampling effectiveness can also be found in a set of 13-residue loop targets.
BackgroundAccurate protein loop structure models are important to understand functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction.ResultsWe have developed a Pareto Optimal Consensus (POC) method, which is a consensus model ranking approach to integrate multiple knowledge- or physics-based scoring functions. The procedure of identifying the models of best quality in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4- to 12-residue in length using a functional space composed of several carefully-selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of ~20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% less false positives in distinguishing the native conformation, indentifying a near-native model (RMSD < 0.5A from the native) as top-ranked, and selecting at least one near-native model in the top-5-ranked models, respectively. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. Furthermore, the POC method outperforms the other popularly-used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods.ConclusionsBy integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.