A unified coarse-grained model of three major classes of biological molecules (proteins, nucleic acids, and polysaccharides) has been developed. It is based on the observation that the repeated units of biopolymers (peptide groups, nucleic acid bases, sugar rings) are highly polar and that their charge distributions can be represented crudely as point multipoles. The model is an extension of the united residue (UNRES) coarse-grained model of proteins developed previously in our laboratory. The respective force fields are defined as the potentials of mean force of biomacromolecules immersed in water, in which all degrees of freedom not considered in the model have been averaged out. Reducing the representation to one center per polar interaction site leads to the representation of average site–site interactions as mean-field dipole–dipole interactions. Further expansion of the potentials of mean force of biopolymer chains into Kubo's cluster-cumulant series leads to the appearance of mean-field dipole–dipole interactions averaged in the context of local interactions within a biopolymer unit. These mean-field interactions account for the formation of the regular structures encountered in biomacromolecules, e.g., α-helices and β-sheets in proteins, double helices in nucleic acids, and helicoidally packed structures in polysaccharides. This enables us to use a greatly reduced number of interacting sites without sacrificing the ability to reproduce the correct architecture, and the reduction extends the accessible simulation timescale by more than four orders of magnitude compared with the all-atom representation. Examples of the performance of the model are presented.
Figure: Components of the Unified Coarse Grained Model (UCGM) of biological macromolecules.
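The model above represents each polar site by a point dipole, so it may help to recall the bare interaction that the mean-field terms average. The sketch below evaluates the classical energy of two point dipoles, U = k[μ1·μ2 − 3(μ1·r̂)(μ2·r̂)]/r³. It is an illustration under stated assumptions only: reduced units with k = 1 standing in for 1/(4πε0ε), and no orientational or mean-field averaging, which the UCGM performs and this code does not. The function name `dipole_dipole_energy` is hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two 3-vectors given as sequences."""
    return sum(x * y for x, y in zip(a, b))


def dipole_dipole_energy(mu1, mu2, r1, r2, k=1.0):
    """Bare (not averaged) energy of point dipole mu1 at r1 and mu2 at r2.

    U = k * [mu1.mu2 - 3 (mu1.rhat)(mu2.rhat)] / r**3
    Assumption: reduced units, k plays the role of 1/(4*pi*eps0*eps).
    """
    d = [b - a for a, b in zip(r1, r2)]  # separation vector r2 - r1
    r = math.sqrt(dot(d, d))
    if r == 0.0:
        raise ValueError("dipoles must not coincide")
    rhat = [x / r for x in d]  # unit vector along the separation
    return k * (dot(mu1, mu2) - 3.0 * dot(mu1, rhat) * dot(mu2, rhat)) / r**3


if __name__ == "__main__":
    # Unit dipoles at unit separation along x.
    print(dipole_dipole_energy((1, 0, 0), (1, 0, 0), (0, 0, 0), (1, 0, 0)))  # head-to-tail
    print(dipole_dipole_energy((0, 0, 1), (0, 0, 1), (0, 0, 0), (1, 0, 0)))  # side-by-side
```

For unit dipoles at unit separation, head-to-tail alignment gives U = −2 and side-by-side parallel alignment gives U = +1, which shows how strongly the interaction depends on relative orientation.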
A general and systematic method for the derivation of the functional expressions for the effective energy terms in coarse-grained force fields of polymer chains is proposed. The method is based on the expansion of the potential of mean force of the system studied in the cluster-cumulant series and on the expansion of the all-atom energy in a Taylor series in the squares of interatomic distances about the squares of the distances between coarse-grained centers, to obtain approximate analytical expressions for the cluster cumulants. The primary degrees of freedom to average over are the angles of collective rotation of the atoms contained in the coarse-grained interaction sites about the respective virtual-bond axes. The approach has been applied to the revision of the virtual-bond-angle, virtual-bond-torsional, and backbone-local-and-electrostatic correlation potentials of the UNited RESidue (UNRES) model of polypeptide chains, demonstrating the strong dependence of the torsional and correlation potentials on virtual-bond angles, which the current UNRES does not take into account. The theoretical considerations are illustrated with potentials calculated by numerical integration from the ab initio potential-energy surface of terminally blocked alanine and with statistical potentials derived from known protein structures. The revised torsional potentials correctly indicate that virtual-bond angles close to 90° result in a preference for turn and helical structures, while large virtual-bond angles result in a preference for polyproline II and extended backbone geometry. The revised correlation potentials correctly reproduce the preference for the formation of β-sheet structures at large virtual-bond angles and of α-helical structures at virtual-bond angles close to 90°.
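The cumulant idea can be seen on a toy problem. Averaging a Boltzmann factor over a hidden rotation angle λ gives F = −kT ln⟨exp(−βE)⟩, and Kubo's cumulant expansion writes this as F = κ1 − βκ2/2 + β²κ3/6 − …, where κ1 = ⟨E⟩ and κ2 = ⟨E²⟩ − ⟨E⟩². The sketch below, a minimal illustration assuming a single angle λ uniformly distributed on [0, 2π) and reduced units, compares the exact average (by numerical quadrature) with the second-order truncation. It is not the Taylor-series scheme described above; the functions `pmf_exact` and `pmf_cumulant` are hypothetical.

```python
import math


def _energies(energy, n):
    """Sample energy(lam) on a uniform grid of n points in [0, 2*pi)."""
    return [energy(2.0 * math.pi * i / n) for i in range(n)]


def pmf_exact(energy, beta, n=2000):
    """-kT ln < exp(-beta E(lam)) > over a uniform angle lam.

    The uniform grid (rectangle rule) is very accurate for smooth periodic
    integrands. Energies are shifted by their minimum to avoid overflow.
    """
    es = _energies(energy, n)
    e_min = min(es)
    z = sum(math.exp(-beta * (e - e_min)) for e in es) / n
    return e_min - math.log(z) / beta


def pmf_cumulant(energy, beta, n=2000):
    """Second-order cumulant approximation: <E> - (beta/2) * var(E)."""
    es = _energies(energy, n)
    mean = sum(es) / n
    var = sum((e - mean) ** 2 for e in es) / n
    return mean - 0.5 * beta * var


if __name__ == "__main__":
    toy = lambda lam: 0.5 * math.cos(lam) + 0.2  # hypothetical energy profile
    print(pmf_exact(toy, 1.0), pmf_cumulant(toy, 1.0))
```

For this weak, smooth profile at β = 1 the two results agree to about 0.001; the truncation degrades as β times the energy spread grows, which is why higher cumulants matter for stiff degrees of freedom.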
The general theory of the construction of scale-consistent energy terms in coarse-grained force fields presented in Paper I of this series has been applied to the revision of the UNRES force field for physics-based simulations of proteins. The potentials of mean force corresponding to the backbone-local and backbone-correlation energy terms were calculated from the ab initio energy surfaces of terminally blocked glycine, alanine, and proline, and the respective analytical expressions, derived by using the scale-consistent formalism, were fitted to them. The parameters of all these potentials depend only on single-residue types, which reduces their number and prevents over-fitting. The UNRES force field with the revised backbone-local and backbone-correlation terms was calibrated first with a set of four small proteins with basic folds: tryptophan cage variant (TRP1; α), Full Sequence Design (FSD; α + β), villin headpiece (villin; α), and a truncated FBP-28 WW-domain variant (2MWD; β) (the NEWCT-4P force field), and subsequently with an enhanced set of 9 proteins composed of TRP1, FSD, villin, 1BDC (α), 2I18 (α), 1QHK (α + β), 2N9L (α + β), 1E0L (β), and 2LX7 (β) (the NEWCT-9P force field). The NEWCT-9P force field performed better than NEWCT-4P in a blind-prediction-like test with a set of 26 proteins not used in calibration. In a test with 76 proteins, it also outperformed the most advanced OPT-WTFSA-2 version of UNRES, which uses the former backbone-local and backbone-correlation terms and contains more energy terms and more optimizable parameters. The NEWCT-9P force field reproduced the bimodal distribution of backbone virtual-bond angles in the simulated structures, as observed in experimental protein structures.
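The step "analytical expressions … were fitted to them" can be illustrated with the simplest case: fitting a truncated Fourier series U(γ) = a0 + Σk [ak cos kγ + bk sin kγ] in one torsional angle γ to potential-of-mean-force values by linear least squares. This is a minimal sketch under assumptions: the actual UNRES expressions are more elaborate (for example, they couple torsional terms to virtual-bond angles), the data are assumed to lie on a uniform grid over [0, 2π), and `fit_fourier` and `eval_fourier` are hypothetical names.

```python
import math


def fit_fourier(samples, order):
    """Least-squares fit of a0 + sum_k [a_k cos(k g) + b_k sin(k g)], k = 1..order,
    to values sampled at g_i = 2*pi*i/N.

    On a uniform periodic grid with 2*order < N the basis functions are
    orthogonal, so the least-squares coefficients are discrete projections.
    """
    n = len(samples)
    if 2 * order >= n:
        raise ValueError("need more than 2*order samples")
    grid = [2.0 * math.pi * i / n for i in range(n)]
    a0 = sum(samples) / n
    a = [2.0 / n * sum(u * math.cos(k * g) for u, g in zip(samples, grid))
         for k in range(1, order + 1)]
    b = [2.0 / n * sum(u * math.sin(k * g) for u, g in zip(samples, grid))
         for k in range(1, order + 1)]
    return a0, a, b


def eval_fourier(a0, a, b, g):
    """Evaluate the fitted series at angle g (radians)."""
    return a0 + sum(ak * math.cos(k * g) + bk * math.sin(k * g)
                    for k, (ak, bk) in enumerate(zip(a, b), start=1))


if __name__ == "__main__":
    n = 24
    grid = [2.0 * math.pi * i / n for i in range(n)]
    pmf = [1.0 + 0.5 * math.cos(g) - 0.3 * math.sin(2 * g) for g in grid]
    print(fit_fourier(pmf, 3))
```

The orthogonality shortcut avoids a linear solver; with irregularly spaced or weighted data a general least-squares solve would be needed instead.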
The performance of the physics-based protocol developed in our laboratory for the prediction of protein structure from amino acid sequence, whose main component is the United Residue (UNRES) physics-based coarse-grained force field, is illustrated with targets from the 10th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP10). Candidate models were selected based on the probabilities of the conformational families determined by multiplexed replica-exchange simulations. For target T0663, classified as a new fold and consisting of two α + β domains homologous to those of known proteins, UNRES predicted the correct packing symmetry: in the experimental structure, the two domains are rotated by 180° with respect to each other. By contrast, models obtained by knowledge-based methods, in which each domain was modeled very accurately but not rotated, had incorrect packing. Two UNRES models of this target were featured by the assessors. UNRES also predicted the correct domain packing for the homologous target T0644, whose structure is similar to that of T0663 except that the two domains are not rotated. Predictions for two other targets, T0668 and T0684_D2, are among the best by global distance test (GDT) score. These results suggest that our physics-based method has substantial predictive power. In particular, it can predict domain-domain orientations, which is a significant advance in the state of the art.
protein folding | structure symmetry | multi-domain packing
Prediction of protein structures from amino acid sequence remains an unsolved problem of computational biology.
Although it has been known since the famous experiments of Anfinsen (1) that a protein adopts the structure corresponding to the (kinetically reachable) global minimum of the free energy of the system, this physical principle is not straightforward to implement in practice, both because existing force fields are inaccurate and because searching the conformational space of the system is enormously difficult. Therefore, the most effective methods for protein-structure prediction today are knowledge-based approaches, in which database information is incorporated explicitly into the procedure (2). These methods can be divided into three categories: comparative (homology) modeling (3-5), in which the target sequence is compared with sequences whose experimental structures are known, and the structures showing the greatest similarity are usually selected as candidate models; threading (6-8), in which the target sequence is superposed on structures from a database, and those giving the highest score (lowest pseudoenergy) are selected as candidate predictions; and, finally, the fragment-assembly or minithreading method developed by David Baker and colleagues (9, 10), in which the predicted structure is assembled from nine-residue fragments extracted from a protein-structure database, and knowledge- and physics-based filters are applied at each asse...
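The CASP10 abstract above ranks predictions by global distance test score. A common summary, GDT_TS, is 100 times the average, over distance cutoffs of 1, 2, 4, and 8 Å, of the fraction of Cα atoms lying within the cutoff of their experimental positions. The sketch below is a simplified version under an explicit assumption: per-residue Cα distances are supplied for one fixed superposition, whereas the full GDT procedure searches over many superpositions for each cutoff, so this function gives a lower bound. The name `gdt_ts` is hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS from per-residue C-alpha distances (in angstroms).

    Assumption: the model is already superposed on the reference, and a single
    superposition is used for every cutoff (the full GDT procedure optimizes
    the superposition per cutoff, so this is a lower bound).
    """
    if not distances:
        raise ValueError("need at least one residue")
    n = len(distances)
    # Fraction of residues within each cutoff.
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    print(gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0]))
```

For the five distances in the usage line, the fractions within 1, 2, 4, and 8 Å are 0.2, 0.4, 0.6, and 0.8, giving a GDT_TS of 50 (up to floating-point rounding).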