The general theory of the construction of scale-consistent energy terms in the coarse-grained force fields presented in Paper I of this series has been applied to the revision of the UNRES force field for physics-based simulations of proteins. The potentials of mean force corresponding to backbone-local and backbone-correlation energy terms were calculated from the ab initio energy surfaces of terminally blocked glycine, alanine, and proline, and the respective analytical expressions, derived by using the scale-consistent formalism, were fitted to them. The parameters of all these potentials depend on single-residue types, thus reducing their number and preventing over-fitting. The UNRES force field with the revised backbone-local and backbone-correlation terms was calibrated with a set of four small proteins with basic folds: tryptophan cage variant (TRP1; α), Full Sequence Design (FSD; α + β), villin headpiece (villin; α), and a truncated FBP-28 WW-domain variant (2MWD; β) (the NEWCT-4P force field) and, subsequently, with an enhanced set of 9 proteins composed of TRP1, FSD, villin, 1BDC (α), 2I18 (α), 1QHK (α + β), 2N9L (α + β), 1E0L (β), and 2LX7 (β) (the NEWCT-9P force field). The NEWCT-9P force field performed better than NEWCT-4P in a blind-prediction-like test with a set of 26 proteins not used in calibration and outperformed, in a test with 76 proteins, the most advanced OPT-WTFSA-2 version of UNRES with former backbone-local and backbone-correlation terms that contained more energy terms and more optimizable parameters. The NEWCT-9P force field reproduced the bimodal distribution of backbone-virtual-bond angles in the simulated structures, as observed in experimental protein structures.
The folding temperature of the trp-cage mini-protein was determined to be in the range 311-317 K depending on the method used. Our study is focused on determining the structure and dynamics of the polypeptide chain close to its unfolding or melting temperature. At T = 305 K, Trp6-Arg16 and Trp6-Pro12 long-range interactions are observed, and at T = 313 K, only the Trp6-Arg16 interactions remain, while all of the mentioned interactions are observed in the native state of the protein. Partial (at T = 305 K) and complete (at T = 313 K) melting of the N-terminal α-helix is observed, manifested by the appearance of minor sets of signals in NMR spectra. Our key findings are: (i) the conformational phase transition (melting point) could be described as a cooperative breaking of the Trp6-Pro12 long-range hydrophobic interaction and the melting of the N-terminal α-helix; (ii) many ROE signals corresponding to local or short-range interactions vanish rapidly with temperature increase; however, long-range interactions such as Trp6-Arg16 remain until 313 K. The presence of the native long-range interaction at 313 K makes that conformational ensemble resemble a very diffuse native state structure, but it is not a simple mixture of the folded and unfolded states, as could be expected on the basis of the common two-state folding mechanism.
Two peptides, corresponding to the turn region of the C-terminal β-hairpin of the B3 domain of the immunoglobulin binding protein G from Streptococcus, consisting of residues 51-56 and 50-57, respectively, were studied by CD and NMR spectroscopy at various temperatures and by differential scanning calorimetry. Our results show that the part of the sequence corresponding to the β-turn in the native structure (DDATKT) of the B3 domain forms bent conformations similar to those observed in the native protein. The formation of a turn is observed for both peptides in a broad range of temperatures (T = 283-323 K), which confirms the conclusion drawn from our previous studies of longer sequences from the C-terminal β-hairpin of the B3 domain of the immunoglobulin binding protein G (16, 14 and 12 residues), that the DDATKT sequence forms a nucleation site for formation of the β-hairpin structure of peptides corresponding to the C-terminal part of all the B domains of the immunoglobulin binding protein G. We also show and discuss the role of long-range hydrophobic interactions as well as local conformational properties of polypeptide chains in the mechanism of formation of the β-hairpin structure.
By using the maximum likelihood method for force-field calibration recently developed in our laboratory, which is aimed at achieving agreement between the simulated conformational ensembles of selected training proteins and the corresponding ensembles determined experimentally at various temperatures, the physics-based coarse-grained UNRES force field for simulations of protein structure and dynamics was optimized with seven small training proteins exhibiting a variety of secondary and tertiary structures. Four runs of optimization, in which the number of optimized force-field parameters was gradually increased, were carried out, and the resulting force fields were subsequently tested with a set of 22 α-, 12 β-, and 12 α + β-proteins not used in optimization. The variant in which energy-term weights, local and correlation potentials, side-chain radii, and anisotropies were optimized turned out to be the most transferable and outperformed all previous versions of UNRES on the test set.
A new approach to the calibration of the force fields is proposed, in which the force-field parameters are obtained by maximum-likelihood fitting of the calculated conformational ensembles to the experimental ensembles of training system(s). The maximum-likelihood function is composed of logarithms of the Boltzmann probabilities of the experimental conformations, calculated with the current energy function. Because the theoretical distribution is given in the form of the simulated conformations only, the contributions from all of the simulated conformations, with Gaussian weights in the distances from a given experimental conformation, are added to give the contribution to the target function from this conformation. In contrast to earlier methods for force-field calibration, the approach does not suffer from the arbitrariness of dividing the decoy set into native-like and non-native structures; however, if such a division is made instead of using Gaussian weights, application of the maximum-likelihood method results in the well-known energy-gap maximization. The computational procedure consists of cycles of decoy generation and maximum-likelihood-function optimization, which are iterated until convergence is reached. The method was tested with Gaussian distributions and then applied to the physics-based coarse-grained UNRES force field for proteins. The NMR structures of the tryptophan cage, a small α-helical protein, determined at three temperatures (T = 280, 305, and 313 K) by Hałabis et al. (J. Phys. Chem. B 2012, 116, 6898-6907), were used. Multiplexed replica-exchange molecular dynamics was used to generate the decoys. The iterative procedure exhibited steady convergence.
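The target function described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the gas-constant units (kcal/(mol K)), the unnormalized Gaussian weight exp(-d²/2σ²), and the assumption that every decoy carries equal prior weight before Boltzmann reweighting are all choices made here for illustration. The sketch sums, over the experimental conformations, the logarithm of the Gaussian-weighted Boltzmann probability contributed by all simulated conformations.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K); units are an assumption of this sketch


def boltzmann_probabilities(energies, temperature):
    """Normalized Boltzmann probabilities of the decoys at one temperature."""
    beta = 1.0 / (R_KCAL * temperature)
    e_min = min(energies)  # shift energies for numerical stability
    factors = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(factors)
    return [f / z for f in factors]


def log_likelihood(energies, distances, temperature, sigma):
    """Sum of log-probabilities of the experimental conformations.

    energies[i]     : energy of simulated conformation (decoy) i
    distances[j][i] : distance between experimental conformation j and decoy i
    sigma           : width of the Gaussian weight (hypothetical parameter name)
    """
    probs = boltzmann_probabilities(energies, temperature)
    total = 0.0
    for row in distances:
        # Gaussian weight of each decoy with respect to this experimental conformation
        weights = [math.exp(-d * d / (2.0 * sigma * sigma)) for d in row]
        total += math.log(sum(p * w for p, w in zip(probs, weights)))
    return total


# Toy example: one experimental conformation, one near-native and one distant decoy.
toy_distances = [[1.0, 8.0]]
favours_native = log_likelihood([-5.0, 0.0], toy_distances, 300.0, 2.0)
favours_decoy = log_likelihood([0.0, -5.0], toy_distances, 300.0, 2.0)
```

In this toy case, an energy function that places the near-native decoy lowest yields the larger log-likelihood, which is the quantity that a calibration cycle would maximize with respect to the force-field parameters.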
Three variants of optimization were tried: optimization of the energy-term weights alone and use of the experimental ensemble of the folded protein only at T = 280 K (run 1); optimization of the energy-term weights and use of experimental ensembles at all three temperatures (run 2); and optimization of the energy-term weights and the coefficients of the torsional and multibody energy terms and use of experimental ensembles at all three temperatures (run 3). The force fields were subsequently tested with a set of 14 α-helical and two α + β proteins. Optimization run 1 resulted in better agreement with the experimental ensemble at T = 280 K compared with optimization run 2 and in comparable performance on the test set but poorer agreement of the calculated folding temperature with the experimental folding temperature. Optimization run 3 resulted in the best fit of the calculated ensembles to the experimental ones for the tryptophan cage but in much poorer performance on the test set, suggesting that use of a small α-helical protein for extensive force-field calibration resulted in overfitting of the data for this protein at the expense of transferability. The optimized force field resulting from run 2 was found to fold 13 of the 14 tested α-helical proteins and one small α + β protein with the correct topologies; the average structu...