Motivation: Poor protein solubility hinders the production of many therapeutic and industrially useful proteins. Experimental efforts to increase solubility are plagued by low success rates and often reduce biological activity. Computational prediction of protein expressibility and solubility in Escherichia coli using only sequence information could reduce the cost of experimental studies by enabling prioritization of highly soluble proteins.
Results: A new tool for sequence-based prediction of soluble protein expression in E. coli, SoluProt, was created using the gradient boosting machine technique with the TargetTrack database as a training set. When evaluated against a balanced independent test set derived from the NESG database, SoluProt’s accuracy of 58.5% and AUC of 0.62 exceeded those of a suite of alternative solubility prediction tools. There is also evidence that it could significantly increase the success rate of experimental protein studies. SoluProt is freely available as a standalone program and a user-friendly webserver at https://loschmidt.chemi.muni.cz/soluprot/.
Availability and implementation: https://loschmidt.chemi.muni.cz/soluprot/.
Supplementary information: Supplementary data are available at Bioinformatics online.
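As an aid to reading the accuracy and AUC figures quoted for SoluProt, the following is a minimal sketch of how these two metrics are commonly computed for a binary soluble/insoluble classifier. It is not SoluProt's evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. On a balanced test set, random guessing gives an expected accuracy of 50% and an AUC of 0.5, which is the baseline the reported values should be compared against.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the label.

    The 0.5 threshold is an assumption for this sketch.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy balanced set: 1 = soluble, 0 = insoluble.
# These values are made up for illustration and are NOT data from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

toy_accuracy = accuracy(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"accuracy = {toy_accuracy:.3f}, AUC = {toy_auc:.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.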
Stability is one of the most important characteristics of proteins employed as biocatalysts, biotherapeutics, and biomaterials, and the role of computational approaches in modifying protein stability is rapidly expanding. We have recently identified stabilizing mutations in haloalkane dehalogenase DhaA using phylogenetic analysis but were not able to reproduce the effects of these mutations using force-field calculations. Here we tested four different hypotheses to explain the molecular basis of stabilization using structural, biochemical, biophysical, and computational analyses. We demonstrate that stabilization of DhaA by the mutations identified using the phylogenetic analysis is driven by both entropy and enthalpy contributions, in contrast to primarily enthalpy-driven stabilization by mutations designed by the force-field calculations. Comprehensive bioinformatics analysis revealed that more than half (53%) of 1 099 evolution-based stabilizing mutations would be evaluated as destabilizing by force-field calculations. Thermodynamic integration considers both folded and unfolded states and can describe the entropic component of stabilization, yet it is not suitable for predictive purposes due to its high computational demands. Altogether, our results strongly suggest that energetic calculations should be complemented by a phylogenetic analysis in protein-stabilization endeavors.