Rapid, accurate prediction of protein structure from amino acid sequence would accelerate fields as diverse as drug discovery, synthetic biology and disease diagnosis. Large recent improvements in protein structure prediction have been driven by better prediction of which amino acid residues are in contact in the 3D structure. For an average globular protein, around 92% of all residue pairs are non-contacting, so accurate prediction of even a small percentage of inter-residue distances could increase the number of constraints available to guide structure determination. We have trained deep neural networks to predict inter-residue contacts and distances. Distances are predicted with an accuracy better than that of most contact prediction techniques. Adding distance constraints improved de novo structure predictions for test sets of 158 protein structures, compared with using the best contact prediction methods alone. Importantly, the use of distance predictions allows better models to be selected from the structure pool without the need for an external model assessment tool. The results also indicate how the accuracy of distance prediction methods might be improved further.
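To make the distinction between contacts and distances concrete, the following is a minimal sketch of how a contact set and the non-contacting fraction can be derived from known coordinates. It is not the authors' neural network or evaluation code. It assumes one representative atom per residue (for example Cα) and an 8 Å contact cutoff, which is a common convention but is not stated in the abstract. All function names and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Return {(i, j): distance} for every residue pair with i < j."""
    return {
        (i, j): math.dist(coords[i], coords[j])
        for i, j in combinations(range(len(coords)), 2)
    }


def contact_pairs(distances, cutoff=8.0):
    """Pairs whose distance is at or below the cutoff (assumed 8 angstroms)."""
    return {pair for pair, d in distances.items() if d <= cutoff}


def non_contact_fraction(n_residues, contacts):
    """Fraction of all residue pairs that are not in contact."""
    total_pairs = n_residues * (n_residues - 1) // 2
    return 1.0 - len(contacts) / total_pairs


# Toy input: a straight 10-residue chain with 3.8 angstrom spacing.
# Real coordinates would come from an experimental structure.
toy_coords = [(3.8 * i, 0.0, 0.0) for i in range(10)]
toy_distances = pairwise_distances(toy_coords)
toy_contacts = contact_pairs(toy_distances)
print(len(toy_contacts), non_contact_fraction(len(toy_coords), toy_contacts))
```

A fully extended toy chain is not expected to reproduce the roughly 92% non-contacting figure quoted for globular proteins; the point is only that a contact map discards the distance value for every pair, whereas a distance prediction keeps it as a usable constraint.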
Polypeptides with multiple enzyme domains, such as type I polyketide synthases, make chemically complex compounds that are difficult to produce by conventional chemical synthesis and are often pharmaceutically or otherwise commercially valuable. Engineering polyketide synthases by domain swapping and/or site-directed mutagenesis to generate novel polyketides has tended to give either low yields of product or no product at all. The success of such experiments may be limited by our inability to predict the key functional residues and the boundaries of protein domains. Computational tools that identify domain boundaries and the residues determining the substrate specificity of domains could reduce the trial and error involved in engineering multi-domain proteins. In this study we use statistical coupling analysis to identify networks of co-evolving residues in type I polyketide synthases, thereby predicting domain boundaries. We extend the method to predict key residues for enzyme substrate specificity. We introduce bootstrapping calculations to test the relationship between sequence length and the number of sequences needed for a robust analysis. Our results show no simple predictor of the number of sequences needed for an analysis, which can be as few as a hundred and as many as a few thousand. We find that polyketide synthases contain multiple networks of co-substituting residues: some are intradomain, but most span multiple domains. Some networks of coupled residues correlate with specific functions, such as the substrate specificity of the acyltransferase domain and the stereochemistry of the ketoreductase domain, or with domain boundaries that are consistent with experimental data. Our extension of the method provides a ranking of the likely importance of these residues to enzyme substrate specificity, allowing us to propose residues for further mutagenesis work.
We conclude that analysis of co-evolving networks of residues is likely to be an important tool for re-engineering multi-domain proteins.
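The bootstrapping idea mentioned above can be illustrated with a short sketch. This is not the authors' procedure and it does not implement statistical coupling analysis: as a stand-in, it uses mutual information between two alignment columns as a simple co-variation score, then resamples sequences with replacement to see how much that score varies at a given number of sequences. The function names, the toy alignment and the number of replicates are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log((c * n) / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )


def bootstrap_mi(alignment, i, j, sample_size, n_replicates=200, seed=0):
    """Mean and standard deviation of the column i/j score over resamples.

    Each replicate draws `sample_size` sequences with replacement.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_replicates):
        sample = rng.choices(alignment, k=sample_size)
        values.append(
            mutual_information([s[i] for s in sample], [s[j] for s in sample])
        )
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, math.sqrt(variance)


# Toy alignment (hypothetical): columns 0 and 1 always co-vary,
# column 2 varies independently of them.
toy_alignment = ["ACD", "ACE", "GTD", "GTE"] * 25
for size in (10, 50, 100):
    print(size, bootstrap_mi(toy_alignment, 0, 1, sample_size=size))
```

Repeating the calculation at increasing sample sizes and watching where the spread of the score stops shrinking is one simple way to judge whether an alignment is deep enough, although, as the abstract notes, no simple predictor of the required number of sequences was found.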