The IntFOLD server based at the University of Reading has been a leading method over the past decade in providing free access to accurate prediction of protein structures and functions. In a post-AlphaFold2 world, accurate models of tertiary structures are widely available for even more protein targets, so there has been a refocus in the prediction community towards the accurate modelling of protein-ligand interactions as well as modelling quaternary structure assemblies. In this paper, we describe the latest improvements to IntFOLD, which maintains its competitive structure prediction performance by including the latest deep learning methods while also integrating accurate model quality estimates and 3D models of protein-ligand interactions. Furthermore, we also introduce our two new server methods: MultiFOLD for accurately modelling both tertiary and quaternary structures, with performance which has been independently verified to outperform the standard AlphaFold2 methods, and ModFOLDdock, which provides world-leading quality estimates for quaternary structure models. The IntFOLD7, MultiFOLD and ModFOLDdock servers are available at: https://www.reading.ac.uk/bioinf/.
In CASP15, there was a greater emphasis on multimeric modeling than in previous experiments, with assembly structures nearly doubling in number (41 up from 22) since the previous round. CASP15 also included a new estimation of model accuracy (EMA) category in recognition of the importance of objective quality assessment (QA) for quaternary structure models. ModFOLDdock is a multimeric model QA server developed by the McGuffin group at the University of Reading, which brings together a range of single‐model, clustering, and deep learning methods to form a consensus of approaches. For CASP15, three variants of ModFOLDdock were developed to optimize for the different facets of the quality estimation problem. The standard ModFOLDdock variant produced predicted scores optimized for positive linear correlations with the observed scores. The ModFOLDdockR variant produced predicted scores optimized for ranking, that is, the top‐ranked models have the highest accuracy. In addition, the ModFOLDdockS variant used a quasi‐single model approach to score each model on an individual basis. The scores from all three variants achieved strongly positive Pearson correlation coefficients with the CASP observed scores (oligo‐lDDT) in excess of 0.70, which were maintained across both homomeric and heteromeric model populations. In addition, at least one of the ModFOLDdock variants was consistently ranked in the top two methods across all three EMA categories. Specifically, for overall global fold prediction accuracy, ModFOLDdock placed second and ModFOLDdockR placed third; for overall interface quality prediction accuracy, ModFOLDdockR, ModFOLDdock, and ModFOLDdockS were placed above all other predictor methods, and ModFOLDdockR and ModFOLDdockS were placed second and third respectively for individual residue confidence scores. The ModFOLDdock server is available at: https://www.reading.ac.uk/bioinf/ModFOLDdock/. 
ModFOLDdock is also available as part of the MultiFOLD docker package: https://hub.docker.com/r/mcguffin/multifold.
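The abstract above distinguishes scores optimised for linear correlation with observed accuracy from scores optimised for ranking. The following is a minimal sketch of how the two criteria could be measured for a single target. It is not the CASP15 assessment code or the ModFOLDdock implementation: the function names, the definition of ranking loss used here, and the score values are all illustrative assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranking_loss(predicted, observed):
    """Assumed definition: observed score of the truly best model minus the
    observed score of the model that the predictor ranked first."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(observed) - observed[top]


# Hypothetical predicted and observed scores for five models of one target
# (made-up numbers, not CASP data).
predicted = [0.82, 0.75, 0.60, 0.55, 0.30]
observed = [0.78, 0.80, 0.58, 0.50, 0.35]

r = pearson(predicted, observed)
loss = ranking_loss(predicted, observed)
print(f"Pearson r = {r:.3f}, ranking loss = {loss:.3f}")
```

In this made-up case the predictor correlates strongly with the observed scores yet still picks the second-best model as its top choice, which is why a method might be tuned separately for each criterion.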
Motivation: The accuracy gap between predicted and experimental structures has been significantly reduced following the development of AlphaFold2. However, for further studies, such as drug discovery and protein design, AlphaFold2 structures need to be representative of proteins in solution, yet AlphaFold2 was trained to generate only a few structural conformations rather than a conformational landscape. In previous CASP experiments, MD simulation-based methods have been widely used to improve the accuracy of single 3D models. However, these methods are highly computationally intensive and less applicable for practical use in large-scale applications. Despite this, the refinement concept can still provide a better understanding of conformational dynamics and improve the quality of 3D models at a modest computational cost. Here, our ReFOLD4 pipeline was adopted to provide the conformational landscape of AlphaFold2 predictions while maintaining high model accuracy. In addition, the AlphaFold2 recycling process was utilised to improve 3D models by using them as custom template inputs for tertiary and quaternary structure predictions. Results: According to the Molprobity score, 94% of the 3D models generated by ReFOLD4 were improved. As measured by average change in lDDT, AlphaFold2 recycling showed an improvement rate of 87.5% (using MSAs) and 81.25% (using single sequences) for monomeric AF2 models and 100% (MSA) and 97.8% (single sequence) for monomeric non-AF2 models. By the same measure, the recycling of multimeric models showed an improvement rate of as much as 80% for AF2 models and 94% for non-AF2 models. The AlphaFold2 recycling processes and ReFOLD4 method can be combined very efficiently to provide conformational landscapes at the AlphaFold2-accuracy level, while also significantly improving the global quality of 3D models for both tertiary and quaternary structures, with much less computational complexity than traditional refinement methods.
Motivation: The accuracy gap between predicted and experimental structures has been significantly reduced following the development of AlphaFold2 (AF2). However, for many targets, AF2 models still have room for improvement. In previous CASP experiments, highly computationally intensive MD simulation-based methods have been widely used to improve the accuracy of single 3D models. Here, our ReFOLD pipeline was adapted to refine AF2 predictions while maintaining high model accuracy at a modest computational cost. Furthermore, the AF2 recycling process was utilised to improve 3D models by using them as custom template inputs for tertiary and quaternary structure predictions. Results: According to the Molprobity score, 94% of the 3D models generated by ReFOLD were improved. AF2 recycling showed an improvement rate of 87.5% (using MSAs) and 81.25% (using single sequences) for monomeric AF2 models and 100% (MSA) and 97.8% (single sequence) for monomeric non-AF2 models, as measured by the average change in lDDT. By the same measure, the recycling of multimeric models showed an improvement rate of as much as 80% for AF2-Multimer (AF2M) models and 94% for non-AF2M models. Availability: Refinement using AlphaFold2-Multimer recycling is available as part of the MultiFOLD docker package (https://hub.docker.com/r/mcguffin/multifold). The ReFOLD server is available at https://www.reading.ac.uk/bioinf/ReFOLD/ and the modified scripts can be downloaded from https://www.reading.ac.uk/bioinf/downloads/. Supplementary information: Supplementary data are available at Bioinformatics online.
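The abstract does not define how the improvement rate is calculated. As a rough sketch, assuming it means the percentage of models whose lDDT score is higher after refinement than before, it could be computed as shown here. The function names and the lDDT values are hypothetical and are not taken from the paper.

```python
def improvement_rate(before, after):
    """Percentage of models whose score increased after refinement
    (assumed definition; the source does not spell one out)."""
    if len(before) != len(after) or not before:
        raise ValueError("score lists must be non-empty and of equal length")
    improved = sum(1 for b, a in zip(before, after) if a > b)
    return 100.0 * improved / len(before)


def mean_change(before, after):
    """Average per-model change in score (after minus before)."""
    return sum(a - b for b, a in zip(before, after)) / len(before)


# Hypothetical lDDT scores for eight models before and after recycling
# (made-up numbers for illustration only).
lddt_before = [0.70, 0.65, 0.80, 0.55, 0.90, 0.60, 0.75, 0.85]
lddt_after = [0.74, 0.69, 0.83, 0.60, 0.89, 0.66, 0.78, 0.88]

rate = improvement_rate(lddt_before, lddt_after)
delta = mean_change(lddt_before, lddt_after)
print(f"improvement rate = {rate:.2f}%, mean lDDT change = {delta:+.4f}")
```

Reporting the mean change alongside the rate shows whether the models that improved gained more than the others lost.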
In CASP15, there was a greater emphasis on multimeric modelling than in previous experiments, with assembly structures nearly doubling in number (41 up from 22) since the previous round. CASP15 also included a new estimation of model accuracy (EMA) category in recognition of the importance of objective quality assessment for quaternary structure models. ModFOLDdock is a multimeric model quality assessment server developed by the McGuffin group at the University of Reading, which brings together a range of single-model, clustering and deep learning methods to form a consensus of approaches. For CASP15, three variants of ModFOLDdock were developed to optimise for the different facets of the quality estimation problem. The standard ModFOLDdock variant produced predicted scores optimised for positive linear correlations with the observed scores. The ModFOLDdockR variant produced predicted scores optimised for ranking, i.e., the top-ranked models have the highest accuracy. In addition, the ModFOLDdockS variant used a quasi-single model approach to score each model on an individual basis. The scores from all three variants achieved strongly positive Pearson correlation coefficients with the CASP observed scores (oligo-lDDT) in excess of 0.70, which were maintained across both homomeric and heteromeric model populations. In addition, at least one of the ModFOLDdock variants was consistently ranked in the top two methods across all three EMA categories. Specifically, for overall global fold prediction accuracy, ModFOLDdock placed second and ModFOLDdockR placed third; for overall interface quality prediction accuracy, ModFOLDdockR, ModFOLDdock and ModFOLDdockS were placed above all other predictor methods, and ModFOLDdockR and ModFOLDdockS were placed second and third respectively for individual residue confidence scores. The ModFOLDdock server is available at: https://www.reading.ac.uk/bioinf/ModFOLDdock/.
ModFOLDdock is also available as part of the MultiFOLD docker package: https://hub.docker.com/r/mcguffin/multifold.