Ion mobility (IM) mass spectrometry provides structural information about protein shape and size in the form of an orientationally averaged collision cross-section (CCSIM). While IM data have been used with various computational methods, they have not yet been utilized to predict monomeric protein structure from sequence. Here, we show that IM data can significantly improve protein structure determination using the modelling suite Rosetta. We develop the Rosetta Projection Approximation using Rough Circular Shapes (PARCS) algorithm that allows for fast and accurate prediction of CCSIM from structure. Following successful testing of the PARCS algorithm, we use an integrative modelling approach to utilize IM data for protein structure prediction. Additionally, we propose a confidence metric that identifies near-native models in the absence of a known structure. The results of this study demonstrate the ability of IM data to consistently improve protein structure prediction.
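To illustrate what an orientationally averaged projection area means, the following is a minimal sketch of a generic Monte Carlo projection approximation. It is not the PARCS algorithm, whose details are not given here. The atom representation as `(x, y, z, radius)` tuples, the function names, and the sampling parameters are all assumptions made for illustration, and the radii are assumed to already include any buffer-gas probe radius.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a direction uniformly distributed on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def projection_basis(direction):
    """Return two orthonormal vectors spanning the plane perpendicular to direction."""
    dx, dy, dz = direction
    # Pick a helper axis that is not parallel to the direction.
    hx, hy, hz = (1.0, 0.0, 0.0) if abs(dx) < 0.9 else (0.0, 1.0, 0.0)
    # u = normalize(direction x helper)
    ux, uy, uz = dy * hz - dz * hy, dz * hx - dx * hz, dx * hy - dy * hx
    norm = math.sqrt(ux * ux + uy * uy + uz * uz)
    u = (ux / norm, uy / norm, uz / norm)
    # v = direction x u (already unit length)
    v = (dy * u[2] - dz * u[1], dz * u[0] - dx * u[2], dx * u[1] - dy * u[0])
    return u, v


def projected_area(atoms, direction, rng, n_points=2000):
    """Estimate the area of the union of atom disks projected along direction."""
    u, v = projection_basis(direction)
    disks = [
        (x * u[0] + y * u[1] + z * u[2], x * v[0] + y * v[1] + z * v[2], r)
        for x, y, z, r in atoms
    ]
    min_a = min(a - r for a, _, r in disks)
    max_a = max(a + r for a, _, r in disks)
    min_b = min(b - r for _, b, r in disks)
    max_b = max(b + r for _, b, r in disks)
    box_area = (max_a - min_a) * (max_b - min_b)
    hits = 0
    for _ in range(n_points):
        pa = rng.uniform(min_a, max_a)
        pb = rng.uniform(min_b, max_b)
        # A sample point counts as a hit if it lies inside any projected disk.
        if any((pa - a) ** 2 + (pb - b) ** 2 <= r * r for a, b, r in disks):
            hits += 1
    return box_area * hits / n_points


def projection_approximation_ccs(atoms, n_orientations=50, n_points=2000, seed=0):
    """Average the projected area over randomly sampled orientations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_orientations):
        total += projected_area(atoms, random_unit_vector(rng), rng, n_points)
    return total / n_orientations
```

For a single sphere of radius r, every projection is a disk, so the estimate should approach πr² regardless of orientation; this gives a simple sanity check for the sketch.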
The combination of deep learning and sequence data has transformed protein structure prediction and modeling, evidenced in the success of AlphaFold (AF). For this reason, many methods have been developed to take advantage of this success in areas where inaccurate structural modeling may limit computational predictiveness. For example, many methods have been developed to predict protein intrinsic disorder from sequence, including our Rosetta ResidueDisorder (RRD) approach. Intrinsically disordered regions in proteins are parts of the sequence that do not form ordered, folded structures under typical physiological conditions. In the original implementation of RRD, Rosetta ab initio models were generated, and disordered regions were predicted based on residue scores (disordered residues typically exist in regions of unfavorable scores). In this work, we show that by (i) replacing the ab initio modeling with AF (using the same scoring and disorder assignment approach) and (ii) updating the score function, the predictiveness improved significantly. Residues were ranked more accurately by order/disorder, evidenced by an improvement in receiver operating characteristic area-under-the-curve from 0.69 to 0.78 on a large (229 protein) and balanced data set (roughly equal numbers of ordered and disordered residues). The binary prediction accuracy also improved from 62% to 74% on the same data set. Our results show that the combined AF-RRD approach was as good as or better than all existing methods by these metrics (AF-RRD had the highest prediction accuracy).
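The two metrics quoted above, receiver operating characteristic area-under-the-curve and binary accuracy, can be sketched as follows. This is a minimal illustration and not the RRD implementation: the function names, the label convention (1 for disordered, 0 for ordered), and the threshold are assumptions, and higher per-residue scores are assumed to indicate disorder, in line with the statement that disordered residues tend to fall in regions of unfavorable scores.

```python
def roc_auc(scores, labels):
    """Rank-based ROC AUC: the probability that a randomly chosen disordered
    residue has a higher score than a randomly chosen ordered one (ties count 0.5)."""
    disordered = [s for s, y in zip(scores, labels) if y == 1]
    ordered = [s for s, y in zip(scores, labels) if y == 0]
    if not disordered or not ordered:
        raise ValueError("both classes must be present")
    wins = 0.0
    for d in disordered:
        for o in ordered:
            if d > o:
                wins += 1.0
            elif d == o:
                wins += 0.5
    return wins / (len(disordered) * len(ordered))


def binary_accuracy(scores, labels, threshold):
    """Fraction of residues classified correctly when scores above the
    (hypothetical) threshold are called disordered."""
    predictions = [1 if s > threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the AUC is a useful complement to accuracy at a single threshold.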
Machine learning (ML) has revolutionised the field of structure-based drug design (SBDD) in recent years. During the training stage, ML techniques typically analyse large amounts of experimentally determined data to create predictive models that inform the drug discovery process. Deep learning (DL) is a subfield of ML that relies on multiple layers of a neural network to extract significantly more complex patterns from experimental data, and it has recently become a popular choice in SBDD. This review provides a thorough summary of the recent DL trends in SBDD with a particular focus on de novo drug design, binding site prediction, and binding affinity prediction of small molecules.
Ion mobility (IM) coupled to mass spectrometry informs on the shape and size of protein structures in the form of a collision cross section (CCSIM). While there are several computational methods for predicting CCSIM based on protein structures, including our previously developed PARCS, the process usually requires prior experience with the command-line interface (CLI). To overcome this challenge, here we present a web application on the ROSIE webserver to predict CCSIM from protein structure using projection approximation with PARCS. In this web interface, the user is only required to provide one or more PDB files as input. Results from our case studies suggest that CCSIM predictions (with ROSIE-PARCS) are highly accurate with an average error of 6.12%. Furthermore, the absolute difference between CCSIM and CCSPARCS can help in distinguishing accurate from inaccurate AlphaFold2 protein structure predictions. ROSIE-PARCS is designed with a user-friendly interface, is available publicly, and is free to use. The ROSIE-PARCS web interface is supported by all major web browsers and can be accessed via this link (https://rosie.graylab.jhu.edu).
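The error figure quoted above can be read as a percent deviation between predicted and experimental cross sections. The sketch below assumes the error is the absolute difference divided by the experimental value; the exact definition used in the study is not stated here, and the function names are hypothetical.

```python
def percent_error(ccs_predicted, ccs_experimental):
    """Absolute percent deviation of a predicted CCS from the experimental CCS."""
    if ccs_experimental <= 0:
        raise ValueError("experimental CCS must be positive")
    return abs(ccs_predicted - ccs_experimental) / ccs_experimental * 100.0


def mean_percent_error(pairs):
    """Average percent error over (predicted, experimental) CCS pairs."""
    if not pairs:
        raise ValueError("at least one pair is required")
    return sum(percent_error(p, e) for p, e in pairs) / len(pairs)
```

The same absolute difference between CCSIM and CCSPARCS could be computed per model to flag structure predictions that disagree strongly with the measured value.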
Understanding the relationship between protein structure and experimental data is crucial for utilizing experiments to solve biochemical problems and optimizing the use of sparse experimental data for structural interpretation. Tandem mass spectrometry (MS/MS) can be used with a variety of methods to collect structural data for proteins. One example is surface-induced dissociation (SID), which is used to break apart protein complexes (via a surface collision) into intact subcomplexes and can be performed at multiple laboratory frame SID collision energies. These energy-resolved MS/MS experiments have shown that the profile of the breakages depends on the acceleration energy of the collision. It is possible to extract an appearance energy (AE) from energy-resolved mass spectrometry (ERMS) data, which shows the relative intensity of each type of subcomplex as a function of SID acceleration energy. We previously determined that these AE values for specific interfaces correlated with structural features related to interface strength. In this study, we further examined the structural relationships by developing a method to predict the full ERMS plot from the structure, rather than extracting a single value. First, we noted that for proteins with multiple interface types, we could reproduce the correct shapes of breakdown curves, further confirming previous structural hypotheses. Next, we demonstrated that interface size and energy density (measured using Rosetta) correlated with data derived from the ERMS plot (R² = 0.71). Furthermore, based on this trend, we used native crystal structures to predict ERMS. The majority of predictions resulted in good agreement, and the average root-mean-square error was 0.20 for the 20 complexes in our data set. We also show that if additional information on cleavage as a function of collision energy could be obtained, the accuracy of predictions improved further. Finally, we demonstrated that ERMS prediction results were better for the native than for inaccurate models in 17/20 cases. An application to run this simulation has been developed in Rosetta, which is freely available for use.
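The root-mean-square error used to compare predicted and experimental ERMS curves can be sketched as below. This assumes both curves are relative intensities sampled at the same set of SID acceleration energies; the function name and input layout are illustrative and do not describe the Rosetta application.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two curves sampled at the same energies."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("curves must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

Because relative intensities lie between 0 and 1, an RMSE of 0.20 corresponds to a typical deviation of about a fifth of the full intensity range.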