We have recently completed a full re-architecturing of the Rosetta molecular modeling program, generalizing and expanding its existing functionality. The new architecture enables the rapid prototyping of novel protocols by providing easy to use interfaces to powerful tools for molecular modeling. The source code of this rearchitecturing has been released as Rosetta3 and is freely available for academic use. At the time of its release, it contained 470,000 lines of code. Counting currently unpublished protocols at the time of this writing, the source includes 1,285,000 lines. Its rapid growth is a testament to its ease of use. This document describes the requirements for our new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform.
We describe predictions made using the Rosetta structure prediction methodology for the Eighth Critical Assessment of Techniques for Protein Structure Prediction. Aggressive sampling and all-atom refinement were carried out for nearly all targets. A combination of alignment methodologies was used to generate starting models from a range of templates, and the models were then subjected to Rosetta all atom refinement. For 50 targets with readily identified templates, the best submitted model was better than the best alignment to the best template in the Protein Data Bank for 24 domains, and improved over the best starting model for 43 domains. For 13 targets where only very distant sequence relationships to proteins of known structure were detected, models were generated using the Rosetta de novo structure prediction methodology followed by all-atom refinement; in several cases the submitted models were better than those based on the available templates. Of the 12 refinement challenges, the best submitted model improved on the starting model in 7 cases. These improvements over the starting template-based models and refinement tests demonstrate the power of Rosetta structure refinement in improving model accuracy.
Foldit is a multiplayer online game in which players collaborate and compete to create accurate protein structure models. For specific hard problems, Foldit player solutions can in some cases outperform state-of-the-art computational methods. However, very little is known about how collaborative gameplay produces these results and whether Foldit player strategies can be formalized and structured so that they can be used by computers. To determine whether high performing player strategies could be collectively codified, we augmented the Foldit gameplay mechanics with tools for players to encode their folding strategies as "recipes" and to share their recipes with other players, who are able to further modify and redistribute them. Here we describe the rapid social evolution of player-developed folding algorithms that took place in the year following the introduction of these tools. Players developed over 5,400 different recipes, both by creating new algorithms and by modifying and recombining successful recipes developed by other players. The most successful recipes rapidly spread through the Foldit player population, and two of the recipes became particularly dominant. Examination of the algorithms encoded in these two recipes revealed a striking similarity to an unpublished algorithm developed by scientists over the same period. Benchmark calculations show that the new algorithm independently discovered by scientists and by Foldit players outperforms previously published methods. Thus, online scientific game frameworks have the potential not only to solve hard scientific problems, but also to discover and formalize effective new strategies and algorithms.citizen science | crowd-sourcing | optimization | structure prediction | strategy C itizen science is an approach to leveraging natural human abilities for scientific purposes. Most such efforts involve visual tasks such as tagging images or locating image features (1-3). In contrast, Foldit is a multiplayer online scientific discovery game, in which players become highly skilled at creating accurate protein structure models through extended game play (4, 5). Foldit recruits online gamers to optimize the computed Rosetta energy using human spatial problem-solving skills. Players manipulate protein structures with a palette of interactive tools and manipulations. Through their interactive exploration Foldit players also utilize user-friendly versions of algorithms from the Rosetta structure prediction methodology (6) such as wiggle (gradient-based energy minimization) and shake (combinatorial side chain rotamer packing). The potential of gamers to solve more complex scientific problems was recently highlighted by the solution of a long-standing protein structure determination problem by Foldit players (7).One of the key strengths of game-based human problem exploration is the human ability to search over the space of possible strategies and adapt those strategies to the type of problem and stage of problem solving (5). The variability of tactics and strategies s...
A key issue in macromolecular structure modeling is the granularity of the molecular representation. A fine-grained representation can approximate the actual structure more accurately, but may require many more degrees of freedom than a coarse-grained representation and hence make conformational search more challenging. We investigate this tradeoff between the accuracy and the size of protein conformational search space for two frequently used representations: one with fixed bond angles and lengths and one that has full flexibility. We performed large-scale explorations of the energy landscapes of 82 protein domains under each model, and find that the introduction of bond angle flexibility significantly increases the average energy gap between native and non-native structures. We also find that incorporating bonded geometry flexibility improves low resolution X-ray crystallographic refinement. These results suggest that backbone bond angle relaxation makes an important contribution to native structure energetics, that current energy functions are sufficiently accurate to capture the energetic gain associated with subtle deformations from chain ideality, and more speculatively, that backbone geometry distortions occur late in protein folding to optimize packing in the native state.
What conformations do protein molecules populate in solution? Crystallography provides a highresolution description of protein structure in the crystal environment, while NMR describes structure in solution but using less data. NMR structures display more variability, but is this because crystal contacts are absent or because of fewer data constraints? Here we report unexpected insight into this issue obtained through analysis of detailed protein energy landscapes generated by large-scale, native-enhanced sampling of conformational space with Rosetta@HOME for 111 protein domains. In the absence of tightly associating binding partners or ligands, the lowest-energy Rosetta models were nearly all <2.5Å C α RMSD from the experimental structure; this result demonstrates that structure prediction accuracy for globular proteins is limited mainly by the ability to sample close to the native structure. While the lowest-energy models are similar to deposited structures, they are not identical; the largest deviations are most often in regions involved in ligand, quaternary, or crystal contacts. For ligand binding proteins, the low energy models may resemble the apo structures, and for oligomeric proteins, the monomeric assembly intermediates. The deviations between the low energy models and crystal structures largely disappear when landscapes are computed in the context of the crystal lattice or multimer. The computed low-energy ensembles, with tight crystal-structure-like packing in the core, but more NMR-structure-like variability in loops, may in some cases resemble the native state ensembles of proteins better than individual crystal or NMR structures, and can suggest experimentally testable hypotheses relating alternative states and structural heterogeneity to function.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.