The goal of Dynameomics is to perform atomistic molecular dynamics (MD) simulations of representative proteins from all known folds in explicit water, both in their native state and along their thermal unfolding pathways. Here we present 188 fold representatives together with their native-state simulations and analyses. These 188 targets represent 67% of all the structures in the Protein Data Bank. The behavior of several specific targets is highlighted to illustrate general properties of the full dataset and to demonstrate the role of MD in understanding protein function and stability. As an example of what can be learned from mining the Dynameomics database, we identified a protein fold with heightened localized dynamics. In one member of this fold family, the motion affects the exposure of its phosphorylation site and acts as an entropy sink that offsets another, relatively immobile portion of the protein, allowing it to present a consistent interface for protein docking. In another member of this family, a polymorphism in the highly mobile region leads to a host of disease phenotypes. We have constructed a web site that provides access to a novel hybrid relational/multidimensional database (described in the succeeding two papers) for viewing and interrogating simulations of the top 30 targets: http://www.dynameomics.org. The Dynameomics database, currently the largest collection of protein simulations and protein structures in the world, should also be useful for determining the rules governing protein folding and kinetic stability, which should aid in deciphering genomic information and in protein engineering and design.
Summary
The dynamic behavior of proteins is important for understanding their function and folding. We have performed molecular dynamics simulations of the native state and unfolding pathways of over 1000 proteins, representing the majority of folds in globular proteins. These data are stored and organized using an innovative database approach, which can be mined to obtain both general and specific information about the dynamics and folding/unfolding of proteins, relevant subsets thereof, and individual proteins. Here we describe the project in general terms and the type of information contained in the database. We then provide examples of mining the database for information relevant to protein folding, structure building, the effect of single-nucleotide polymorphisms, and drug design. The native-state simulation data and corresponding analyses for the 100 most populated metafolds, together with related resources, are publicly accessible through www.dynameomics.org.
Dynameomics is a project to investigate and catalog the native-state dynamics and thermal unfolding pathways of representatives of all protein folds using solvated molecular dynamics simulations, as described in the preceding paper. Here we introduce the design of the molecular dynamics data warehouse, a scalable, reliable repository for simulation data that vastly simplifies data management and access. In the succeeding paper, we describe the development of a complementary multidimensional database. A single protein unfolding or native-state simulation can take weeks to months to complete and produces gigabytes of coordinate and analysis data. Mining information from over 3000 completed simulations is complicated and time-consuming. Even the simplest queries involve writing intricate programs that must be built from low-level file system access primitives and include significant logic to correctly locate and parse the data of interest. As a result, programs that answer questions requiring data from hundreds of simulations are very difficult to write. Thus, organizing and accessing simulation data have been major obstacles to the discovery of new knowledge in the Dynameomics project. This repository is used internally and is the foundation of the Dynameomics portal site http://www.dynameomics.org. By organizing simulation data into a scalable, manageable and accessible form, we can begin to address substantial questions that move us closer to solving biomedical and bioengineering problems.
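To make the contrast with hand-written file-parsing programs concrete, here is a minimal sketch assuming that simulation metadata and per-frame properties are stored in relational tables. The schema, the table and column names, and the helpers `make_demo_db` and `mean_rg_by_temperature` are hypothetical illustrations (not the actual Dynameomics warehouse design), and the numbers are made-up toy values. With such a layout, a question spanning many simulations becomes a single declarative query instead of a program that locates and parses trajectory files.

```python
import sqlite3

# Hypothetical, heavily simplified schema for illustration only; it is NOT the
# actual Dynameomics warehouse design. One row per simulation, and one row per
# saved trajectory frame holding a precomputed property.
SCHEMA = """
CREATE TABLE simulation (
    sim_id      INTEGER PRIMARY KEY,
    protein     TEXT NOT NULL,
    temperature REAL NOT NULL          -- simulation temperature in kelvin
);
CREATE TABLE frame (
    sim_id INTEGER NOT NULL REFERENCES simulation(sim_id),
    step   INTEGER NOT NULL,           -- frame index within the trajectory
    rg     REAL NOT NULL,              -- radius of gyration in angstroms
    PRIMARY KEY (sim_id, step)
);
"""


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database filled with made-up toy values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO simulation VALUES (?, ?, ?)",
        [(1, "protein_A", 298.0), (2, "protein_A", 498.0), (3, "protein_B", 298.0)],
    )
    conn.executemany(
        "INSERT INTO frame VALUES (?, ?, ?)",
        [(1, 0, 12.0), (1, 1, 12.2), (2, 0, 12.5), (2, 1, 16.5), (3, 0, 10.0)],
    )
    return conn


def mean_rg_by_temperature(conn: sqlite3.Connection) -> list[tuple[float, float]]:
    """Average radius of gyration per temperature, pooled over every simulation."""
    query = """
        SELECT s.temperature, AVG(f.rg)
        FROM frame AS f
        JOIN simulation AS s ON s.sim_id = f.sim_id
        GROUP BY s.temperature
        ORDER BY s.temperature
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(mean_rg_by_temperature(make_demo_db()))
```

The join and aggregation are handled by the database engine, so adding more simulations changes only the data, not the query.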
The folding pathway of the small α/β protein GB1 has been extensively studied during the past two decades using both theoretical and experimental approaches. These studies provided a consensus view that the protein folds in a two-state manner. Here, we reassessed the folding of GB1, both by experiments and simulations, and detected the presence of an on-pathway intermediate. This intermediate has eluded earlier experimental characterization and is distinct from the collapsed state previously identified using ultrarapid mixing. Failure to identify the presence of an intermediate affects some of the conclusions that have been drawn for GB1, a popular model for protein folding studies.
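For readers less familiar with the terminology, the distinction between the two views of GB1 folding can be written as kinetic schemes. This is standard textbook notation rather than a model taken from the paper, and the rate-constant labels are illustrative:

```latex
% Two-state folding: only the unfolded (U) and native (N) states are significantly populated
\mathrm{U} \underset{k_u}{\overset{k_f}{\rightleftharpoons}} \mathrm{N},
\qquad k_{\mathrm{obs}} = k_f + k_u

% Three-state folding with an on-pathway intermediate I:
% N is reached from U only by passing through I
\mathrm{U} \underset{k_{IU}}{\overset{k_{UI}}{\rightleftharpoons}} \mathrm{I}
\underset{k_{NI}}{\overset{k_{IN}}{\rightleftharpoons}} \mathrm{N}
```

In the two-state case, relaxation toward equilibrium is single-exponential with rate $k_{\mathrm{obs}}$; a significantly populated on-pathway intermediate generally produces deviations from that simple behavior.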
The goal of the Dynameomics project is to perform, store, and analyze molecular dynamics simulations of representative proteins from all known globular folds, in their native state and along their unfolding pathways. To analyze unfolding simulations, the location of the protein along the unfolding reaction coordinate (RXN) must be determined. Properties such as the fraction of native contacts and the radius of gyration are often used for this purpose; however, these properties can be degenerate, because native and nonnative species can overlap in their values. Here, we used 15 physical properties of the protein to construct a multidimensional-embedded, one-dimensional RXN coordinate that faithfully captures the complex nature of unfolding. The unfolding RXN coordinates for 188 proteins (1534 simulations and 22.9 μs in explicit water) were calculated. Native, transition, intermediate, and denatured states were readily identified using this RXN coordinate. A global native ensemble based on the native-state properties of the 188 proteins was created. This ensemble was shown to be effective for calculating RXN coordinates for folds outside the initial 188 targets. These RXN coordinates enable high-throughput assignment of conformational states, an important step in comparing protein properties across fold space as well as in characterizing the unfolding of individual proteins.
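The abstract does not spell out how the 15 properties are combined into one coordinate, so the following is only a sketch under stated assumptions: each property is standardized against a native reference ensemble (z-scores), and each frame's coordinate is its Euclidean distance from the native centroid in that standardized property space. The function name `rxn_coordinate` and this particular normalization are hypothetical illustrations, not the published procedure.

```python
import numpy as np


def rxn_coordinate(native: np.ndarray, trajectory: np.ndarray) -> np.ndarray:
    """Distance of each trajectory frame from a native reference ensemble.

    Hypothetical sketch: z-score each property against the native ensemble,
    then take the Euclidean distance from the native centroid.

    native:     (n_native_frames, n_properties) property values sampled from
                native-state simulations (the reference ensemble)
    trajectory: (n_frames, n_properties) the same properties, in the same
                column order, computed for each frame of an unfolding run
    Returns an (n_frames,) array: 0 at the native centroid, larger when unfolded.
    """
    mu = native.mean(axis=0)
    sigma = native.std(axis=0)
    # Guard against properties that are constant in the native ensemble.
    sigma = np.where(sigma > 0, sigma, 1.0)
    # Express every property in units of native-ensemble standard deviations.
    z = (trajectory - mu) / sigma
    # Collapse the multidimensional displacement into one scalar per frame.
    return np.linalg.norm(z, axis=1)
```

Standardizing first keeps properties measured on different scales (for example, radius of gyration in ångströms versus the dimensionless fraction of native contacts) from dominating the distance merely because of their units. Passing a reference ensemble pooled across many proteins, in the spirit of the global native ensemble described above, would let the same function score folds that have no native simulations of their own.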