The similarity in the three-dimensional structures of homologous proteins imposes strong constraints on their sequence variability. It has long been suggested that the resulting correlations among amino acid compositions at different sequence positions can be exploited to infer spatial contacts within the tertiary protein structure. Crucial to this inference is the ability to disentangle direct and indirect correlations, as accomplished by the recently introduced direct-coupling analysis (DCA). Here we develop a computationally efficient implementation of DCA, which allows us to evaluate the accuracy of contact prediction by DCA for a large number of protein domains, based purely on sequence information. DCA is shown to yield a large number of correctly predicted contacts, recapitulating the global structure of the contact map for the majority of the protein domains examined. Furthermore, our analysis captures clear signals beyond intradomain residue contacts, arising, e.g., from alternative protein conformations, ligand-mediated residue couplings, and interdomain interactions in protein oligomers. Our findings suggest that contacts predicted by DCA can be used as a reliable guide to facilitate computational predictions of alternative protein conformations, protein complex formation, and even the de novo prediction of protein domain structures, contingent on the existence of a large number of homologous sequences which are being rapidly made available due to advances in genome sequencing.

statistical sequence analysis | residue-residue covariation | contact map prediction | maximum-entropy modeling
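The distinction between direct and indirect correlations can be illustrated with a toy example. The sketch below is a continuous-variable analogy only, not the DCA implementation described in the abstract: three hypothetical sites form a chain in which sites 0-1 and 1-2 are directly coupled and sites 0-2 are not. The coupling values and the helper `invert` are assumptions made for illustration. The covariance between sites 0 and 2 is clearly nonzero (an indirect correlation), whereas inverting the covariance recovers a zero direct coupling.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical direct couplings (a precision matrix) for a three-site chain:
# sites 0-1 and 1-2 interact directly; sites 0-2 have no direct coupling.
couplings = [[1.0, -0.4, 0.0],
             [-0.4, 1.0, -0.4],
             [0.0, -0.4, 1.0]]

# What one would observe from data: the covariance, which mixes direct and
# indirect effects.
covariance = invert(couplings)

# Inverting the observed covariance disentangles the two again.
recovered = invert(covariance)

print("covariance between sites 0 and 2:", round(covariance[0][2], 3))
print("recovered direct coupling 0-2:   ", round(abs(recovered[0][2]), 3))
```

In this toy model the chain 0-1-2 transmits correlation from site 0 to site 2 even though no direct term links them, which is the kind of signal a purely correlation-based contact predictor would misread as a contact.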
We introduce a theoretical framework that exploits the ever-increasing genomic sequence information for protein structure prediction. Structure-based models are modified to incorporate constraints from a large number of non-local contacts estimated from direct coupling analysis (DCA) of co-evolving genomic sequences. A simple hybrid method, called DCA-fold, integrating DCA contacts with an accurate knowledge of local information (e.g., the local secondary structure) is sufficient to fold proteins in the range of 1-3 Å resolution.

protein folding | residue contact prediction | contact map estimation | residue-residue coevolution | statistical potentials

Proteins are heteropolymers of amino acids that adopt specific 3D structures to perform designated biological tasks. Enormous experimental efforts have been invested to determine a large number of protein structures. Currently, computational structure prediction methods are reasonably successful in describing interactions among residues close (local) in sequence. Given the limited information for residues that are distant in sequence, success in large-scale structure prediction has depended crucially on known structural motifs available in protein databases. In cases where similarity to proteins of known structures exists, methods like fold recognition and homology modeling (1-3) have proved successful and effective, according to the Critical Assessment of Techniques for Protein Structure Prediction (4). Nevertheless, the accuracy of these methods is still in many cases far from the resolution needed to explore protein functions.

Here we introduce a new computational approach that exploits information from the rapidly growing genomic sequences to complement the currently limited structural databases. Over the years, a variety of methods have been used to study co-evolution in protein sequences and to estimate residue contacts, with mixed success (5-11).
Recently, methods based on direct coupling analysis (DCA) (12) were shown to predict 50-300 non-local contacts with 70-80% accuracy for a variety of protein domains (13). DCA is based purely on protein sequence information. It uses covariance in homologous protein sequences as an input and deduces a direct interaction between residues (12). Residue pairs with strong direct interactions are shown to be related to structurally conserved residue-residue contacts in the protein fold (12, 13). As the contacts predicted by DCA recapitulate major features of the native contact maps, we developed a simple hybrid method integrating DCA contacts and detailed local information, to fold proteins of up to about 200 amino acids to within 3 Å of the native structures.

Our methodology is guided by the energy landscape theory (14), which asserts that in a minimally frustrated, funnel-like energy landscape, native contacts are on average favorable and dominant over non-favorable, non-native ones. This drives proteins smoothly toward their native states. Folding simulations, using native contacts in structure-based models (SBM), have been...
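An accuracy figure such as "70-80% for 50-300 non-local contacts" is typically read as the fraction of top-ranked, sequence-distant residue pairs that are true contacts in a reference structure. The sketch below shows one way such a score could be computed; the function name, the `min_separation` cutoff of 5, and the toy data are illustrative assumptions and are not taken from the excerpt.

```python
def top_contact_precision(scores, true_contacts, top_n, min_separation=5):
    """Fraction of the top_n highest-scoring non-local pairs that are true contacts.

    scores: dict mapping (i, j) residue-index pairs to a coupling score.
    true_contacts: set of (i, j) pairs with i < j that are in contact in a
        reference structure.
    min_separation: assumed cutoff; pairs with |i - j| below it are treated
        as local and ignored.
    """
    non_local = [(pair, score) for pair, score in scores.items()
                 if abs(pair[0] - pair[1]) >= min_separation]
    ranked = sorted(non_local, key=lambda item: item[1], reverse=True)[:top_n]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _ in ranked if tuple(sorted(pair)) in true_contacts)
    return hits / len(ranked)


# Toy data (hypothetical): five scored pairs and three reference contacts.
toy_scores = {(1, 2): 0.9, (1, 10): 0.8, (3, 20): 0.7, (5, 30): 0.6, (2, 15): 0.1}
toy_contacts = {(1, 10), (5, 30), (2, 15)}

ppv = top_contact_precision(toy_scores, toy_contacts, top_n=3)
print(f"precision of top 3 non-local pairs: {ppv:.2f}")
```

Here the pair (1, 2) is discarded as local despite its high score, and two of the three remaining top-ranked pairs are reference contacts.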
A novel family of 2Fe-2S proteins, the NEET family, was discovered during the last decade in numerous organisms, including archaea, bacteria, algae, plants, and humans, suggesting an evolutionarily conserved function, potentially mediated by their CDGSH Iron-Sulfur Domain. In humans, three NEET members encoded by the CISD1-3 genes were identified. The structures of CISD1 (mitoNEET, mNT), CISD2 (NAF-1), and the plant At-NEET uncovered a homodimer with a unique "NEET fold", as well as two distinct domains: a beta-cap and a 2Fe-2S cluster-binding domain. The 2Fe-2S clusters of NEET proteins were found to be coordinated by a novel 3Cys:1His structure that is relatively labile compared to other 2Fe-2S proteins, which is why NEET clusters could be transferred to apo-acceptor protein(s) or mitochondria. Positioned at the protein surface, the His that coordinates the NEET 2Fe-2S cluster is exposed to protonation upon changes in its environment, potentially suggesting a sensing function for this residue. Studies in different model systems demonstrated a role for NAF-1 and mNT in the regulation of cellular iron, calcium and ROS homeostasis, and uncovered a key role for NEET proteins in critical processes, such as cancer cell proliferation and tumor growth, lipid and glucose homeostasis in obesity and diabetes, control of autophagy, longevity in mice, and senescence in plants. Abnormal regulation of NEET proteins was consequently found to result in multiple health conditions, and aberrant splicing of NAF-1 was found to be a cause of the neurological genetic disorder Wolfram Syndrome 2. Here we review the discovery of NEET proteins, their structural, biochemical and biophysical characterization, and their most recent structure-function analyses. We additionally highlight future avenues of research focused on NEET proteins and propose an essential role for NEETs in health and disease.
This article is part of a Special Issue entitled: Fe/S proteins: Analysis, structure, function, biogenesis and diseases.
The energy landscape used by nature over evolutionary timescales to select protein sequences is essentially the same as the one that folds these sequences into functioning proteins, sometimes in microseconds. We show that genomic data, physical coarse-grained free energy functions, and family-specific information theoretic models can be combined to give consistent estimates of energy landscape characteristics of natural proteins. One such characteristic is the effective temperature T_sel at which these foldable sequences have been selected in sequence space by evolution. T_sel quantifies the importance of folded-state energetics and structural specificity for molecular evolution. Across all protein families studied, our estimates for T_sel are well below the experimental folding temperatures, indicating that the energy landscapes of natural foldable proteins are strongly funneled toward the native state.

energy landscape theory | information theory | selection temperature | funneled landscapes | elastic effects

The physics and natural history of proteins are inextricably intertwined (1, 2). The cooperative manner in which proteins find their way to a folded structure is the result of proteins having undergone natural selection and not typical of random polymers (3, 4). Likewise, the requirement that most proteins must fold to function is a strong constraint on their phylogeny. The unavoidable random mutation events that proteins have undergone throughout their evolution have provided countless numbers of physicochemical experiments on folding landscapes. Thus, the evolutionary patterns of proteins found through comparative sequence analysis can be used to understand protein structure and energetics. In this paper, we compare the information content in the correlated changes that have occurred in protein sequences of common ancestry with energies from a transferable energy function to quantify the influence of maintaining foldability on molecular evolution.
Funneled Folding Landscapes from Evolution in Sequence Space

The key to our analysis is the principle of minimal frustration (3, 5), which states that, for quick and robust folding, the energy landscape of a protein must be dominated by interactions found in the native conformation. This native conformation is, therefore, separated by an energy gap from other compact structures that otherwise might act as kinetic traps (6, 7). These kinetic traps might appear on the folding landscape during evolution if a random mutation were to stabilize a conformation distinct from the functional one, leading to unviability. In this way, evolution and physical dynamics are coupled. A funneled, minimally frustrated landscape can be achieved if the sequence of the protein evolves to stabilize the native state while not increasing the landscape ruggedness.

If folding were the only physicochemical constraint on evolution, the ensemble of naturally observed sequences would correspond to the set of sequences that has a solvent-averaged free energy for the native conformation below a ...