The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
Sequences and available structures were compared for all the widely distributed representatives of the P-loop GTPases and GTPase-related proteins with the aim of constructing an evolutionary classification for this superclass of proteins and reconstructing the principal events in their evolution. The GTPase superclass can be divided into two large classes, each of which has a unique set of sequence and structural signatures (synapomorphies). The first class, designated TRAFAC (after translation factors) includes enzymes involved in translation (initiation, elongation, and release factors), signal transduction (in particular, the extended Ras-like family), cell motility, and intracellular transport. The second class, designated SIMIBI (after signal recognition particle, MinD, and BioD), consists of signal recognition particle (SRP) GTPases, the assemblage of MinD-like ATPases, which are involved in protein localization, chromosome partitioning, and membrane transport, and a group of metabolic enzymes with kinase or related phosphate transferase activity. These two classes together contain over 20 distinct families that are further subdivided into 57 subfamilies (ancient lineages) on the basis of conserved sequence motifs, shared structural features, and domain architectures. Ten subfamilies show a universal phyletic distribution compatible with presence in the last universal common ancestor of the extant life forms (LUCA). These include four translation factors, two OBG-like GTPases, the YawG/YlqF-like GTPases (these two subfamilies also consist of predicted translation factors), the two signal-recognition-associated GTPases, and the MRP subfamily of MinD-like ATPases. The distribution of nucleotide specificity among the proteins of the GTPase superclass indicates that the common ancestor of the entire superclass was a GTPase and that a secondary switch to ATPase activity has occurred on several independent occasions during evolution. 
The functions of most GTPases that are traceable to LUCA are associated with translation. However, in contrast to other superclasses of P-loop NTPases (RecA-F1/F0, AAA+, helicases, ABC), GTPases do not participate in NTP-dependent nucleic acid unwinding and reorganizing activities. Hence, we hypothesize that the ancestral GTPase was an enzyme with a generic regulatory role in translation, with subsequent diversification resulting in acquisition of diverse functions in transport, protein trafficking, and signaling. In addition to the classification of previously known families of GTPases and related ATPases, we introduce several previously undetected families and describe new functional predictions.
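The abstract does not state how ancestral nucleotide specificity was inferred. One standard way to formalize an argument based on the fewest switches is Fitch parsimony over a tree whose leaves are labeled GTP or ATP. The sketch below runs it on a hypothetical toy tree, not on the tree from this study. The `fitch` function and `toy_tree` are illustrative assumptions.

```python
from typing import Set, Tuple, Union

# A tree is either a leaf state ("GTP" or "ATP") or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree) -> Tuple[Set[str], int]:
    """Fitch parsimony: return (candidate ancestral states, minimum state changes)."""
    if isinstance(tree, str):
        # A leaf's state set is just its observed state, with no changes.
        return {tree}, 0
    left, right = tree
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    common = left_states & right_states
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one state change.
    return left_states | right_states, left_cost + right_cost + 1


# Hypothetical toy topology with two ATP-specific leaves in separate clades.
toy_tree = ((("GTP", "GTP"), "ATP"), (("GTP", "ATP"), "GTP"))
root_states, changes = fitch(toy_tree)
print(root_states, changes)  # {'GTP'} 2
```

On this toy tree the root resolves to GTP and the minimum number of changes is 2, that is, two independent GTP-to-ATP switches, which mirrors the shape of the argument in the abstract.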
PSI-BLAST is an iterative program to search a database for proteins with distant similarity to a query sequence. We investigated over a dozen modifications to the methods used in PSI-BLAST, with the goal of improving accuracy in finding true positive matches. To evaluate performance we used a set of 103 queries for which the true positives in yeast had been annotated by human experts, and a popular measure of retrieval accuracy (ROC) that can be normalized to take on values between 0 (worst) and 1 (best). The modifications we consider novel improve the ROC score from 0.758 ± 0.005 to 0.895 ± 0.003. This does not include the benefits from four modifications we included in the 'baseline' version, even though they were not implemented in PSI-BLAST version 2.0. The improvement in accuracy was confirmed on a small second test set. This test involved analyzing three protein families with curated lists of true positives from the non-redundant protein database. The modification that accounts for the majority of the improvement is the use, for each database sequence, of a position-specific scoring system tuned to that sequence's amino acid composition. The use of composition-based statistics is particularly beneficial for large-scale automated applications of PSI-BLAST.
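The abstract does not define the ROC variant beyond its normalization to the range 0 to 1. As a minimal sketch, assuming the common ROC_n form (for each of the first n false positives in a ranked hit list, count the true positives ranked above it, then divide the sum by n times the total number of true positives), a per-query score could be computed as follows. The function name `roc_n`, its parameters, and the handling of lists with fewer than n false positives are assumptions for illustration.

```python
from typing import Sequence


def roc_n(labels: Sequence[bool], total_true: int, n: int) -> float:
    """Hypothetical normalized ROC_n score for one ranked hit list.

    labels[i] is True if the i-th ranked hit is a true positive.
    total_true is the number of true positives in the database,
    including any the search never retrieved.
    """
    if total_true <= 0 or n <= 0:
        raise ValueError("total_true and n must be positive")
    tp_seen = 0
    fp_seen = 0
    score = 0
    for is_tp in labels:
        if is_tp:
            tp_seen += 1
        else:
            fp_seen += 1
            # Credit this false positive with every true positive ranked above it.
            score += tp_seen
            if fp_seen == n:
                break
    # Assumed convention: if the list ends before n false positives appear,
    # the missing ones rank below all retrieved true positives.
    score += (n - fp_seen) * tp_seen
    return score / (n * total_true)


print(roc_n([True, False, True, False], total_true=2, n=2))  # 0.75
```

A perfect ranking (all true positives before any false positive) scores 1 and a ranking with no true positives above the first n false positives scores 0, matching the 0 (worst) to 1 (best) range described above.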