The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
The 2.1-Å resolution crystal structure of wild-type green fluorescent protein and comparison of it with the recently determined structure of the Ser-65 → Thr (S65T) mutant explains the dual wavelength absorption and photoisomerization properties of the wild-type protein. The two absorption maxima are caused by a change in the ionization state of the chromophore. The equilibrium between these states appears to be governed by a hydrogen bond network that permits proton transfer between the chromophore and neighboring side chains. The predominant neutral form of the fluorophore maximally absorbs at 395 nm. It is maintained by the carboxylate of Glu-222 through electrostatic repulsion and hydrogen bonding via a bound water molecule and Ser-205. The ionized form of the fluorophore, absorbing at 475 nm, is present in a minor fraction of the native protein. Glu-222 donates its charge to the fluorophore by proton abstraction through a hydrogen bond network, involving Ser-205 and bound water. Further stabilization of the ionized state of the fluorophore occurs through a rearrangement of the side chains of Thr-203 and His-148. UV irradiation shifts the ratio of the two absorption maxima by pumping a proton relay from the neutral chromophore's excited state to Glu-222. Loss of the Ser-205-Glu-222 hydrogen bond and isomerization of neutral Glu-222 explains the slow return to the equilibrium dark-adapted state of the chromophore. In the S65T structure, steric hindrance by the extra methyl group stabilizes a hydrogen bonding network, which prevents ionization of Glu-222. Therefore the fluorophore is permanently ionized, causing only a 489-nm excitation peak.
This new understanding of proton redistribution in green fluorescent protein should enable engineering of environmentally sensitive fluorescent indicators and UV-triggered fluorescent markers of protein diffusion and trafficking in living cells.
The green fluorescent protein (GFP) from the jellyfish Aequorea victoria is the first known protein in which visible fluorescence is genetically encodable. The fluorophore is derived from natural residues present within the primary structure of GFP, so no exogenous cofactor or substrate is needed for fluorescence (1, 2). The tremendous potential of GFP as a reporter of gene expression, cell lineage, and protein trafficking and interactions has been extensively reviewed (3-5).
Wild-type (WT) GFP is a 238-aa protein (2). In vitro GFP is a particularly stable protease-resistant protein (6) and is only denatured under extreme conditions (7). The GFP chromophore, p-hydroxybenzylideneimidazolinone (8, 9), is formed by internal cyclization of a Ser-Tyr-Gly tripeptide and 1,2-dehydrogenation of the Tyr. This posttranslational modification is oxygen-dependent, requiring ≈2-4 h for the WT protein (10, 11). A mechanism for the fluorophore formation has been proposed (3) but needs to be confirmed by further studies.
GFP absorbs blue light at 395 nm, with a smaller peak at 475 nm, and emits green light at 508 nm with a quantum yie...
We report here genome sequences and comparative analyses of three closely related parasitoid wasps: Nasonia vitripennis, N. giraulti, and N. longicornis. Parasitoids are important regulators of arthropod populations, including major agricultural pests and disease vectors, and Nasonia is an emerging genetic model, particularly for evolutionary and developmental genetics. Key findings include the identification of a functional DNA methylation tool kit; hymenopteran-specific genes including diverse venoms; lateral gene transfers among Pox viruses, Wolbachia, and Nasonia; and the rapid evolution of genes involved in nuclear-mitochondrial interactions that are implicated in speciation. Newly developed genome resources advance Nasonia for genetic research, accelerate mapping and cloning of quantitative trait loci, and will ultimately provide tools and knowledge for further increasing the utility of parasitoids as pest insect-control agents.
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI’s eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI’s eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.