The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
Adult cancers may derive from stem or early progenitor cells [1][2]. Epigenetic modulation of gene expression is essential for normal function of these early cells, but is highly abnormal in cancers, which often exhibit aberrant promoter CpG island hypermethylation and transcriptional silencing of tumor suppressor genes and pro-differentiation factors [3][4][5]. We find that, for such genes, both normal and malignant embryonic cells generally lack the gene DNA hypermethylation found in adult cancers. In embryonic stem (ES) cells, these genes are held in a "transcription ready" state mediated by a "bivalent" promoter chromatin pattern consisting of the repressive polycomb group (PcG) H3K27me mark plus the active mark, H3K4me. However, embryonic carcinoma (EC) cells add two key repressive marks, H3K9me2 and H3K9me3, both associated with DNA hypermethylated genes in adult cancers [6][7][8]. We hypothesize that cell chromatin patterns and transient silencing of these important growth regulatory genes in stem or progenitor cells of origin for cancer may leave these genes vulnerable to aberrant DNA hypermethylation and heritable gene silencing in adult tumors.
Correspondence may be addressed to S.B.B. at sbaylin@jhmi.edu.
Competing Interests Statement. The commercial rights to the MSP technique belong to Oncomethylome. S.B.B. and J.G.H. serve as consultants to Oncomethylome and are entitled to royalties from any commercial use of this procedure.
Epigenetic gene silencing and associated promoter CpG island DNA hypermethylation are prevalent in all cancer types, and provide an alternative mechanism to mutations by which tumor suppressor genes may be inactivated within a cancer cell [3][4][5]. These epigenetic changes may precede genetic changes in pre-malignant cells and foster the accumulation of additional genetic and epigenetic hits [9].
Adult cancers may derive from stem or early progenitor cells [1][2], and epigenetic modulation of gene expression is essential for normal function of these early cells. We now explore whether DNA hypermethylation and heritable silencing of groups of genes in adult tumor initiation and progression might reflect chromatin properties for these genes associated with a stem or precursor cell of origin.
We compared the epigenetic status of a group of genes frequently hypermethylated and silenced in adult cancers (Fig. 1). Among the genes studied, 13 of 29 (45%) are hypermethylated in a single line, HCT-116, of adult colon cancer, but none are hypermethylated in ES cells, and only 3% and 7% were completely methylated in the Tera-1 and Tera-2 EC lines, respectively. Thus, the key epigenetic parameter of promoter CpG island hypermethylation, which is common in a large group of genes in adult cancer cells, does not seem to be a common feature of EC cells.
In murine ES cells, many developmental genes are maintained in a state of low transcriptional activity and are available for transcription increases or decreases when differentiation cues are received [11]. Our s...
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI’s eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI’s eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Custom implementations of the BLAST program provide sequence-based searching of many specialized datasets. New resources released in the past year include a new PubMed interface, a sequence database search and a gene orthologs page. Additional resources that were updated in the past year include PMC, Bookshelf, My Bibliography, Assembly, RefSeq, viral genomes, the prokaryotic genome annotation pipeline, Genome Workbench, dbSNP, BLAST, Primer-BLAST, IgBLAST and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
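As a rough illustration of the E-utilities acting as the programming interface to Entrez, the sketch below only assembles an ESearch request URL and sends nothing over the network. The helper name `build_esearch_url`, the choice of the `pubmed` database and the query term are assumptions made for this example and do not come from the abstract; the base URL and the `db`, `term`, `retmax` and `retmode` parameters follow NCBI's public E-utilities documentation.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities (per NCBI documentation).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for an Entrez database.

    Hypothetical helper for illustration only: it constructs the
    query string and does not contact NCBI.
    """
    params = {
        "db": db,            # Entrez database to search, e.g. "pubmed"
        "term": term,        # free-text Entrez query
        "retmax": retmax,    # maximum number of record IDs to return
        "retmode": "json",   # ask for a JSON response
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Illustrative query; the term is an assumption, not from the source.
    print(build_esearch_url("pubmed", "RefSeq", retmax=5))
```

Keeping URL construction separate from the request makes the query easy to inspect before any traffic is sent; an actual client would also need to respect NCBI's usage guidelines.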