The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The project combines data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) with computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. It augments these reference sequences with current knowledge, including publications, functional features, and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes, and >10 000 eukaryotes; RefSeq release 71), with coverage per organism ranging from a single record to a complete genome. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access, and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data, including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to using available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
The perpetual arms race between bacteria and phage has resulted in the evolution of efficient resistance systems that protect bacteria from phage infection. Such systems, which include the CRISPR-Cas and restriction-modification systems, have proven invaluable in the biotechnology and dairy industries. Here, we report on a six-gene cassette in Bacillus cereus that, when integrated into the Bacillus subtilis genome, confers resistance to a broad range of phages, including both virulent and temperate ones. This cassette includes a putative Lon-like protease, an alkaline phosphatase domain protein, a putative RNA-binding protein, a DNA methylase, an ATPase-domain protein, and a protein of unknown function. We denote this novel defense system BREX (Bacteriophage Exclusion) and show that it allows phage adsorption but blocks phage DNA replication. Furthermore, our results suggest that methylation on non-palindromic TAGGAG motifs in the bacterial genome guides self/non-self discrimination and is essential for the defensive function of the BREX system. However, unlike in restriction-modification systems, phage DNA does not appear to be cleaved or degraded by BREX, suggesting a novel mechanism of defense. Pan-genomic analysis revealed that BREX and BREX-like systems, including the distantly related Pgl system described in Streptomyces coelicolor, are found in ~10% of all sequenced microbial genomes and can be divided into six coherent subtypes in which gene composition and order are conserved. Finally, we detected a phage family that evades the BREX defense, implying that anti-BREX mechanisms may have evolved in some phages as part of their arms race with bacteria.
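The abstract reports that methylation of non-palindromic TAGGAG motifs guides self/non-self discrimination. The minimal Python sketch below illustrates what "non-palindromic" means here: the reverse complement of TAGGAG is CTCCTA, a different sequence, so each site reads differently on the two strands, unlike a palindromic restriction site such as the EcoRI site GAATTC. The helper names (`reverse_complement`, `is_palindromic`, `find_motif_sites`) are hypothetical and chosen for illustration; the sketch only locates motif occurrences on either strand and does not model methylation or the BREX mechanism itself.

```python
# Hypothetical helpers illustrating a non-palindromic motif; not from the BREX paper.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
MOTIF = "TAGGAG"  # motif named in the abstract


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(motif: str) -> bool:
    """A motif is palindromic if it equals its own reverse complement."""
    return motif.upper() == reverse_complement(motif)


def find_motif_sites(genome: str, motif: str = MOTIF) -> list[tuple[int, str]]:
    """Scan one strand and report 0-based start positions of motif hits.

    '+' means the motif reads on the given strand; '-' means its reverse
    complement occurs there, i.e. the motif reads on the opposite strand.
    Assumes a non-palindromic motif, so a window cannot match both.
    """
    genome = genome.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    hits: list[tuple[int, str]] = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits
```

For example, `find_motif_sites("AATAGGAGTTCTCCTAA")` returns `[(2, "+"), (10, "-")]`: one site read on the given strand and one read on the complementary strand. Because a non-palindromic site differs between strands, a strand-specific mark at such a site is not automatically mirrored on the partner strand, which is one reason these motifs differ from the palindromic targets of classical restriction-modification systems.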
Recombination between moderately divergent DNA sequences is impaired compared with recombination between identical sequences. In yeast, an HO endonuclease-induced double-strand break can be repaired by single-strand annealing (SSA) between flanking homologous sequences. A 3% sequence divergence between the 205-bp sequences flanking the double-strand break caused a 6-fold reduction in repair compared with identical sequences. This reduction, termed heteroduplex rejection, was suppressed in a mismatch repair-defective msh6Δ strain and partially suppressed in an msh2 separation-of-function mutant. In mlh1Δ strains, heteroduplex rejection was greater than in msh6Δ strains but less than in wild type. Deleting PMS1, MLH2, or MLH3 had no effect on heteroduplex rejection, but a pms1Δ mlh2Δ mlh3Δ triple mutant resembled mlh1Δ. However, correction of the mismatches within heteroduplex SSA intermediates required PMS1 and MLH1 to the same extent as MSH2 and MSH6. An SSA competition assay, in which either diverged or identical repeats can be used for repair, showed that heteroduplex DNA is likely to be unwound rather than degraded. This conclusion is supported by the finding that deleting the SGS1 helicase also suppressed heteroduplex rejection.

Genetic recombination depends on an efficient and accurate search for homology between recipient and donor DNA substrates. Studies in both prokaryotes and eukaryotes have shown that mismatch repair proteins play a critical role in regulating this homology search during strand invasion (1, 2). Evidence for a role of mismatch repair proteins in regulating recombination was first obtained in transformation studies performed in Pneumococcus. A small number of base-base differences between donor and recipient molecules significantly decreased the formation of stable transformants. This decrease, known as heteroduplex rejection, was suppressed by mutations in hexA and hexB, homologs of the Escherichia coli mismatch repair proteins MutS and MutL, respectively (3, 4).
The MutS and MutL proteins play key roles in the repair of base pair mismatches: MutS binds to mispairs, and MutL appears to interact with MutS-mispair complexes to initiate downstream mismatch repair steps (5-8). Subsequent studies in bacteria, yeast, and humans showed that mismatch repair plays a critical role in repressing recombination between moderately divergent (homeologous) sequences (9-12).

In Saccharomyces cerevisiae, repair of mismatches arising during DNA replication or through heteroduplex DNA formation during recombination depends on the activity of several MutS and MutL homologs. The MutS homologs Msh2p, Msh3p, and Msh6p and two MutL homologs, Mlh1p and Pms1p, have been shown to play major roles in mismatch repair, whereas two other MutL homologs, Mlh2p and Mlh3p, play specialized roles (13-19). These proteins appear to function as heterodimers in mismatch repair, because Msh2p-Msh3p, Msh2p-Msh6p, Mlh1p-Pms1p, Mlh1p-Mlh3p, and Mlh1p-Mlh2p complexes have been identified (20). Furthermore, the Msh2p-Msh6p complex shows a strong selectivity for base pair substitutions, whereas Msh2p-Ms...
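As a rough worked illustration of the SSA substrates described in the abstract above, assume that "3% divergence" means the fraction d of mismatched positions over the repeat length L = 205 bp (the exact mismatch count in the constructs is not stated in this excerpt), and take repair between identical repeats as the 100% reference:

```latex
n_{\text{mismatch}} \approx d \cdot L = 0.03 \times 205 \approx 6,
\qquad
\frac{\text{repair}_{\text{diverged}}}{\text{repair}_{\text{identical}}} \approx \frac{1}{6} \approx 17\%
```

Under these assumptions, the 6-fold reduction attributed to heteroduplex rejection is associated with on the order of six mismatches within each 205-bp repeat pair.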
Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq, such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source, and help to streamline clinical interpretation.