Background: The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned BAM files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives.
Results: We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz.
Conclusions: We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7% concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2366-2) contains supplementary material, which is available to authorized users.
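The one-class idea in the Results above (train only on random non-SV regions, then ask whether a candidate's annotations look abnormal) can be illustrated with a minimal sketch. This is not the svclassify implementation: the scoring rule (one minus the smallest two-sided empirical tail fraction across annotations), the class name `OneClassModel`, and the annotation names `coverage_ratio` and `discordant_pairs` are all assumptions made for illustration, and the data are simulated.

```python
import random
from bisect import bisect_left, bisect_right


def tail_fraction(sorted_null, value):
    """Two-sided empirical tail fraction of `value` within a sorted null sample."""
    n = len(sorted_null)
    at_or_below = bisect_right(sorted_null, value)      # null values <= value
    at_or_above = n - bisect_left(sorted_null, value)   # null values >= value
    # Add-one smoothing keeps the fraction strictly positive; cap at 1.0.
    return min(1.0, 2.0 * (min(at_or_below, at_or_above) + 1) / (n + 1))


class OneClassModel:
    """Hypothetical one-class scorer trained only on non-SV (null) regions."""

    def __init__(self, null_rows):
        # null_rows: list of dicts mapping annotation name -> numeric value
        self.names = sorted(null_rows[0])
        self.null = {
            name: sorted(row[name] for row in null_rows) for name in self.names
        }

    def score(self, candidate):
        """Return a score in [0, 1); higher means more unlike the null regions."""
        fractions = [
            tail_fraction(self.null[name], candidate[name]) for name in self.names
        ]
        # The most extreme annotation drives the score (no multiple-testing
        # correction is applied in this sketch).
        return 1.0 - min(fractions)


# Simulated annotations for random non-SV regions (assumed distributions).
rng = random.Random(0)
null_rows = [
    {"coverage_ratio": rng.gauss(1.0, 0.1), "discordant_pairs": rng.gauss(2.0, 1.0)}
    for _ in range(2000)
]
model = OneClassModel(null_rows)

# A deletion-like candidate: roughly half coverage, many discordant read pairs.
deletion_like = {"coverage_ratio": 0.5, "discordant_pairs": 15.0}
# A candidate whose annotations look like the rest of the genome.
genome_like = {"coverage_ratio": 1.0, "discordant_pairs": 2.0}

print("deletion-like score:", round(model.score(deletion_like), 3))
print("genome-like score:", round(model.score(genome_like), 3))
```

Training on the null class alone avoids needing a trusted set of true SVs, which is the situation the abstract describes; the cost is that a high score only says a region is unusual, not which kind of variant it contains.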
Somatic mutations during stem cell division are responsible for several cancers. In principle, a similar process could occur during the intense cell proliferation accompanying human brain development, leading to the accumulation of regionally distributed foci of mutations. Using dual platform >5000-fold depth sequencing of 102 genes in 173 adult human brain samples, we detect and validate somatic mutations in 27 of 54 brains. Using a mathematical model of neurodevelopment and approximate Bayesian inference, we predict that macroscopic islands of pathologically mutated neurons are likely to be common in the general population. The detected mutation spectrum also includes DNMT3A and TET2, which are likely to have originated from blood cell lineages. Together, these findings establish developmental mutagenesis as a potential mechanism for neurodegenerative disorders, and provide a novel mechanism for the regional onset and focal pathology in sporadic cases.
Retroviruses can cause tumors when they integrate near a protooncogene or tumor suppressor gene of the host. We infected >2,500 mice with the SL3-3 murine leukemia virus; in 22 resulting tumors, we found provirus integrations near or within the gene that contains the mir-17-92 microRNA (miRNA) cistron. Using quantitative real-time PCR, we showed that expression of miRNA was increased in these tumors, indicating that retroviral infection can induce expression of oncogenic miRNAs. Our results demonstrate that retroviral mutagenesis can be a potent tool for miRNA discovery.
oncogene | retroviral mutagenesis
MicroRNAs (miRNAs) are short noncoding RNAs that regulate gene expression. They are initially transcribed by RNA polymerase II and contained within hairpins on a long primary transcript. The hairpins are then processed in two successive steps mediated by a double-stranded RNA-binding protein and RNase III (in mammals, DGCR8 and Drosha followed by TRBP and Dicer) to create the mature ≈21-nt miRNA. The miRNA is loaded into the RNA-induced silencing complex and, in animals, the complex is directed to mRNAs by the complementarity of six or seven bases within the miRNA. This leads to either translational repression or mRNA cleavage (1). It has been predicted that one-third of human genes may be regulated in this way (2).
Several miRNA hairpins can be encoded as a cistron on a single primary transcript. Such is the case for the human gene, c13orf25, and its mouse homolog. Here, a primary transcript encodes (in order, 5′ to 3′) [...] used a miRNA microarray to profile human B cell tumor lines and found that miRNAs encoded by c13orf25 were overexpressed. They then showed that in mice, when B cells constitutively overexpressing c-Myc were transduced with part of the human cistron containing miRNAs 17-3p to 19b-1, lymphoma formed at an accelerated pace, suggesting that these miRNAs could be oncogenes (3). In addition, Hayashita et al. (4) found that the mir-17-92 cistron was overexpressed in human lung cancer. Also, O'Donnell et al. (5) showed that c-Myc expression leads to increased expression of miRNAs from the mir-17-92 cistron, and that mir-17-5p and mir-20 negatively regulate the cell proliferation factor E2F1, suggesting that these miRNAs could also have tumor suppressor properties.
Although microarray analysis of miRNA expression in tumors has proven quite useful in identifying candidates involved in cancer and has provided seminal insight, this method inherently cannot distinguish cause from correlation and so must be corroborated by additional data, such as expression of a transgene or identification of implicating deletions, translocations, and other mutations (3, 6-8). Alternatively, retroviral insertional mutagenesis might be used to identify causative cancer genes. In this method, slow-transforming retroviruses, which themselves carry no oncogene, insert provirus DNA into the host DNA. Because the provirus integrates into essentially random locations in the host genome, retroviruses can be used as a gene discovery tool ...