Genome in a Bottle (GIAB) benchmarks have been widely used to validate clinical sequencing pipelines and develop new variant calling and sequencing methods. Here we use accurate long and linked reads to expand the prior benchmark to include difficult-to-map regions and segmental duplications that are not readily accessible to short reads. Our new benchmark adds more than 300,000 SNVs, 50,000 indels, and 16 % new exonic variants, many in challenging, clinically relevant genes not previously covered (e.g., PMS2). We increase coverage of the GRCh38 assembly from 85 % to 92 %, while excluding problematic regions for benchmarking small variants (e.g., copy number variants and assembly errors) that should not have been in the previous version. Our new benchmark reliably identifies both false positives and false negatives across multiple short-, linked-, and long-read based variant calling methods. As an example of its utility, this benchmark identifies eight times more false negatives in a short read variant call set relative to our previous benchmark, mostly in difficult-to-map regions. To enable robust small variant benchmarking, we still exclude 3.6% of GRCh37 and 5.0% of GRCh38 in (1) highly repetitive regions such as large, highly similar segmental duplications and the centromere not accessible to our data and (2) regions where our sample is highly divergent from the reference due to large indels, structural variation, copy number variation, and/or errors in the reference (e.g., some KIR genes that have duplications in HG002). We have demonstrated the utility of this benchmark to assess performance in more challenging regions, which enables benchmarking in more difficult genes and continued technology and bioinformatics development. The benchmarks are available at: ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/NISTv4.1/ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_v4.2_SmallVariantDraftBenchmark_07092020/
The rates and biological significance of somatic mutations have long been a subject of debate. Somatic mutations in plants are expected to accumulate with vegetative growth and time, yet rates of somatic mutations are unknown for conifers, which can reach exceptional sizes and ages. We investigated somatic mutation rates in the conifer Sitka spruce (Picea sitchensis (Bong.) Carr.) by analyzing a total of 276 Gb of nuclear DNA from the tops and bottoms of 20 old‐growth trees averaging 76 m in height. We estimate a somatic base substitution rate of 2.7 × 10−8 per base pair within a generation. To date, this is one of the highest estimated per‐generation rates of mutation among eukaryotes, indicating that somatic mutations contribute substantially to the total per‐generation mutation rate in conifers. Nevertheless, as the sampled trees are centuries old, the per‐year rate is low in comparison with nontree taxa. We argue that although somatic mutations raise genetic load in conifers, they generate important genetic variation and enable selection both among cell lineages within individual trees and among offspring.
Background Single cell Strand-seq is a unique tool for the discovery and phasing of genomic inversions. Conventional methods to discover inversions with Strand-seq data are blind to known inversion locations, limiting their statistical power for the detection of inversions smaller than 10 Kb. Moreover, the methods rely on manual inspection to separate false and true positives. Results Here we describe “InvertypeR”, a method based on a Bayesian binomial model that genotypes inversions using fixed genomic coordinates. We validated InvertypeR by re-genotyping inversions reported for three trios by the Human Genome Structural Variation Consortium. Although 6.3% of the family inversion genotypes in the original study showed Mendelian discordance, this was reduced to 0.5% using InvertypeR. By applying InvertypeR to published inversion coordinates and predicted inversion hotspots (n = 3701), as well as coordinates from conventional inversion discovery, we furthermore genotyped 66 inversions not previously reported for the three trios. Conclusions InvertypeR discovers, genotypes, and phases inversions without relying on manual inspection. For greater accessibility, results are presented as phased chromosome ideograms with inversions linked to Strand-seq data in the genome browser. InvertypeR increases the power of Strand-seq for studies on the role of inversions in phenotypic variation, genome instability, and human disease.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.