Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast

Jeffares, Daniel C.; Jolly, Clemency; Hoti, Mimoza; Speed, Doug; Shaw, Liam; Rallis, Charalampos; Balloux, François; Dessimoz, Christophe; Bähler, Jürg; Sedlazeck, Fritz J.

doi:10.1101/047266

Cited by 137 publications (198 citation statements)

References 67 publications

Supporting

Mentioning

194

Contrasting

Unclassified


“…Reads were aligned to the human reference genome (Hg38) using either NGMLR 21 (for SV calling) or Minimap2 31 . Per-nucleotide coverage was determined using samtools, and clustered using the 'bincov' script of the SURVIVOR software package 32 .…”

Section: Discussion (mentioning)

confidence: 99%

Targeted Nanopore Sequencing with Cas9 for studies of methylation, structural variants, and mutations

Gilpatrick, Lee, Graham, et al. 2019

Preprint


Nanopore sequencing technology can rapidly and directly interrogate native DNA molecules. Often we are interested only in interrogating specific areas at high depth, but conventional enrichment methods have thus far proved unsuitable for long reads 1. Existing strategies are currently limited by high input DNA requirements, low yield, short (<5kb) reads, time-intensive protocols, and/or amplification or cloning (losing base modification information). In this paper, we describe a technique utilizing the ability of Cas9 to introduce cuts at specific locations and ligating nanopore sequencing adaptors directly to those sites, a method we term 'nanopore Cas9 Targeted-Sequencing' (nCATS). We have demonstrated this using an Oxford Nanopore MinION flow cell (Capacity >10Gb+) to generate a median 165X coverage at 10 genomic loci with a median length of 18kb, representing a several hundred-fold improvement over the 2-3X coverage achieved without enrichment. We performed a pilot run on the smaller Flongle flow cell (Capacity ~1Gb), generating a median coverage of 30X at 11 genomic loci with a median length of 18kb. Using panels of guide RNAs, we show that the high coverage data from this method enables us to (1) profile DNA methylation patterns at cancer driver genes, (2) detect structural variations at known hot spots, and (3) survey for the presence of single nucleotide mutations. Together, this provides a low-cost method that can be applied even in low resource settings to directly examine cellular DNA. This technique has extensive clinical applications for assessing medically relevant genes and has the versatility to be a rapid and comprehensive diagnostic tool. We demonstrate applications of this technique by examining the well-characterized GM12878 cell line as well as three breast cell lines (MCF-10A, MCF-7, MDA-MB-231) with varying tumorigenic potential as a model for cancer.

Contributions: TG and WT constructed the study. TG performed the experiments. TG, IL, and FS analyzed the data. TG, JG, ER, RB and AH developed the method. TG and WT wrote the paper.


“…Assemblytics was run with default parameters (10,000 bp unique sequence anchor length) on the delta file output from nucmer. Results were transformed into VCF format using SURVIVOR 40…”

Section: Spiral Genetics Biograph Refinement (mentioning)

confidence: 99%

“…This highlights the importance of developing benchmark SV sets to identify which callset is correct when they disagree, and potentially when both are incorrect even when they agree. 40 shows the number of SVs overlapping between the individual SV caller and technologies split between insertions (upper left) and deletions (lower right). The diagonal highlights the overall number of SVs per SV caller.…”

Section: Candidate SV Callsets Differ By Sequencing Technology and An… (mentioning)

confidence: 99%

A robust benchmark for germline structural variant detection

Zook et al. 2019

Preprint

Self Cite


New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution, and comprehensiveness. Translating these methods to routine research and clinical practice requires robust benchmark sets. We developed the first benchmark set for identification of both false negative and false positive germline SVs, which complements recent efforts emphasizing increasingly comprehensive characterization of SVs. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle (GIAB) Consortium integrated 19 sequence-resolved variant calling methods, both alignment- and de novo assembly-based, from short-, linked-, and long-read sequencing, as well as optical and electronic mapping. The final benchmark set contains 12745 isolated, sequence-resolved insertion and deletion calls ≥50 base pairs (bp) discovered by at least 2 technologies or 5 callsets, genotyped as heterozygous or homozygous variants by long reads. The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.66 Gbp and 9641 SVs supported by at least one diploid assembly. Support for SVs was assessed using svviz with short-, linked-, and long-read sequence data. In general, there was strong support from multiple technologies for the benchmark SVs, with 90% of the Tier 1 SVs having support in reads from more than one technology. The Mendelian genotype error rate was 0.3%, and genotype concordance with manual curation was >98.7%. We demonstrate the utility of the benchmark set by showing it reliably identifies both false negatives and false positives in high-quality SV callsets from short-, linked-, and long-read sequencing and optical mapping.
GIAB is working towards a new version of the benchmark set that will use new technologies and methods such as PacBio Circular Consensus Sequencing and ultralong Oxford Nanopore sequencing to expand to more challenging genome regions and include more challenging SVs such as inversions. We are also developing a robust integration process to make calls on GRCh37 and GRCh38 for all seven GIAB samples.
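
The inclusion rule stated in this abstract (insertions and deletions of at least 50 bp, discovered by at least 2 technologies or 5 callsets) can be illustrated with a minimal sketch. The `SVCall` record, its field names, and `meets_support_rule` are hypothetical names chosen for illustration and are not part of any GIAB tooling; the sketch also ignores the "isolated" and long-read genotyping criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    """Hypothetical record for one candidate structural variant."""
    svtype: str              # "INS" or "DEL" in this sketch
    length_bp: int           # size of the inserted or deleted sequence
    technologies: frozenset  # names of technologies that discovered the call
    callsets: frozenset      # names of callsets that discovered the call


def meets_support_rule(call, min_len=50, min_tech=2, min_callsets=5):
    """Apply the size and support thresholds quoted in the abstract.

    Assumption: thresholds are inclusive, and support from either
    enough technologies or enough callsets is sufficient.
    """
    is_indel = call.svtype in {"INS", "DEL"}
    long_enough = call.length_bp >= min_len
    supported = (len(call.technologies) >= min_tech
                 or len(call.callsets) >= min_callsets)
    return is_indel and long_enough and supported


# Usage: a 120 bp deletion seen by two technologies passes the rule.
example = SVCall("DEL", 120, frozenset({"short-read", "long-read"}),
                 frozenset({"callset_a", "callset_b"}))
print(meets_support_rule(example))
```

A call found by a single technology would still pass if five or more callsets reported it, which is why the two support conditions are joined with `or`.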


“…We now analyze the proposed technique for computing MEMs, SMEMs and maximally spanning seeds out of minimizers considering practical circumstances. Using the human genome and the read generator Survivor [17] and DWGSIM [18], we generated reads of various sizes for subsections of the human genome. For benchmarking, we extended the MA code by integrating the Minimizer code of Minimap 2 as additional seeding module.…”

Section: Results (mentioning)

confidence: 99%

A performant bridge between fixed-size and variable-size seeding

Kutzner, Kim, Schmidt 2019

Preprint


Seeding is usually the initial step of high-throughput sequence aligners. Two popular seeding strategies are fixed-size seeding (k-mers, minimizers) and variable-size seeding (MEMs, SMEMs, max. spanning seeds). The former strategy benefits from fast index building and fast seed computation, while the latter benefits from high seed entropy. Here we build a performant bridge between both strategies and show that neither of them is of theoretical superiority. We propose an algorithmic approach for computing MEMs out of k-mers or minimizers. Further, we describe techniques for extracting SMEMs or maximally spanning seeds out of MEMs. A comprehensive benchmarking shows the practical value of the proposed approaches. In this context, we report on the effects and the fine-tuning of occurrence filters for the different seeding strategies.

Keywords: high-throughput sequence alignment, minimizer, SMEM, FMD-Index, seed entropy.

Introduction: Most high-throughput read aligners [1-5] perform the following three steps: seeding [6, 7], seed processing (e.g. chaining, SoC) [8,9] and dynamic programming [10,11]. There are two techniques for seed computation: fixed-size seeding [12] and variable-size seeding [13,14]. Fixed-size seeding is usually done via k-mers or via their space-efficient variant, minimizers [3]. Variable-size seeding, in turn, relies on some form of full-text search index such as the FMD-index [13,14]. Fixed-size seeding benefits from short runtimes for index construction and seed computation, while variable-size seeding benefits from the high entropy of the generated seeds [2,3]. Here, we present an efficient algorithmic bridge for computing variable-size seeds out of fixed-size seeds. Hence, the performant behavior of fixed-size seeds becomes available with variable-size seeds as well.
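
For readers unfamiliar with minimizers, the fixed-size seeds this abstract refers to, here is a minimal sketch of window-based minimizer selection. It is not the authors' MEM computation and not Minimap2's implementation: the function name `minimizers` and the parameters `k` and `w` are illustrative, and plain lexicographic ordering is assumed, whereas practical aligners typically order k-mers by a hash and handle both strands.

```python
from typing import List, Tuple


def minimizers(seq: str, k: int, w: int) -> List[Tuple[int, str]]:
    """Return (position, k-mer) pairs: the smallest k-mer in each window
    of w consecutive k-mers, with ties broken by the leftmost position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked: List[Tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first index achieving the minimum, i.e. the leftmost tie.
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        # Adjacent windows often share a minimizer; record each position once.
        if not picked or picked[-1][0] != pos:
            picked.append((pos, kmers[pos]))
    return picked


# Usage: 3-mers of "GATTACA" with windows of 2 consecutive k-mers.
print(minimizers("GATTACA", k=3, w=2))
# → [(1, 'ATT'), (3, 'TAC'), (4, 'ACA')]
```

Only a subset of all k-mer positions is kept, which is what makes minimizers a space-efficient variant of k-mer seeding.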

