2015
DOI: 10.1093/bioinformatics/btv204

MetaSV: an accurate and integrative structural-variant caller for next generation sequencing

Abstract: Summary: Structural variations (SVs) are large genomic rearrangements that vary significantly in size, making them challenging to detect with the relatively short reads from next-generation sequencing (NGS). Different SV detection methods have been developed; however, each is limited to specific kinds of SVs with varying accuracy and resolution. Previous works have attempted to combine different methods, but they still suffer from poor accuracy particularly for insertions. We propose MetaSV, an integrated SV c…

Cited by 143 publications (124 citation statements)
References 16 publications
“…We identified a total of 6,123 small somatic variants, including 5,981 SNVs and 142 indels (Supplementary Table 4). To maximise for the detection of larger deletions, insertions / duplications and inversions (> 50 bp), we used five separate SV calling tools, specifically Breakdancer (read-pair) [18], Pindel (split-read) [19], CNVnator (read-depth) [20], as well as read-pair and split-read integration tools, Manta [21] and Lumpy [22], collating our findings using MetaSV [6] which required an SV to be detected by at least four reads and two NGS-based SV callers [reviewed in 23]. Due to limitations of short-read NGS data for detecting SVs in high repeat regions [24], we performed post-call filtering to remove low complexity regions, followed by manual inspection (Supplementary Table 5), identifying 45 deletions and four duplications under 1 Kb (Supplementary Table 6), and 26 deletions, seven duplications and a single inversion greater than 1 Kb in length (Supplementary Table 7).…”
Section: Results
mentioning
confidence: 99%
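The consensus rule described in the statement above (an SV kept only when supported by at least four reads and at least two callers) can be illustrated with a minimal Python sketch. The `SVCall` record, its field names, and the helper functions are hypothetical, introduced only for illustration; they are not MetaSV's data model or interface, and how the citing study actually applied the thresholds is not given here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class SVCall:
    """Hypothetical merged SV record (not MetaSV's own format)."""
    chrom: str
    start: int
    end: int
    sv_type: str                # e.g. "DEL", "DUP", "INV"
    supporting_reads: int       # reads supporting the call
    callers: FrozenSet[str]     # names of tools that reported it


def passes_consensus(call: SVCall, min_reads: int = 4, min_callers: int = 2) -> bool:
    """Return True if the call meets both the read and caller thresholds."""
    return call.supporting_reads >= min_reads and len(call.callers) >= min_callers


def filter_calls(calls: List[SVCall], min_reads: int = 4, min_callers: int = 2) -> List[SVCall]:
    """Keep only the calls that satisfy the consensus rule."""
    return [c for c in calls if passes_consensus(c, min_reads, min_callers)]


# Made-up example records, for illustration only.
example_calls = [
    SVCall("chr1", 1000, 1500, "DEL", 6, frozenset({"Manta", "Lumpy"})),
    SVCall("chr2", 2000, 2300, "DEL", 3, frozenset({"Manta", "Pindel"})),
    SVCall("chr3", 5000, 9000, "DUP", 10, frozenset({"CNVnator"})),
]
kept = filter_calls(example_calls)
```

With these made-up records, only the first call is retained: the second has too few supporting reads and the third was reported by a single caller.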
“…As no single informatics tool can detect the full range of SVs regarding size and subtype [5], integrated methods have been proposed [6, 7], with de novo assembly of tumor genomes remaining a challenge. While long-read (up to thousands of bases) sequencing methods, such as single-molecule sequencing from Pacific Biosystems (PacBio) and Oxford Nanopore, are improving SV detection [8, 9], they are still limited by relatively high costs, low throughput and relatively high error rates.…”
Section: Introduction
mentioning
confidence: 99%
“…Several factors complicate the analysis, in particular mappability issues due to repetitive sequence regions (15). Indeed, it has become clear that the results produced by different methods are not consistent, and some studies have intersected multiple approaches to provide a presumed high-confidence set of predictions (16,17). Adding to the challenges is the difficulty of assessing performance: True positive sets have thus far been obtained through simulated genomic sequences (18), but this will not reflect the true complexity of cancer genomes.…”
mentioning
confidence: 99%
“…Specifically, the Bina's in-memory sorter was used concurrently with alignment to minimize latency; BWA-MEM (Li 2013) v0.7.5a was used for sequence alignment; GATK (DePristo et al 2011) v2.8 with HaplotyperCaller and VQSR was used for SNV and small indel detection and filtering; and MetaSV (Mohiyuddin et al 2015) was used to integrate different SV/CNV signals detected by four orthogonal algorithms, i.e., detection of signals using read depths by CNVnator (Abyzov et al 2011), split-reads by Pindel (Ye et al 2009), paired-end reads by BreakDancer (Chen et al 2009), and junctions by BreakSeq (Lam et al 2010). The integrated call set was annotated with confidence labels (PASS/LowQual) and detection methods by MetaSV.…”
Section: Whole-genome Analysis
mentioning
confidence: 99%