2014
DOI: 10.1093/bioinformatics/btu345

SMaSH: a benchmarking toolkit for human genome variant calling

Abstract: We provide free and open access online to the SMaSH tool kit, along with detailed documentation, at smash.cs.berkeley.edu

Citations: cited by 42 publications (37 citation statements)
References: 35 publications

“…A variety of approaches have been recently developed to address the challenges in variant representation [9-11, 21, 22]. Real Time Genomics (RTG) developed the comparison tool vcfeval, which introduced the idea of comparing variants at the level of the genomic haplotypes that the variants represent as a way to overcome the problems associated with comparing complex variants, where alternative yet equivalent variant representations can confound direct comparison methods [9]. Variant "normalization" tools help to represent variants in a standardized way (e.g., by left-shifting indels in repeats), but they demonstrated that "variant normalization" approaches alone were not able to reconcile different representations of many complex variants.…”
Section: Variant Representation
Citation type: mentioning
confidence: 99%
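
The haplotype-level comparison described in this statement can be illustrated with a minimal Python sketch. It is not vcfeval's implementation: the function `apply_variants`, the toy reference sequence, and the two calls are all hypothetical, chosen only to show how one deletion in a repeat can be written at two positions and still yield the same haplotype.

```python
def apply_variants(ref, variants):
    """Apply non-overlapping (pos, ref_allele, alt_allele) variants to ref.

    pos is 1-based, as in VCF. Returns the resulting haplotype string.
    """
    pieces = []
    cursor = 0  # 0-based index of the next unconsumed reference base
    for pos, ref_allele, alt_allele in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError("overlapping variants are not supported")
        if ref[start:start + len(ref_allele)] != ref_allele:
            raise ValueError("REF allele does not match the reference")
        pieces.append(ref[cursor:start])  # unchanged bases before the variant
        pieces.append(alt_allele)         # substituted allele
        cursor = start + len(ref_allele)
    pieces.append(ref[cursor:])           # unchanged tail
    return "".join(pieces)


# Toy reference containing a CA repeat (assumption, for illustration only).
reference = "GCACACAT"

# The same two-base deletion written at two different positions.
call_a = [(1, "GCA", "G")]  # left-aligned representation
call_b = [(5, "ACA", "A")]  # right-shifted representation

same_records = set(call_a) == set(call_b)
same_haplotype = apply_variants(reference, call_a) == apply_variants(reference, call_b)

print("record-by-record match:", same_records)
print("haplotype match:", same_haplotype)
```

A record-by-record comparison treats the two calls as different, while replaying each call onto the reference shows that they describe the same sequence.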
“…First, benchmarking must consider that variants may be represented in multiple ways in the commonly used variant call format (VCF) [9-12]. When comparing VCF files record by record, many of the putative differences are simply different representations of the same variant. Secondly, definitions for performance metrics such as true positive (TP), false positive (FP), and false negative (FN), which are key for the interpretation of the benchmarking results, are not yet standardized.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
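
Once TP, FP, and FN counts have been assigned, the usual summary metrics follow directly. The sketch below uses the conventional definitions of precision, recall, and F1; it is an assumption that these are the metrics of interest, since the quoted statement stresses that the underlying definitions are not yet standardized. The function name `benchmark_metrics` and the example counts are hypothetical.

```python
def benchmark_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts.

    A ratio with a zero denominator is reported as 0.0 rather than raising.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
precision, recall, f1 = benchmark_metrics(tp=90, fp=10, fn=30)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

How a call is counted as TP, FP, or FN (for example, whether genotype must match, or how partially matching complex variants are scored) is exactly the part that varies between benchmarking tools, so the counts fed into such a function are only comparable when produced under the same matching rules.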
“…Second, although simulated data are widely used for their easy access, low cost, and clear constitution of positives and negatives, several common artifacts are beyond current simulation yet, such as: the non-random distribution of variants, incomplete reference genome, and copy number variations (CNVs) [24]. Since simulation datasets are collections of synthetic reads based on simple generative models while real datasets are much more complex and harder to call variation on, they may not truly tell the same story as real sequencing data [25]. Third, as for the use of indirect properties of mutation calls instead of direct validation in these studies, multiple metrics of the prediction sets must be weighed to estimate the performance of each program rather than simply counting overlaps between different methods or calculating the average read depths.…”
Citation type: mentioning
confidence: 99%
“…To evaluate the performance of these algorithms we used SMaSH [20], a recently developed suite of tools for benchmarking variant calling algorithms. Briefly, SMaSH is motivated by the lack of a gold-standard NGS benchmarking dataset which both a) mimics a realistic use-case (i.e.…”
Section: Datasets and Evaluation
Citation type: mentioning
confidence: 99%
“…We used the same window size for AllChange. For CAGe++, we set the variant calling parameters to (α₁, α₂, α₃, α₄, α₅) = (12, 10, 20, 3, 20).…”
Citation type: mentioning
confidence: 99%