2019
DOI: 10.1038/s41587-019-0074-6

An open resource for accurately benchmarking small variant and reference calls

Abstract: Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle Consortium (GIAB), we apply a reproducible, cloud-based pipeline to integrate multiple short and linked read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six broadly-consented genomes from the Personal Genome Project. These new gen…

Cited by 312 publications (377 citation statements)
References 29 publications
“…To enable the community to benchmark these methods, the Genome in a Bottle Consortium (GIAB) here developed benchmark SV calls and benchmark regions for the son (HG002/NA24385) in a broadly consented and available Ashkenazi Jewish trio from the Personal Genome Project, 7 which are disseminated as National Institute of Standards and Technology (NIST) Reference Material 8392. 8,9 Many approaches have been developed to detect SVs from different sequencing technologies.…”
Section: Introduction
confidence: 99%
“…18,19 Finally, optical mapping and electronic mapping provide an orthogonal approach capable of determining the approximate size and location of insertions, deletions, inversions, and translocations while spanning even very large SVs. [20][21][22] GIAB recently published benchmark sets for small variants for seven genomes, 9,23 and the Global Alliance for Genomics and Health Benchmarking Team established best practices for using these and other benchmark sets to benchmark germline variants. 24 These benchmark sets are widely used in developing, optimizing, and demonstrating new technologies and bioinformatics methods, as well as part of clinical laboratory validation.…”
Section: Introduction
confidence: 99%
“…We first evaluated assembly-based SNP and small-indel (<50bp) detection by comparing Aquila's calls against the Genome in a Bottle (GiaB) benchmark callsets (Zook et al 2019). The libraries with the best assembly statistics, L3 (from NA12878) and L5 (from NA24385), achieved 97.4% and 97.8% accuracy (F1 metric) for SNPs (Table 2; Supplemental Table S2) and >93% accuracy for the high-confidence set of GiaB small indels (Table 3; Supplemental Table S3).…”
Section: Assembly-based Detection of SNPs and Small Indels
confidence: 99%
“…Given that small variant callers today use only one type of sequencing data, and as a result consistently make erroneous calls in certain types of regions (e.g., indel calls in low-complexity regions) due to the error modes characteristic of a single sequencing technology, it is likely that the importance of variants in such regions may be less well-understood today. In addition, currently accepted benchmarks for variant calling such as Genome-In-A-Bottle (Zook, et al, 2019) have uncharacterized regions in the genome which may carry variants of significance. Some of these regions cannot be characterized due to the reliance, solely, on one type of sequencing data (namely short reads).…”
Section: Introduction
confidence: 99%