2019
DOI: 10.1038/s41587-019-0054-x

Best practices for benchmarking germline small-variant calls in human genomes

Abstract: Standardized benchmarking methods and tools are essential to robust accuracy assessment of NGS variant calling. Benchmarking variant calls requires careful attention to definitions of performance metrics, sophisticated comparison approaches, and stratification by variant type and genome context. To address these needs, the Global Alliance for Genomics and Health (GA4GH) Benchmarking Team convened representatives from sequencing technology developers, government agencies, academic bioinformatics researchers, cl…
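The abstract stresses careful definitions of performance metrics. As a minimal sketch (not the GA4GH reference implementation), the Python below assumes a comparison engine has already produced four counts and derives recall, precision and F1 from them. Truth-side and query-side true positives are kept separate because one truth variant can correspond to several query records written in a different representation. The `Counts` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical container for counts reported by a comparison engine."""
    truth_tp: int  # truth-set variants matched by the query call set
    fn: int        # truth-set variants with no matching query call
    query_tp: int  # query records matched to the truth set
    fp: int        # query records with no match (inside confident regions)


def recall(c: Counts) -> float:
    """Fraction of truth variants recovered: TP / (TP + FN), truth side."""
    denom = c.truth_tp + c.fn
    return c.truth_tp / denom if denom else float("nan")


def precision(c: Counts) -> float:
    """Fraction of query calls that are correct: TP / (TP + FP), query side."""
    denom = c.query_tp + c.fp
    return c.query_tp / denom if denom else float("nan")


def f1(c: Counts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    c = Counts(truth_tp=90, fn=10, query_tp=92, fp=8)
    print(recall(c), precision(c), f1(c))
```

Keeping the two TP counts apart avoids inflating precision or recall when truth and query disagree only on how a variant is written.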

Cited by 323 publications (232 citation statements); references 30 publications.
“…Fast and accurate variant calling is essential for both research and clinical applications of human genome sequencing [1,2]. Algorithms, best practices and benchmarking guidelines have been established for how to use Illumina sequencing to call germline small variants, including single-nucleotide polymorphisms (SNPs) and insertions/deletions (indels) [3–6]. In recent years, single-molecule sequencing (SMS) technologies have emerged for a variety of important applications [7].…”
Section: Introduction (mentioning)
confidence: 99%
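The quoted passage separates SNPs from indels, and the abstract calls for stratification by variant type. The sketch below shows a simplified first stratification step. It assumes normalized, biallelic, VCF-style REF/ALT strings in which an indel shares its leading anchor base with the reference. The function names and the catch-all COMPLEX bucket are illustrative; real benchmarking tools use finer categories.

```python
from collections import Counter


def variant_type(ref: str, alt: str) -> str:
    """Classify one biallelic REF/ALT pair into illustrative categories."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ALT must differ from REF")
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    # Left-anchored insertion: ALT is REF followed by inserted bases.
    if len(alt) > len(ref) and alt.startswith(ref):
        return "INS"
    # Left-anchored deletion: REF is ALT followed by deleted bases.
    if len(ref) > len(alt) and ref.startswith(alt):
        return "DEL"
    # MNPs and block substitutions are lumped together in this sketch.
    return "COMPLEX"


def stratify(records):
    """Count (ref, alt) records per variant type."""
    return Counter(variant_type(ref, alt) for ref, alt in records)


if __name__ == "__main__":
    print(stratify([("A", "G"), ("A", "ATT"), ("ATT", "A"), ("AC", "GT")]))
```

Metrics such as recall and precision would then be computed separately within each bucket.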
“…The construction of the truth set, and strengths and weaknesses based on variant type and genome context should be considered. The GiaB benchmark sets were built from the consensus of multiple variant callers on Illumina short-read sequencing with the aid of a pedigree analysis, integration of structural variants identified with long fragment technologies by PacBio and 10X Genomics, and HuRef genome analysis using Sanger sequencing [35]. Nearly all the “true” variants in NA12878 sample are present in the resource files (e.g.…”
Section: Discussion (mentioning)
confidence: 99%
“…However, the “synthetic-diploid” call set currently contains some errors that were intrinsically present in the long reads [27]. It is thus recommended to use a less strict benchmarking strategy (“local matches” method) for comparisons [27,35]. Here, the evaluation using “genotype match” as it applied in NA12878 datasets was performed as well (Table S4).…”
Section: Discussion (mentioning)
confidence: 99%
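The statement above contrasts a strict “genotype match” with the looser “local matches” stringency. Below is a deliberately simplified sketch of three stringency levels. It assumes both calls are already normalized to the same representation and carry no missing genotype entries. Actual comparison engines compare haplotype sequences, so they can match variants that are written differently; this record-level check cannot. The `Call` type, the `match_level` function and the 30 bp `window` default are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    chrom: str
    pos: int               # 1-based VCF position
    ref: str
    alts: tuple[str, ...]  # ALT alleles, referenced as 1..n by the GT field
    gt: tuple[int, ...]    # e.g. (0, 1) = heterozygous; no missing (-1) entries assumed


def genotype_alleles(c: Call) -> list[str]:
    """Allele sequences in the genotype, sorted so phase/order is ignored."""
    return sorted(c.ref if i == 0 else c.alts[i - 1] for i in c.gt)


def alt_alleles(c: Call) -> set[str]:
    """Distinct non-reference alleles actually called."""
    return {c.alts[i - 1] for i in c.gt if i > 0}


def match_level(truth: Call, query: Call, window: int = 30) -> str:
    """Return the strictest (hypothetical) stringency at which the calls agree."""
    if truth.chrom != query.chrom or abs(truth.pos - query.pos) > window:
        return "no match"
    same_site = truth.pos == query.pos and truth.ref == query.ref
    if same_site and genotype_alleles(truth) == genotype_alleles(query):
        return "genotype match"
    if same_site and alt_alleles(truth) == alt_alleles(query):
        return "allele match"
    return "local match"


if __name__ == "__main__":
    truth = Call("chr1", 100, "A", ("G",), (0, 1))
    print(match_level(truth, Call("chr1", 100, "A", ("G",), (1, 1))))
```

A het truth call against a hom-alt query call at the same site agrees only at the allele level, which is the kind of genotyping error that a looser stringency deliberately tolerates.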
“…However, this top-down approach cannot identify which variant calls are most likely when there is disagreement between the results from different sequencing protocols. In addition, at loci that have the same sequencing biases across all or most sequencing technologies, assuming that the consensus call is true can lead to errors being falsely called as real high-confidence variants (Krusche et al 2019). Another drawback to this is that clinically collected samples can vary in quality and contamination and may introduce variants with low allelic fractions not seen in reference genomes.…”
Section: Introduction (mentioning)
confidence: 99%