2020
DOI: 10.1101/2020.03.27.011767
Preprint

Accuracy and efficiency of germline variant calling pipelines for human genome data

Abstract: Advances in next-generation sequencing technology have enabled whole genome sequencing (WGS) to be widely used for identifying causal variants in a spectrum of genetic-related disorders, and have provided new insight into how genetic polymorphisms affect disease phenotypes. The development of different bioinformatics pipelines has continuously improved the variant analysis of WGS data; however, a systematic performance comparison of these pipelines is needed to provide guidance on the application…

Cited by 17 publications (18 citation statements)
References 35 publications
“…Variants were called across all samples in a single batch with GATK 3.8 using the -newQual flag to minimize false negative singleton calls. The recall rate for GATK against truth sets is between 93 and 99% for single nucleotide variants and 85 and 98% for small (less than 50 bp) indel events [19]. Genome annotation was performed using SnpEff 4.3 [20] after splitting multi-allelic sites with Vt [21].…”
Section: Genome Sequencing
mentioning
confidence: 99%
“…In recent decades, the technologies used for detecting germline and somatic mutations have been greatly improved. The F1-scores of germline variant calling have exceeded 0.99 [17]. Clonal somatic mutation calling (e.g., cancer) has gained credence on a clinical level, by lowering the limit of detection down to around 1% [24].…”
Section: Discussion
mentioning
confidence: 99%
“…This ambiguity is reflected in the disparate set of approaches applied in recent studies, such as targeting variants with unlikely VAFs for normal zygosity in a single sample [7,8], searching for shared variants in a pair of samples [9], and machine-learning algorithms [10,11]. These circumstances urgently demand a rigorous cataloging and assessment of mosaic detection algorithms, as conducted for germline and somatic variants [12][13][14][15][16][17], but should be in a more sophisticated manner to cover the full extent of scenarios that mosaic variants can represent. Above all, the construction of robust and biologically compatible reference standards is a prerequisite.…”
mentioning
confidence: 99%
“…Deduped BAM files were firstly processed by "QualCal" tool to conduct base quality score recalibration, and variants were called by "Haplotyper" tool to provide the matching result of GATK. VQSR was not performed because we don't believe this extra step will improve overall variant calling accuracy [31].…”
Section: Running Dnaseq (Gatk Re-implementation) and Dnascope
mentioning
confidence: 99%