2018
DOI: 10.1101/343970
Preprint

GLnexus: joint variant calling for large cohort sequencing

Abstract: As ever-larger cohorts of human genomes are collected in pursuit of genotype/phenotype associations, sequencing informatics must scale up to yield complete and accurate genotypes from vast raw datasets. Joint variant calling, a data processing step entailing simultaneous analysis of all participants sequenced, exhibits this scaling challenge acutely. We present GLnexus (GL, Genotype Likelihood), a system for joint variant calling designed to scale up to the largest foreseeable human cohorts. GLnexus combines s…

Cited by 76 publications (70 citation statements)
References 25 publications
“…As sequencing projects have grown to include hundreds of thousands of samples, the need for highly accurate variant calls and computationally efficient merging algorithms is increasingly acute. By optimizing GLnexus 18 to merge single-sample DeepVariant calls, we demonstrated that the superior accuracy 15 and generalizability across sequencing methods 36 of DeepVariant can generate more accurate cohort callsets at large scale. The callset quality metrics of the optimized pipeline consistently outperformed the GATK Best Practices across a range of cohort sizes and sequence coverages.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…We adapted GLnexus 18 for merging DeepVariant gVCFs because of its computational scalability to large cohorts, access to relevant parameters, performance on allele normalization, and open-source license. To identify optimal parameters for merging DeepVariant gVCFs, we created four custom WGS cohorts of 3, 100, 333, and 1,247 samples at both high coverage (40-50x) and low coverage (15x) on chromosome 2, resulting in eight total cohorts (Supplementary Table 1).…”
Section: Optimized Parameters For Joint Calling (citation type: mentioning)
confidence: 99%
“…Specifically, we used BWA-MEM (32) to map and align paired-end reads to the human reference genome (version GRCh38/hg38, accession GCA_000001405.15), Picard v1.93 MarkDuplicates to identify and flag PCR duplicates and GATK v4.1 (33,34) HaplotypeCaller in Reference Confidence Model mode to generate individual-level gVCF files from the aligned sequence data. We then performed joint calling of variants from all three datasets using GLnexus (35). We used the following inclusion rules to select variants for downstream analysis: AF<0.05% in the cohort, <0.01% in gnomAD exome_ALL (all ancestries), >90% target region with dp>=10; mappability=1; allele balance>=0.25; and "PASS" from DeepVariants (36).…”
Section: ES/GS Data Analysis (citation type: mentioning)
confidence: 99%
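
The inclusion rules quoted in the last statement above can be read as a conjunction of thresholds on each variant. The following is a minimal Python sketch of that reading, not the citing authors' code: the `Variant` record and all of its field names are hypothetical, frequencies are assumed to be stored as fractions (so 0.05% becomes 0.0005), and ">90% target region with dp>=10" is assumed to mean the fraction of the target region covered at depth 10 or more.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant record; field names are illustrative only."""
    cohort_af: float         # allele frequency in the cohort, as a fraction
    gnomad_af: float         # gnomAD exome_ALL allele frequency, as a fraction
    frac_target_dp10: float  # assumed: fraction of target region with depth >= 10
    mappability: float       # mappability score of the site
    allele_balance: float    # alternate-allele read fraction
    filter_status: str       # caller FILTER value, e.g. "PASS"


def passes_inclusion_rules(v: Variant) -> bool:
    """Apply the quoted thresholds as a single conjunction."""
    return (
        v.cohort_af < 0.0005            # AF < 0.05% in the cohort
        and v.gnomad_af < 0.0001        # AF < 0.01% in gnomAD exome_ALL
        and v.frac_target_dp10 > 0.90   # > 90% of target region with dp >= 10
        and v.mappability == 1          # mappability = 1
        and v.allele_balance >= 0.25    # allele balance >= 0.25
        and v.filter_status == "PASS"   # "PASS" from the variant caller
    )


# Two made-up records: one rare variant that meets every rule,
# and one that is too common in the cohort.
rare = Variant(0.0002, 0.0, 0.95, 1.0, 0.40, "PASS")
common = Variant(0.01, 0.0, 0.95, 1.0, 0.40, "PASS")
results = [passes_inclusion_rules(rare), passes_inclusion_rules(common)]
```

Keeping the rules in one predicate makes each threshold easy to audit against the quoted text; in practice these values would be parsed from the joint-called VCF and its annotations rather than constructed by hand.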