2017
DOI: 10.1101/201178
Preprint

Scaling accurate genetic variant discovery to tens of thousands of samples

Abstract: Comprehensive disease gene discovery in both common and rare diseases will require the efficient and accurate detection of all classes of genetic variation across tens to hundreds of thousands of human samples. We describe here a novel assembly-based approach to variant calling, the GATK HaplotypeCaller (HC) and Reference Confidence Model (RCM), that determines genotype likelihoods independently per-sample but performs joint calling across all samples within a project simultaneously. We show by calling over 90…

Cited by 1,458 publications (1,297 citation statements)
References 25 publications

“…angsd is a package for SNP and genotype calling commonly used for whole genome data. angsd emphasizes the use of genotype likelihoods and probabilities rather than of explicit genotype calls, and is regarded as more accurate than GATK (Poplin et al, ) for the latter (Korneliussen et al, ; Maruki & Lynch, ). For SNP discovery, we found that Stacks had higher recall (Figure a,e; black vs. grey), but fractionally lower precision (Figure b,f; black vs. grey).…”
Section: Results (mentioning)
confidence: 99%
“…The resulting alignment quality and statistics were checked using the Qualimap (v2.1.3) 52 and custom R (v3.4.1) scripts. Single nucleotide variants (both SNPs and InDels) were called using the GATK HaplotypeCaller (v3.5), 53 SAMtools (v1.4), and VarScan2 (v2.3.9) 54 according to published best practices recommend by developers of the tools. GATK (v3.5) base recalibration was applied during the variant calls.…”
Section: Illumina Miseq Data Analysis (mentioning)
confidence: 99%
“…12 Also, to eliminate most false-positive calls generated by artifacts from the sequencing process, we used the Genome Analysis Tool Kit (GATK) v3.5, including MarkDuplicate and LocalRealignment, 13 and then conducted hard filtering according to GATK recommendations. Because the NextSeq system generates base call (BCL) files aggregated by lane as raw data, we first converted the BCL files to FASTQ files using a bcl2fastq2 program provided by Illumina.…”
Section: Results (mentioning)
confidence: 99%