2014
DOI: 10.1101/006148
Preprint

Reducing INDEL calling errors in whole-genome and exome sequencing data

Abstract: Background: INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, there are still many errors with INDEL variant calling, driven by library preparation, sequencing biases, and algorithm artifacts. Methods: We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors. We also developed a classification scheme based on the c…

Cited by 26 publications (30 citation statements)
References 51 publications

“…Future studies targeting the analysis of insertion/deletion (indel) polymorphisms may also be useful for identifying additional targets of selection. Indel polymorphisms were omitted from our analyses because their accurate genotyping remains quite challenging, especially without high coverage (≥60× [58]) and in non-model species with incomplete, or draft genome assemblies. With regards to domestication selection, a specific, universal set of domestication genes may not exist, but we do speculate that there may be a 'core' set of genes shared across multiple domestication processes.…”
Section: Discussion, mentioning
confidence: 99%
“…Whereas TPs in the reference call-set are likely to be TPs given that they are called by multiple orthogonal technologies and pipelines, using the inverse of this set to confidently identify areas of the genome that are truly non-variant may not be justified. Recent evidence has shown that alignment-based [ 42 ] and some assembly-based [ 43 ] variant-callers show high error rates for large InDels and heterozygous InDels even at WGS coverage depths up to 90×. Although higher coverage (190×) WGS datasets contribute calls to the GiBv2.18 reference, the majority of datasets are <80×.…”
Section: Discussion, mentioning
confidence: 99%
“…The read depth needed can depend on multiple factors including guidelines from the scientific community, the presence of repetitive genomic regions (these are more difficult to sequence), the error rate of the sequencing platform, the algorithm used for assembling reads into a genomic sequence, and gene expression level (for RNA-seq). Read depth recommendations from the scientific literature include 100x for heterozygous single nucleotide variant detection by WES [ 19 ], 35x for genotype detection by WGS [ 20 ], 60x for detecting insertions/deletions (INDELs) by WGS [ 21 ], 10–25 million reads for differential gene expression profiling by RNA-seq [ 22 ], and 50–100 million reads for allele-specific gene expression by RNA-seq [ 23 ].…”
Section: Genomic Data Generation, mentioning
confidence: 99%
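The depth recommendations quoted above (for example 35× for WGS genotyping or 60× for WGS INDEL detection) can be turned into an approximate sequencing budget with the standard relation mean depth = number of reads × bases per read / genome size. The Python sketch below is illustrative only: the 150 bp paired-end read length and the ~3.1 Gb genome size are assumptions, not values from the cited studies, and the helper names are hypothetical. It ignores duplicates, unmapped reads and uneven coverage, so real runs would need more data than it reports.

```python
import math

# Illustrative assumptions (not taken from the cited studies):
ASSUMED_GENOME_BP = 3_100_000_000  # approximate human genome length in bases
ASSUMED_READ_BP = 150              # length of each read (each mate, if paired)


def units_needed(target_depth: float, read_bp: int, genome_bp: int, paired: bool = True) -> int:
    """Return reads (paired=False) or read pairs (paired=True) for a target mean depth.

    Rearranges depth = N * L / G into N = depth * G / L, where L is the number of
    bases contributed per read or per pair. Duplicates, unmapped reads and
    uneven coverage are ignored, so this is a lower bound.
    """
    bases_per_unit = read_bp * (2 if paired else 1)
    return math.ceil(target_depth * genome_bp / bases_per_unit)


def mean_depth(n_units: int, read_bp: int, genome_bp: int, paired: bool = True) -> float:
    """Return the nominal mean depth implied by n_units reads or read pairs."""
    bases_per_unit = read_bp * (2 if paired else 1)
    return n_units * bases_per_unit / genome_bp


if __name__ == "__main__":
    # Depths taken from the recommendations quoted above (WGS only).
    for depth in (35, 60):
        pairs = units_needed(depth, ASSUMED_READ_BP, ASSUMED_GENOME_BP)
        print(f"{depth}x WGS: ~{pairs:,} read pairs of 2 x {ASSUMED_READ_BP} bp")
```

Under these assumptions, 60× corresponds to roughly 620 million 2 × 150 bp read pairs. Keeping the forward and inverse calculations as separate functions makes it easy to check the nominal depth of an existing run as well as to plan a new one.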