2013
DOI: 10.1186/gm432

Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing

Abstract: Background: To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be. Methods: We sequenced 15 exomes from four families using commercial kits (Illumina HiSeq 2000 platform and Agilent SureSelect version 2 captur…

Citations: cited by 413 publications (403 citation statements)
References: 57 publications
“…Recent publications have demonstrated hundreds of thousands of differences between variant calls from different whole human genome sequencing methods or different bioinformatics methods [5][6][7][8][9][10][11]. To understand these differences, we describe a high-confidence set of genome-wide genotype calls that can be used as a benchmark.…”
Section: Analysis (mentioning)
confidence: 99%
“…The process of identifying mutations in NGS data can broadly be divided into three stages; generation of primary data performed by the sequencer, secondary data which includes derived DNA sequence and alignment of reads, and tertiary interpretation data, including the identification of variants and annotation. Two milestones in data analysis are the primary data (from which all results can be regenerated) and the tertiary interpreted variant files, which can be considered an end product of data analysis and are highly dependent on the steps used during data analysis (for instance, there is a low concordance between several commonly used bioinformatics pipelines for variant calling 33 ). Even the commonly used BAM file 34 does not represent primary sequence data, but is the result of aligning sequence reads to a specific reference genome.…”
Section: Do We Have An Obligation To Report and Analyse Ifs? (mentioning)
confidence: 99%
“…Our results also indicate that using independent library preparation replicates is an effective way to identify false positive calls [5]. Recently, O'Rawe et al showed that NGS analysis of the same data set using different variant caller pipelines often resulted in low concordance [7]. Even though restricting ones focus to only the shared variant calls from multiple data analysis pipelines may be an effective way to eliminate some false positives, this approach will not be able to eliminate certain artifacts as effectively as the triplicate approach.…”
Section: Short Communication (mentioning)
confidence: 67%