2012
DOI: 10.1002/humu.22033

Detecting false-positive signals in exome sequencing

Abstract: Disease gene discovery has been transformed by affordable sequencing of exomes and genomes. Identification of disease-causing mutations requires sifting through a large number of sequence variants. A subset of the variants are unlikely to be good candidates for disease causation based on one or more of the following criteria: (1) being located in genomic regions known to be highly polymorphic, (2) having characteristics suggesting assembly misalignment, and/or (3) being labeled as variants based on misleading …

Cited by 147 publications (105 citation statements)
References 49 publications (56 reference statements)
“…Given the large degree of variability between any two genomes, we are approaching a paradigm where short-read, reference-based approaches are no longer the sole gold standard for variant analysis, both for exome and genome sequencing [50]. This study provides a framework for integrating multiple platforms: high-quality short reads for SNVs and indels, long reads for structural variation, and long-read assembly and genome maps for large-scale genome rearrangements.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…If more than one mutation is found in a sample for a gene, then the mutation of the higher priority functional class was used for visualization. SNVs were filtered using tabixpp (3b299cc0911debadc435fdae60bbb72bd10f6d84), removing SNVs found in any of the following databases: dbSNP141 (modified to remove somatic and clinical variants, with variants with the following flags excluded: SAO=2/3, PM, CDA, TPA, MUT and OM) [40], 1,000 Genomes Project (v3), Complete Genomics 69 whole genomes, duplicate gene database (v68) [41], ENCODE DAC and Duke Mapability Consensus Excludable databases (comprising poorly mapping reads, repeat regions, and mitochondrial and ribosomal DNA) [42], invalidated somatic SNVs from 68 human colorectal cancer exomes (unpublished data) using the AccuSNP platform (Roche NimbleGen), germline SNVs from 477 sporadic PCa patients with the intermediate GS (3+X and X+3) and additional 10 prostate cancer patients with higher GS [13], and the Fuentes database of likely false positive variants [43]. SNVs were whitelisted (and retained, independently of the presence in other filters) if they were contained within the Catalogue of Somatic Mutations in Cancer (COSMIC) database (v70) [44].…”
Section: Methods (citation type: mentioning)
confidence: 99%
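
The filtering logic in the statement above (drop an SNV that appears in any exclusion database, unless it is whitelisted) can be sketched in a few lines. This is only an illustration under stated assumptions: the `Snv` type, the `filter_snvs` function, and the in-memory sets are hypothetical stand-ins, not the citing authors' tabixpp-based implementation, and variants are matched naively on exact chromosome, position, reference, and alternate allele.

```python
from typing import Iterable, List, NamedTuple, Set


class Snv(NamedTuple):
    """Hypothetical variant key: exact match on all four fields."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_snvs(
    candidates: Iterable[Snv],
    blacklists: Iterable[Set[Snv]],
    whitelist: Set[Snv],
) -> List[Snv]:
    """Keep a candidate if it is whitelisted or absent from every blacklist."""
    # Merge all exclusion sets once so each lookup is a single set test.
    blacklisted: Set[Snv] = set().union(*blacklists)
    return [
        snv
        for snv in candidates
        # The whitelist takes precedence over any blacklist hit.
        if snv in whitelist or snv not in blacklisted
    ]


# Toy data standing in for real database contents (assumed, not from the source).
calls = [Snv("1", 100, "A", "G"), Snv("2", 200, "C", "T"), Snv("3", 300, "G", "A")]
polymorphic_sites = {calls[0], calls[1]}
likely_false_positives = {calls[1]}
whitelisted_somatic = {calls[1]}

kept = filter_snvs(calls, [polymorphic_sites, likely_false_positives], whitelisted_somatic)
```

Here the first call is removed because it matches an exclusion set, the second is retained despite two exclusion hits because it is whitelisted, and the third passes untouched. A production pipeline would query indexed database files rather than hold whole databases in memory.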
“…For instance, read depth of a specific region in one platform could be too low to reliably call variants. Also, platform-concordant variants could be false positives due to the same systematic bias of different WGS platforms (Fuentes Fajardo, et al, 2012; Lam, et al, 2012; Ross, et al, 2013). Those phenomena should have influenced training of the LR filter and validation of the proposed filtering methods in variant prioritization.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…Discovering disease-associated variants such as known Mendelian disease-causing and loss of function (LoF) variants or de novo mutations (DNMs) using next-generation sequencing (NGS) requires accuracy and precision in identifying genomic variants as well as sufficient coverage for the sequenceable human genome (Gargis, et al, 2012); however, many sources of false positives and false negatives have been identified. The comparison of sequencing platforms and library preparation methods showed significant bias (Fuentes Fajardo, et al, 2012; Lam, et al, 2012; Ross, et al, 2013), and alignment and variant calling procedures result in false positives and false negatives as well (Bao, et al, 2011; O'Rawe, et al, 2013; Pabinger, et al, 2013; Yu, et al, 2012). The differences due to sequencing platforms, alignment methods, and variant calling procedures are more significant for INDELs compared to SNVs (Lam, et al, 2012; O'Rawe, et al, 2013; Zook, et al, 2014).…”
Section: Introduction (citation type: mentioning)
confidence: 99%