2011
DOI: 10.1002/humu.21517

dbNSFP: A lightweight database of human nonsynonymous SNPs and their functional predictions

Abstract: With the advance of sequencing technologies, whole exome sequencing has increasingly been used to identify mutations that cause human diseases, especially rare Mendelian diseases. Among the analysis steps, functional prediction (of being deleterious) plays an important role in filtering or prioritizing non-synonymous SNP (NS) for further analysis. Unfortunately, different prediction algorithms use different information and each has its own strength and weakness. It has been suggested that investigators should …

Cited by 720 publications (657 citation statements)
References 30 publications

“…Minor allele frequencies were recorded manually from dbSNP when required. Coding sequence variants were annotated with prediction of pathogenicity based on the SIFT, 19 PolyPhen-2 20 and MutationTaster 21 algorithms using the pre-calculated data provided in dbNSFP 22 (Supplementary Figure S1). …”
Section: Exome Sequencing Assembly and Variant Calling (mentioning)
confidence: 99%
“…We used a classification system for missense variants using: (1) missense prediction software assessing pathogenicity at the amino-acid level and (2) evolutionary conservation at the nucleotide level. 24 We used the missense prediction programs, PolyPhen-2, SIFT and MutTaster, and the different scores from these tools were derived according to the rules described by Liu et al. 22 Results from the three different tools were combined to give a majority vote resulting in a single classification as 'damaging' or 'tolerated'. For the classification based on evolutionary conservation, all variants with a phyloP > 0.95 were considered conserved (C), otherwise non-conserved (NC).…”
Section: Variant Prioritisation (mentioning)
confidence: 99%
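
The majority-vote scheme described in the statement above can be sketched roughly as follows. This is an illustrative sketch only, not the cited authors' code: the function names are hypothetical, each tool's score is assumed to have already been converted to a 'damaging' or 'tolerated' call, and the conservation cutoff is read as phyloP > 0.95.

```python
# Hypothetical sketch of a three-tool majority vote plus a phyloP
# conservation class. Names and input format are assumptions.

PHYLOP_CONSERVED_THRESHOLD = 0.95  # assumed reading of the quoted cutoff


def majority_vote(calls):
    """Return 'damaging' if more than half of the calls are 'damaging'."""
    damaging = sum(1 for call in calls if call == "damaging")
    return "damaging" if damaging * 2 > len(calls) else "tolerated"


def conservation_class(phylop):
    """Return 'C' (conserved) above the threshold, otherwise 'NC'."""
    return "C" if phylop > PHYLOP_CONSERVED_THRESHOLD else "NC"


def classify_variant(polyphen2, sift, mutation_taster, phylop):
    """Combine the three per-tool calls and the conservation class."""
    vote = majority_vote([polyphen2, sift, mutation_taster])
    return vote, conservation_class(phylop)


if __name__ == "__main__":
    # Two of three tools call the variant damaging; phyloP is above the cutoff.
    print(classify_variant("damaging", "tolerated", "damaging", 1.2))
```

With three tools and two possible calls a tie cannot occur, so the vote always yields a single label.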
“…15 Variants were annotated by ANNOVAR 16 and filtered with NCBI dbSNP v.135, the 1000 Genomes Project catalog, and AVSIFT. 17 An average of 4.545 Gb of sequence was generated per affected individual; 99.8% of the total bases passed quality assessment and were aligned to the human reference sequence, and 80% mapped to the targeted exons with a mean coverage of 39×. At this depth of coverage, 95% of the targeted bases were sufficiently covered to pass our threshold for variant calling (≥10×).…”
mentioning
confidence: 99%
“…For each of these variants, a score has been attributed with the five following methods: SIFT (released August 2011) 9, Polyphen2 (HumDiv classifier model v2.1.0) 10, Mutation Taster (released March 2010) 11, LRT (released November 2009) 12 and PhyloP 13 thanks to the dbNSFP public database https://sites.google.com/site/jpopgen/dbNSFP 4. This database contains all possible SNPs within human genome coding regions, which have been determined by the CCDS project 14, and for each of the 87 million SNPs, the scores of the five predictors have been pre-calculated and made available.…”
Section: Data Collection For Model Building (mentioning)
confidence: 99%
“…Many different methods have been developed and published over the past fifteen years, each of these has distinct advantages and disadvantages, but none can be considered as the gold standard 123 . The prediction scores of some of these methods have been compiled in the dbNSFP database for all known protein coding genome positions 4 . Besides, Li and colleagues proposed to combine five of them in a logistic regression framework 5 in order to globally improve predictive performance in comparison with individual scores.…”
Section: Introduction (mentioning)
confidence: 99%
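
As a rough illustration of what combining several prediction scores "in a logistic regression framework" means, the sketch below maps five scores to a single probability. It is not the model from the cited work: the function name is hypothetical, and the weights and intercept are placeholders that would have to be fitted on labelled variants.

```python
import math


def combined_probability(scores, weights, intercept):
    """Logistic combination: sigmoid(intercept + sum(weight * score)).

    All coefficients here are illustrative placeholders, not published values.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    z = intercept + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Five per-tool scores for one variant, with made-up equal weights.
    example_scores = [0.9, 0.8, 0.95, 0.7, 0.6]
    placeholder_weights = [1.0, 1.0, 1.0, 1.0, 1.0]
    print(combined_probability(example_scores, placeholder_weights, intercept=-2.5))
```

The output is a value between 0 and 1 that can be thresholded or used to rank variants; how well it separates damaging from tolerated variants depends entirely on the fitted coefficients.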