2015
DOI: 10.1093/bioinformatics/btu862

DDIG-in: detecting disease-causing genetic variations due to frameshifting indels and nonsense mutations employing sequence and structural properties at nucleotide and protein levels

Abstract: We have built a machine learning method called DDIG-in (FS) based on real human genetic variations from the Human Gene Mutation Database (inherited disease-causing) and the 1000 Genomes Project (GP) (putatively neutral). The method incorporates both sequence and predicted structural features and yields a robust performance by 10-fold cross-validation and independent tests on both FS indels and NS variants. We showed that human-derived NS variants and FS indels derived from animal orthologs can be effectively e…
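The abstract reports performance under 10-fold cross-validation. As a generic illustration only (this is not DDIG-in's actual training pipeline, which the abstract does not detail), the sketch below shows one common way to build 10 folds from labelled variants so that each fold keeps roughly the same ratio of disease-causing to putatively neutral examples. The function name `stratified_k_fold`, the label encoding (1 = disease-causing, 0 = putatively neutral) and the toy class sizes are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Hypothetical helper: split sample indices into k stratified folds.

    labels: sequence of class labels, e.g. 1 = disease-causing (HGMD-like),
            0 = putatively neutral (1000 Genomes-like). Encoding is assumed.
    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle each class, then deal its members round-robin across folds
    # so every fold receives a near-equal share of each class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the test set; the rest form the training set.
    all_indices = set(range(len(labels)))
    splits = []
    for fold in folds:
        test = sorted(fold)
        train = sorted(all_indices - set(fold))
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    # Toy data: 30 "disease-causing" and 70 "neutral" variants (made up).
    labels = [1] * 30 + [0] * 70
    for i, (train, test) in enumerate(stratified_k_fold(labels, k=10)):
        positives = sum(labels[j] for j in test)
        print(f"fold {i}: train={len(train)} test={len(test)} disease={positives}")
```

Stratifying matters when one class is much rarer than the other: a purely random split could leave some test folds with very few disease-causing variants, making per-fold performance estimates noisy.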

Cited by 55 publications (50 citation statements)
References: 39 publications
“…HGMD has also been used by a number of different groups to aid the development of a wide variety of post-NGS variant interpretation and exome prioritisation algorithms including MutPred (Li et al 2009), MutPred Splice (Mort et al 2014), PROVEAN (Choi et al 2012), CAROL (Lopes et al 2012), regSNPs (Teng et al 2012), CRAVAT (Douville et al 2013), NEST (Carter et al 2013), FATHMM (Shihab et al 2013), FATHMM-MKL (Shihab et al 2015), PinPor (Zhang et al 2014), MutationTaster2 (Schwarz et al 2014), Phen-Gen (Javed et al 2014), VEST-indel (Douville et al 2016), Gene Damage Index (Itan et al 2015), DDIG-in (Folkman et al 2015), RSVP (Peterson et al 2016), ExonImpact (Li et al 2017), IntSplice (Shibata et al 2016), snvForest (Wu et al 2015), IMHOTEP (Knecht et al 2017) and M-CAP (Jagadeesh et al 2016). A list of some of the articles which have utilised HGMD data or expertise in their analyses can be found on the HGMD website (http://www.hgmd.cf.ac.uk/docs/articles.html).…”
Section: How HGMD Is Utilised (mentioning)
confidence: 99%
“…1), our method takes a list of candidate indels and an OMIM [33] identifier for disease of interest as input and produces a ranking list of the candidates as output. To achieve this goal, we first extract for each indel five functional prediction scores, including SIFT [18], PinPor [22], CADD [21], DDIG [19] and VEST [20], from their corresponding websites. Because these scores are different from each other in such factors as training data, prediction method, numeric scales and so on, we transform these scores into p-values (detailed in “Methods”), which provides a unified representation of functionally damaging effects of candidate indels.…”
Section: Results (mentioning)
confidence: 99%
“…Specifically, our method integrates five indel functional prediction scores, including CADD [23], VEST [20], SIFT [18], DDIG [19, 24] and PinPor [22], four genic association scores derived from four different genomic data, including gene expression [25], protein-protein interaction [26], gene ontology [27] and transcriptional regulation [28], and a genic intolerance score named RVIS [29]. We transform each functional prediction score and RVIS score into a p-value by comparing it against the corresponding empirical null distribution.…”
Section: Introduction (mentioning)
confidence: 99%
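The statements above describe converting heterogeneous scores (CADD, VEST, SIFT, DDIG, PinPor, RVIS) into p-values against an empirical null distribution so that they share one scale. The quoted text does not say how the null samples are built or which estimator is used, so the following is only a minimal sketch under assumptions: it uses the common (r + 1) / (n + 1) empirical p-value, where r counts null scores at least as extreme as the observed one, and it lets the caller say whether higher or lower raw scores indicate damage. The factory `make_p_value_fn` and the toy null samples are hypothetical.

```python
from bisect import bisect_left, bisect_right


def make_p_value_fn(null_scores, higher_is_more_damaging=True):
    """Hypothetical helper: build an empirical p-value function for one tool.

    null_scores: scores the tool assigns to assumed-neutral variants.
    higher_is_more_damaging: direction of the tool's native scale (assumed
                             to be known for each tool).
    """
    null_sorted = sorted(null_scores)  # sort once, query many times
    n = len(null_sorted)

    def p_value(score):
        if higher_is_more_damaging:
            # r = number of null scores >= observed score
            r = n - bisect_left(null_sorted, score)
        else:
            # r = number of null scores <= observed score
            r = bisect_right(null_sorted, score)
        # (r + 1) / (n + 1) keeps the p-value strictly above zero.
        return (r + 1) / (n + 1)

    return p_value


if __name__ == "__main__":
    # Made-up null samples for two tools on different native scales.
    tool_a_null = [0.1 * i for i in range(1, 100)]  # roughly 0.1 to 9.9
    tool_b_null = list(range(1, 100))               # 1 to 99
    pa = make_p_value_fn(tool_a_null)
    pb = make_p_value_fn(tool_b_null)
    # Raw scores 9.55 and 95 are not directly comparable; their p-values are.
    print(pa(9.55), pb(95))
```

Because every tool's output is mapped onto the same [0, 1] scale with the same meaning (the fraction of null variants scoring at least as extreme), the transformed values can be compared or combined downstream regardless of each tool's original range or direction.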
“…The mutations were categorized as neutral or dangerous, with the related confidence level ranging from 0 (low) to 1 (high). In order to discriminate disease-causing from neutral frameshifting insertions or deletions (indels) and nonsense variants, that disrupt the protein coding sequence downstream of the mutation, we used DDIG-in (http://sparks-lab.org/ddig) [18]. It is a machine-learning tool that can predict the disease probability, since it is trained on inherited disease-causing mutations from the Human Gene Mutation Database (HGMD) and putatively neutral variants from the 1000 Genomes Project.…”
Section: Prediction Of Protein Alterations (mentioning)
confidence: 99%