A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data

Costa, Pedro Rafael; Acencio, Márcio Luís; Lemke, Ney

doi:10.1186/1471-2164-11-s5-s9

Cited by 70 publications

(63 citation statements)

References 53 publications

“…Similarly, gene–disease association data has been used before to discover new genes with important roles in disease, with precision estimates ranging from 0.61 to 0.84 [64, 71–73]. Interestingly, this is another scenario where sourcing unambiguous negative examples is challenging and has often been framed as a PU learning problem [71–73].…”

Section: Discussion (mentioning)

confidence: 99%

In silico prediction of novel therapeutic targets using gene–disease association data

2017

Background: Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. Computational prediction of successful targets could have a considerable impact on attrition rates in the drug discovery pipeline by significantly reducing the initial search space. Here, we explore whether gene–disease association data from the Open Targets platform is sufficient to predict therapeutic targets that are actively being pursued by pharmaceutical companies or are already on the market.

Methods: To test our hypothesis, we train four different classifiers (a random forest, a support vector machine, a neural network and a gradient boosting machine) on partially labelled data and evaluate their performance using nested cross-validation and testing on an independent set. We then select the best performing model and use it to make predictions on more than 15,000 genes. Finally, we validate our predictions by mining the scientific literature for proposed therapeutic targets.

Results: We observe that the data types with the best predictive power are animal models showing a disease-relevant phenotype, differential expression in diseased tissue and genetic association with the disease under investigation. On a test set, the neural network classifier achieves over 71% accuracy with an AUC of 0.76 when predicting therapeutic targets in a semi-supervised learning setting. We use this model to gain insights into current and failed programmes and to predict 1431 novel targets, of which a highly significant proportion has been independently proposed in the literature.

Conclusions: Our in silico approach shows that data linking genes and diseases is sufficient to predict novel therapeutic targets effectively and confirms that this type of evidence is essential for formulating or strengthening hypotheses in the target discovery process. Ultimately, more rapid and automated target prioritisation holds the potential to reduce both the costs and the development times associated with bringing new medicines to patients.

Electronic supplementary material: The online version of this article (doi:10.1186/s12967-017-1285-6) contains supplementary material, which is available to authorized users.
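The abstract above mentions evaluating classifiers with nested cross-validation. As a rough illustration of that evaluation scheme only, the following sketch runs nested cross-validation with a toy 1-D k-nearest-neighbour classifier on synthetic data. The classifier, the candidate values of k, the fold counts and the data are all assumptions made for this example; they are not the citing paper's models or data, and the sketch does not model the partially labelled setting.

```python
import random

def k_folds(indices, k):
    """Split a list of indices into k near-equal folds."""
    return [indices[i::k] for i in range(k)]

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def accuracy(train, test, k):
    """Fraction of test points whose label the k-NN rule predicts correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def split(data, held_out):
    """Return (train, test) lists given the indices held out for testing."""
    held = set(held_out)
    train = [data[i] for i in range(len(data)) if i not in held]
    test = [data[i] for i in held_out]
    return train, test

def cross_val(data, k_neighbors, n_folds):
    """Plain cross-validated accuracy for one hyperparameter value."""
    folds = k_folds(list(range(len(data))), n_folds)
    scores = [accuracy(*split(data, fold), k_neighbors) for fold in folds]
    return sum(scores) / len(scores)

def nested_cv(data, candidates=(1, 3, 5), outer=5, inner=3, seed=0):
    """Outer loop estimates performance; inner loop selects k on training data only."""
    data = data[:]
    random.Random(seed).shuffle(data)
    scores = []
    for fold in k_folds(list(range(len(data))), outer):
        train, test = split(data, fold)
        # Hyperparameter selection never sees the outer test fold.
        best_k = max(candidates, key=lambda k: cross_val(train, k, inner))
        scores.append(accuracy(train, test, best_k))
    return sum(scores) / len(scores)

# Synthetic two-class data: one noisy feature centred on the class label.
rng = random.Random(42)
toy = [(rng.gauss(label, 0.5), label) for label in (0, 1) for _ in range(30)]
estimate = nested_cv(toy)
print(f"nested CV accuracy estimate: {estimate:.2f}")
```

The point of the two loops is that the hyperparameter is chosen using only the outer training portion, so the outer score is not inflated by tuning on the data used for evaluation.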

“…PROSPECTR 27 uses 23 sequence-based features and predicts disease genes from OMIM with precision = 0.62 and recall = 0.70 with an AUC of 0.70. The most directly comparable method, presented in Costa et al , 18 utilizes topological features of gene interaction networks to predict both morbidity genes (P=0.66, R=0.65, AUC=0.72) and druggable genes (P=0.75, R=0.78, AUC=0.82). While the majority of other methods utilize sequence-based features, protein interactions, and other genomic networks, our method requires only Gene Ontology annotations and simple bigrams/collocations extracted from biomedical literature.…”

Section: Results (mentioning)

confidence: 99%
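The citation statement above quotes precision and recall pairs for PROSPECTR and for Costa et al. A short sketch of how those pairs convert to an F1 score (the harmonic mean of precision and recall) may help when comparing them with F-measures reported elsewhere. The function name and dictionary labels are illustrative choices; only the precision and recall values come from the quoted text.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs as quoted in the citation statement.
reported = {
    "PROSPECTR (disease genes)": (0.62, 0.70),
    "Costa et al. (morbid genes)": (0.66, 0.65),
    "Costa et al. (druggable genes)": (0.75, 0.78),
}

f1 = {name: f1_score(p, r) for name, (p, r) in reported.items()}
for name, value in f1.items():
    print(f"{name}: F1 = {value:.2f}")
```

These derived values are only as comparable as the underlying evaluations: the methods were assessed on different gene sets, so the F1 scores indicate rough relative performance rather than a controlled benchmark.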

“…We also looked up our predicted genes in the results from a previous study on predicting morbid and druggable genes, and 90% (9 out of 10) of our predicted pharmacogenes were also predicted to be morbid (variations cause hereditary human diseases) or druggable. 18 …”

Section: Results (mentioning)

confidence: 99%

“…11 Other computational methods have been developed to identify genetic causes underlying disorders through gene prioritization, but many of these are designed to work on small sets of disease-specific genes. 12–17 The method which is closest to the one that we present here is described in Costa et al ; 18 they create separate classifiers to predict morbidity-associated and druggable genes on a genome-wide scale. A majority of these methods use sequence-based features, network topology, and other features from curated databases; only a few use information from literature.…”

Section: Introduction (mentioning)

confidence: 99%

Combining Heterogenous Data for Prediction of Disease Related and Pharmacogenes

2013

Identifying genetic variants that affect drug response or play a role in disease is an important task for clinicians and researchers. Before individual variants can be explored efficiently for effect on drug response or disease relationships, specific candidate genes must be identified. While many methods rank candidate genes through the use of sequence features and network topology, only a few exploit the information contained in the biomedical literature. In this work, we train and test a classifier on known pharmacogenes from PharmGKB and present a classifier that predicts pharmacogenes on a genome-wide scale using only Gene Ontology annotations and simple features mined from the biomedical literature. Performance of F=0.86, AUC=0.860 is achieved. The top 10 predicted genes are analyzed. Additionally, a set of enriched pharmacogenic Gene Ontology concepts is produced.

“…The PPIs of TFs and TGs, on the other hand, were extracted from a integrated network of human gene interactions recently published by our group [11]. …”

Section: Construction and Content (mentioning)

confidence: 99%

HTRIdb: an open-access database for experimentally verified human transcriptional regulation interactions

2012

Self Cite

Background: The modeling of interactions among transcription factors (TFs) and their respective target genes (TGs) into transcriptional regulatory networks is important for the complete understanding of regulation of biological processes. In the case of experimentally verified human TF-TG interactions, there is no database at present that explicitly provides such information even though many databases containing human TF-TG interaction data have been available. In an effort to provide researchers with a repository of experimentally verified human TF-TG interactions from which such interactions can be directly extracted, we present here the Human Transcriptional Regulation Interactions database (HTRIdb).

Description: The HTRIdb is an open-access database that can be searched via a user-friendly web interface and the retrieved TF-TG interactions data and the associated protein-protein interactions can be downloaded or interactively visualized as a network through the web version of the popular Cytoscape visualization tool, the Cytoscape Web. Moreover, users can improve the database quality by uploading their own interactions and indicating inconsistencies in the data. So far, HTRIdb has been populated with 284 TFs that regulate 18302 genes, totaling 51871 TF-TG interactions. HTRIdb is freely available at http://www.lbbc.ibb.unesp.br/htri.

Conclusions: HTRIdb is a powerful user-friendly tool from which human experimentally validated TF-TG interactions can be easily extracted and used to construct transcriptional regulation interaction networks enabling researchers to decipher the regulation of biological processes.
