2016
DOI: 10.1002/minf.201501012

Designing Multi‐target Compound Libraries with Gaussian Process Models

Abstract: We present the application of machine learning models to selecting G protein‐coupled receptor (GPCR)‐focused compound libraries. The library design process was realized by ant colony optimization. A proprietary Boehringer‐Ingelheim reference set consisting of 3519 compounds tested in dose‐response assays at 11 GPCR targets served as training data for machine learning and activity prediction. We compared the usability of the proprietary data with a public data set from ChEMBL. Gaussian process models were train…
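
The abstract names Gaussian process (GP) models as the activity predictors but does not give the kernel, descriptors, or hyperparameters. As a minimal sketch only, the code below shows GP regression with a squared-exponential kernel, fixed (untuned) hyperparameters, mean-centred targets, and toy 2-D "descriptor" vectors standing in for real compound descriptors. All function names, parameter values, and data are hypothetical illustrations, not the authors' model.

```python
import math

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two descriptor vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * math.exp(-0.5 * sq_dist / length_scale ** 2)

def cholesky(m):
    """Lower-triangular Cholesky factor L with m = L L^T (m must be SPD)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(X_train, y_train, X_test, noise=1e-2, **kernel_args):
    """Posterior mean and latent variance of a GP fitted to mean-centred targets."""
    n = len(X_train)
    y_mean = sum(y_train) / n
    y_c = [v - y_mean for v in y_train]
    # Training covariance plus observation noise on the diagonal.
    K = [[rbf_kernel(X_train[i], X_train[j], **kernel_args) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, y_c))  # alpha = K^{-1} (y - mean)
    means, variances = [], []
    for x in X_test:
        k_star = [rbf_kernel(xt, x, **kernel_args) for xt in X_train]
        means.append(y_mean + sum(k * a for k, a in zip(k_star, alpha)))
        v = forward_sub(L, k_star)  # v = L^{-1} k_star
        variances.append(rbf_kernel(x, x, **kernel_args) - sum(vi * vi for vi in v))
    return means, variances

# Hypothetical toy data: four compounds with 2-D descriptors and pKi-like values.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [5.0, 6.0, 6.5, 7.5]
mean, var = gp_predict(X, y, [[0.5, 0.5], [1.0, 0.0]])
```

The predictive variance is one reason GP models are attractive for library design: it is small near training compounds and grows for compounds unlike anything in the training set.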

Cited by 10 publications (4 citation statements) · References 18 publications
“…This was likely due to the fact that the ChEMBL set was larger and more structurally diverse [35]. 173 human targets with Ki bioactivity data were extracted from ChEMBL and used to build Naïve Bayes, logistic regression, and random forest (RF) models using Morgan fingerprints or FP2 that were in turn used for target inference [36]. Seven-fold cross-validation and temporal cross-validation demonstrated that cutoffs that were more potent had better accuracy and Matthews correlation coefficient (MCC).…”
Section: Introduction · mentioning · confidence: 99%
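
The Matthews correlation coefficient cited above is computed from the four counts of a binary confusion matrix. The sketch below assumes compounds are labelled active when a pKi-like value meets a potency cutoff; the cutoff, the values, and the helper names are hypothetical illustrations, not taken from the cited study.

```python
import math

def binarize(values, cutoff):
    """Label each compound active (True) if its potency value meets the cutoff."""
    return [v >= cutoff for v in values]

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for paired boolean labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Hypothetical measured and predicted pKi values with an assumed cutoff of 7.0.
measured = [7.8, 6.2, 8.1, 5.9, 7.1]
predicted = [7.4, 6.8, 7.9, 7.2, 6.5]
counts = confusion_counts(binarize(measured, 7.0), binarize(predicted, 7.0))
mcc = matthews_corrcoef(*counts)
```

Unlike plain accuracy, MCC stays informative when actives and inactives are imbalanced, which shifts as the cutoff is moved.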
“…A set of G-protein coupled receptor targets for 5-HT2c, melanin-concentrating hormone, and adenosine A1 were used to build Gaussian process models with CATS2 descriptors which were more predictive for a test set than models developed from a proprietary Boehringer Ingelheim dataset. This was likely due to the fact that the ChEMBL set was larger and more structurally diverse. 173 human targets with Ki bioactivity data were extracted from ChEMBL and used to build Naïve Bayes, logistic regression, and random forest (RF) models using Morgan fingerprints or FP2 that were in turn used for target inference.…”
Section: Introduction · mentioning · confidence: 99%
“…High-dimensional numerical target labels can be simultaneously predicted, using a variety of methods popular in the physical sciences, including linear regression (LR), k-nearest neighbors (kNN), random forests (RF), support vector machines (SVMs), artificial neural networks (ANNs), and Gaussian process regressors (GPRs). Each method has strengths and weaknesses, including dealing with high dimensional or imbalanced data, whether the algorithm is intrinsically multitarget (ingesting multiple labels without variations to the pipeline), and the degree of model transparency, interpretability, or explainability.…”
Section: Methods · mentioning · confidence: 99%
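
k-nearest neighbors is one example of an intrinsically multitarget method in the sense quoted above: the whole label vector of each neighbor is averaged, so predicting several targets needs no change to the pipeline. The sketch assumes Euclidean distance on descriptor vectors and an unweighted mean; the function name and toy data are hypothetical.

```python
import math

def knn_multitarget_predict(X_train, Y_train, x, k=3):
    """Predict all target labels at once as the mean label vector of the k nearest neighbors."""
    # Sort training compounds by Euclidean distance to the query descriptor vector.
    ranked = sorted((math.dist(xt, x), i) for i, xt in enumerate(X_train))
    neighbors = [i for _, i in ranked[:k]]
    n_targets = len(Y_train[0])
    return [sum(Y_train[i][t] for i in neighbors) / len(neighbors) for t in range(n_targets)]

# Hypothetical data: 2-D descriptors, two activity labels per compound.
X_train = [[0, 0], [0, 1], [5, 5], [6, 5]]
Y_train = [[1, 10], [3, 12], [7, 20], [9, 22]]
prediction = knn_multitarget_predict(X_train, Y_train, [0, 0.4], k=2)
```

A single-target method would instead need one model per column of `Y_train`.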
“…A set of G-protein coupled receptor targets for 5-HT2c, melanin-concentrating hormone and adenosine A1 were used to build Gaussian process models with CATS2 descriptors which were more predictive for a test set than models developed from a proprietary Boehringer Ingelheim dataset. This was likely due to the fact that the ChEMBL set was larger and more structurally diverse [33]. 173 human targets with Ki bioactivity data were extracted from ChEMBL and used to build Naïve Bayes, logistic regression and random forest models using Morgan fingerprints or FP2 that were in turn used for target inference [34].…”
Section: Introduction · mentioning · confidence: 99%