2019
DOI: 10.1021/acs.jcim.9b00297

Reliable Prediction Errors for Deep Neural Networks Using Test-Time Dropout

Abstract: While the use of deep learning in drug discovery is gaining increasing attention, the lack of methods to compute reliable prediction errors for Neural Networks prevents their application to guide decision making in domains where identifying unreliable predictions is essential, e.g., precision medicine. Here, we present a framework to compute reliable prediction errors for Neural Networks using Test-Time Dropout and Conformal Prediction. Specifically, the algorithm consists of training a single Neural Network…
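
As a rough illustration of the first ingredient described in the abstract, the sketch below shows test-time (Monte Carlo) dropout for a simple feed-forward regressor in PyTorch: dropout is kept active at prediction time and the network is sampled repeatedly, so the spread of the outputs can serve as a per-compound uncertainty estimate. This is a minimal sketch, not the authors' implementation; the architecture, dropout rate, and number of forward passes are assumptions.

```python
# Minimal sketch of test-time (Monte Carlo) dropout for uncertainty estimation.
# Illustrative only -- not the paper's implementation; layer sizes, the dropout
# rate and the number of forward passes are assumptions.
import torch
import torch.nn as nn

class DropoutRegressor(nn.Module):
    def __init__(self, n_features: int, hidden: int = 128, p_drop: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 100):
    """Sample the network with dropout active; return predictive mean and std."""
    model.train()  # keeps dropout layers active at prediction time
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)
```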

Cited by 51 publications (73 citation statements)
References 54 publications
“…The scaled pIC50 MSE values in Table 1 translate to a root mean squared error of approximately 0.8 pIC50 units. These values are in the range of expected errors for ML models trained on heterogeneous ChEMBL25 data (Kalliokoski et al., 2013), and are in agreement with prior literature that also demonstrated that RF and FFN models approached the upper limit of overall accuracy across the dataset, given the heterogeneous IC50 measurements in ChEMBL25 (Cortés-Ciriano and Bender, 2019a).…”
Section: Results (supporting)
confidence: 90%
“…Interestingly, there is very little variation in the size of the FFN confidence intervals across all predictions on the validation set (Figure 4A) but this is still sufficient for the FFN to generate valid prediction intervals (Figure 4B). In total, conformal prediction is able to accurately gauge both RF and FFN model confidence for predictions on held-out validation data, in agreement with prior literature (Svensson et al., 2018; Cortés-Ciriano and Bender, 2019a).…”
Section: Results (supporting)
confidence: 86%
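
The conformal prediction step referred to in this passage can be sketched as follows: on a held-out calibration set, the absolute errors are normalised by a per-sample uncertainty estimate (for example, the test-time dropout standard deviation), and the relevant quantile of these nonconformity scores scales the interval width for new compounds. This is a minimal sketch under assumed variable names, not the code used in the cited studies.

```python
# Minimal sketch of split (inductive) conformal prediction for regression with
# normalised nonconformity scores. Illustrative only; variable names are assumptions.
import numpy as np

def conformal_intervals(y_cal, pred_cal, sigma_cal, pred_test, sigma_test, confidence=0.8):
    """Return (lower, upper) prediction intervals with roughly the requested coverage."""
    eps = 1e-8
    # Nonconformity scores on the calibration set, normalised by the uncertainty estimate.
    scores = np.abs(y_cal - pred_cal) / (sigma_cal + eps)
    # Quantile of the calibration scores at the chosen confidence level.
    q = np.quantile(scores, confidence, method="higher")
    half_width = q * (sigma_test + eps)
    return pred_test - half_width, pred_test + half_width
```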
“…Further information about the data sets is given in Table 1 and in a previous study by the authors [42]. We also collected 25 QSAR data sets for validation from previous work by the authors (Table 2) [42][43][44]. All data sets used in this study, as well as the code required to generate the results presented herein, are publicly available at https://github.com/isidroc/QAFFP_regression.…”
Section: Data Collection and Curation (mentioning)
confidence: 99%
“…Having trained and validated the signaturizers, we massively inferred missing signatures for the ~800,000 molecules available in the CC, obtaining a complete set of 25x128-dimensional signatures for each molecule (chemicalchecker.org/downloads). To explore the reliability of the inferred signatures, we assigned an 'applicability' score (α) to predictions based on the following: (a) the proximity of a predicted signature to true (experimental) signatures available in the training set; (b) the robustness of the SNN output to a test-time data dropout [10]; and (c) the accuracy expected a priori based on the experimental CC datasets available for the molecule (Figure 2a).…”
Section: Large-scale Inference of Bioactivity Signatures (mentioning)
confidence: 99%
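
Purely as an illustration of how the three ingredients mentioned in this passage (proximity to training signatures, robustness to test-time dropout, and a priori accuracy) could be combined, the sketch below computes a simple applicability-style score; the equal weighting and exponential scaling are assumptions, not the cited paper's actual formula.

```python
# Hypothetical applicability-style score combining the three ingredients described
# above. The scaling and equal weighting are assumptions, not the cited paper's method.
import numpy as np

def applicability_score(dist_to_train, dropout_std, prior_accuracy):
    """Combine (a) distance to the nearest training signatures, (b) the spread of
    test-time dropout samples and (c) an a priori accuracy estimate into a 0-1 score."""
    proximity = np.exp(-np.asarray(dist_to_train, dtype=float))       # closer to training data -> higher
    robustness = 1.0 / (1.0 + np.asarray(dropout_std, dtype=float))   # smaller dropout spread -> higher
    return (proximity + robustness + np.asarray(prior_accuracy, dtype=float)) / 3.0
```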