Machine Learning Models to Predict Inhibition of the Bile Salt Export Pump

McLoughlin, Kevin; Jeong, Claire G.; Sweitzer, Thomas D.; Minnich, Amanda; Tse, Margaret J.; Bennion, Brian J.; Allen, Jonathan; Calad-Thomson, Stacie; Rush, Thomas S.; Brase, James M.

doi:10.1021/acs.jcim.0c00950

Cited by 18 publications

(29 citation statements)

References 27 publications

“…Given a reliable and fast approximate mapping from molecule to its property value, the weighted retraining approach can optimize the latent space jointly for more practical properties that are responsible for higher attrition rate of proposed drugs. With the availability of the surrogate models for protein-ligand binding score [24], inhibition of bile salt export pump [25], our approach can optimize the latent space in producing candidate drugs that are most likely to be active against specific target without causing possible damage to the patients. In terms of computational cost, retraining the generative network multiple times may be slightly expensive for larger network compared to the one we have used.…”

Section: Discussion (mentioning)

confidence: 99%

Multi-Objective Latent Space Optimization of Generative Molecular Design Models

Abeer, Urban, Weil et al. 2022

Preprint

Molecular design based on generative models, such as variational autoencoders (VAEs), has become increasingly popular in recent years due to its efficiency for exploring high-dimensional molecular space to identify molecules with desired properties. While the efficacy of the initial model strongly depends on the training data, the sampling efficiency of the model for suggesting novel molecules with enhanced properties can be further enhanced via latent space optimization. In this paper, we propose a multi-objective latent space optimization (LSO) method that can significantly enhance the performance of generative molecular design (GMD). The proposed method adopts an iterative weighted retraining approach, where the respective weights of the molecules in the training data are determined by their Pareto efficiency. We demonstrate that our multi-objective GMD LSO method can significantly improve the performance of GMD for jointly optimizing multiple molecular properties.
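The abstract above says that training-set weights are determined by Pareto efficiency, but it does not give the formula. The following is a minimal sketch of one way such weights could be computed, assuming every property is to be maximized. The function names, the rank-based weighting `1 / (k + rank)`, and the parameter `k` are illustrative assumptions and are not taken from the cited paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_ranks(scores: List[Sequence[float]]) -> List[int]:
    """Rank 0 is the non-dominated front; rank 1 is the front that remains
    after removing rank 0, and so on."""
    ranks = [-1] * len(scores)
    remaining = set(range(len(scores)))
    rank = 0
    while remaining:
        # A point is on the current front if no other remaining point dominates it.
        front = {
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def rank_weights(ranks: List[int], k: float = 1.0) -> List[float]:
    """Hypothetical weighting: proportional to 1 / (k + rank), normalized to sum to 1."""
    raw = [1.0 / (k + r) for r in ranks]
    total = sum(raw)
    return [w / total for w in raw] if raw else []


# Toy example: four molecules scored on two properties (made-up numbers).
scores = [(1.0, 5.0), (3.0, 3.0), (2.0, 2.0), (0.5, 0.5)]
ranks = pareto_ranks(scores)
weights = rank_weights(ranks)
```

In an iterative weighted-retraining loop, weights like these would be recomputed after each round of sampling so that molecules on or near the current Pareto front are seen more often by the generative model.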

“…Using AMPL it was possible to automate the training and comparisons of models using different hyperparameter settings including molecular descriptors used and machine learning methods e.g., random forests, neural networks, and XGBoost. Previous work has shown that this procedure leads to high quality models [31].…”

Section: Methods (mentioning)

confidence: 99%

Model Choice Metrics to Optimize Profile-QSAR Performance

Kim, McLoughlin et al. 2022

Preprint

Self Cite

Predicting molecular activity against protein targets is difficult because of the paucity of experimental data. Approaches like multitask modeling and collaborative filtering seek to improve model accuracy by leveraging results from multiple targets, but are limited because different compounds are measured with different assays, leading to sparse data matrices. Profile-QSAR (pQSAR) 2.0 addresses this problem by fitting a series of partial least squares models for each target, using as features the predictions from single-task models on the remaining targets. This method has been shown to produce better results than single task and multitask models. However, the factors determining the success of pQSAR 2.0 have as yet not been characterized. In this paper we examine the experimental conditions that lead to better pQSAR models. We limit the amount of data available to the method by retraining with decreasing amounts of data and explore the model's ability to generalize to compounds that have never been assayed. Finally, we look at the properties of training data needed to demonstrate pQSAR improvement. We apply pQSAR 2.0 on a collection of GPCR and safety targets collected from Drug Target Commons, ExcapeDB, and ChEMBL. We found that pQSAR improved models on 34 of the 149 assays selected. In the other 115 assays, single task random forests offered better performance. There are many factors that contribute to an increase in performance, but the main factor is compound assay coverage. The pQSAR model improves when more compounds are measured in multiple assays. It is necessary to consider the available data before applying pQSAR. Successful pQSAR models require a profile made of correlated targets that share compounds with other assays. This technique is best used when experimental data is available as random forest regressors often do not generalize well enough for virtual drug search applications.

“…Machine learning (ML) models have become a key tool to predict compound properties from molecular structure, also known as quantitative structure–property relationship (QSPR) models. In drug discovery projects, ML-based predictions are used to select the most promising series, compounds, or chemical modifications.…”

Section: Introduction (mentioning)

confidence: 99%

Systematic Evaluation of Local and Global Machine Learning Models for the Prediction of ADME Properties

2023

Machine learning (ML) has become an indispensable tool to predict absorption, distribution, metabolism, and excretion (ADME) properties in pharmaceutical research. ML algorithms are trained on molecular structures and corresponding ADME assay data to develop quantitative structure–property relationship (QSPR) models. Traditional QSPR models were trained on compound sets of limited size. With the advent of more complex ML algorithms and data availability, training sets have become larger and more diverse. The most common training approaches consist of training a model either with a small set of similar compounds, namely, compounds designed for the same drug discovery project or chemical series (local model approach), or with a larger set of diverse compounds (global model approach). Global models are built with all experimental data available for an assay, combining compound data from different projects and disease areas. Despite the ML progress made so far, the choice of the appropriate data composition for building ML models is still unclear. Herein, a systematic evaluation of local and global ML models was performed for 10 different experimental assays and 112 drug discovery projects. Results show a consistently superior performance of global models for ADME property predictions. Diagnostic analyses were also carried out to investigate the influence of training set size, structural diversity, and data shift on the relative performance of local and global ML models. Training set size and structural diversity did not have an impact on the relative performance of the methods. Instead, data shift helped to identify the projects with larger performance differences between local and global models. Results presented in this work can be leveraged to improve ML-based ADME property predictions and thus decision-making in drug discovery projects.
