Virtual screening of large compound libraries to identify potential hit candidates is one of the earliest steps in drug discovery. As the size of commercially available compound collections grows exponentially to the scale of billions, active learning and Bayesian optimization have recently proven effective at narrowing down the search space. An essential component of these methods is a surrogate machine learning model that predicts the desired properties of compounds. An accurate model can achieve high sample efficiency, finding hits after virtually screening only a fraction of the entire library. In this study, we examine the performance of a pretrained transformer-based language model and a graph neural network within a Bayesian optimization active learning framework. The best pretrained model identifies 58.97% of the top-50,000 compounds after screening only 0.6% of an ultralarge library containing 99.5 million compounds, an 8% improvement over the previous state-of-the-art baseline. Through extensive benchmarks, we show that the superior performance of pretrained models persists in both structure-based and ligand-based drug discovery. Pretrained models can thus boost the accuracy and sample efficiency of active learning-based virtual screening.
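To make the screening loop concrete, the sketch below illustrates one possible active learning workflow of the kind the abstract describes: a surrogate model is trained on the compounds scored so far, an acquisition rule picks the next batch to screen, and recall of the top-scoring compounds is tracked against the fraction of the library evaluated. This is a minimal, self-contained example with synthetic data, not the authors' implementation; the random forest surrogate (standing in for the pretrained transformer/GNN models), the greedy acquisition rule, the batch size, and all variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for a compound library: feature vectors plus a hidden
# "docking score" oracle that the loop may only query batch by batch.
N_LIBRARY, N_FEATURES = 20_000, 32
features = rng.normal(size=(N_LIBRARY, N_FEATURES))
true_scores = features @ rng.normal(size=N_FEATURES) + 0.1 * rng.normal(size=N_LIBRARY)

TOP_K = 200  # "hits": the TOP_K best-scoring compounds (lower = better, docking convention)
top_set = set(np.argsort(true_scores)[:TOP_K])

BATCH = 500
labeled = list(rng.choice(N_LIBRARY, size=BATCH, replace=False))  # random seed batch

for iteration in range(5):
    # Fit the surrogate on everything scored so far; a pretrained
    # transformer or GNN would be swapped in here.
    surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
    surrogate.fit(features[labeled], true_scores[labeled])

    # Greedy acquisition: rank the unlabeled pool by predicted score and
    # "screen" (query the oracle for) the most promising batch.
    unlabeled = np.setdiff1d(np.arange(N_LIBRARY), labeled)
    preds = surrogate.predict(features[unlabeled])
    picks = unlabeled[np.argsort(preds)[:BATCH]]
    labeled.extend(picks)

    recall = len(top_set & set(labeled)) / TOP_K
    frac = len(labeled) / N_LIBRARY
    print(f"iter {iteration}: screened {frac:.1%} of library, "
          f"top-{TOP_K} recall = {recall:.1%}")
```

A full Bayesian optimization setup would typically replace the greedy rule with an acquisition function that also uses the surrogate's predictive uncertainty (e.g. upper confidence bound or expected improvement); the greedy variant is shown only to keep the sketch short.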