Optimizing molecules using efficient queries from property evaluations

Hoffman, Samuel C.; Chenthamarakshan, Vijil; Wadhawan, Kahini; Chen, Pin-Yu; Das, Payel

doi:10.1038/s42256-021-00422-y

Cited by 42 publications (46 citation statements); references 59 publications.


“…solubility, number of hydrogen bonding donor/acceptor sites, structural diversity) are potential directions for further work. Iterative optimization methods [47] can be adopted to improve initial hits by querying a set of molecular property evaluators along with a retrosynthesis predictor. Active learning paradigms can be also explored for improving process efficiency.…”

Section: Discussion (mentioning; confidence 99%)

Accelerating Inhibitor Discovery for Multiple SARS-CoV-2 Targets with a Single, Sequence-Guided Deep Generative Framework

Chenthamarakshan, Hoffman, Owen, et al. (2022). Preprint.


The COVID-19 pandemic has highlighted the urgency for developing more efficient molecular discovery pathways. As exhaustive exploration of the vast chemical space is infeasible, discovering novel inhibitor molecules for emerging drug-target proteins is challenging, particularly for targets with unknown structure or ligands. We demonstrate the broad utility of a single deep generative framework toward discovering novel drug-like inhibitor molecules against two distinct SARS-CoV-2 targets — the main protease (Mpro) and the receptor binding domain (RBD) of the spike protein. To perform target-aware design, the framework employs a target sequence-conditioned sampling of novel molecules from a generative model. Micromolar-level in vitro inhibition was observed for two candidates (out of four synthesized) for each target. The most potent spike RBD inhibitor also emerged as a rare non-covalent antiviral with broad-spectrum activity against several SARS-CoV-2 variants in live virus neutralization assays. These results show that a broadly deployable machine intelligence framework can accelerate hit discovery across different emerging drug-targets.
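The first citation statement describes improving initial hits by iteratively querying molecular property evaluators. This is the query-based setting of the cited paper. The sketch below illustrates the general idea of optimizing with function queries alone, using a random-direction zeroth-order gradient estimate. It assumes that a molecule is already encoded as a continuous latent vector `z` and that the property evaluator is a black-box function `f(z)`. The decoder is omitted. The quadratic `toy_property`, `target`, and all step sizes are hypothetical and chosen only for illustration. This is not presented as the authors' exact algorithm.

```python
import numpy as np

def zo_gradient(f, z, rng, num_queries=20, beta=0.05):
    """Estimate the gradient of a black-box score f at z using function queries only."""
    d = z.size
    f0 = f(z)
    grad = np.zeros_like(z)
    for _ in range(num_queries):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)               # random direction on the unit sphere
        grad += (f(z + beta * u) - f0) * u   # finite difference along u
    return (d / (beta * num_queries)) * grad

def optimize(f, z0, steps=200, lr=0.05, seed=0):
    """Gradient ascent on f driven by zeroth-order estimates; keeps the best point seen."""
    rng = np.random.default_rng(seed)
    z = z0.astype(float).copy()
    best_z, best_f = z.copy(), f(z)
    for _ in range(steps):
        z = z + lr * zo_gradient(f, z, rng)
        fz = f(z)
        if fz > best_f:
            best_z, best_f = z.copy(), fz
    return best_z, best_f

# Toy usage: a hypothetical property oracle whose optimum is at `target`.
target = np.array([1.0, -2.0, 0.5])

def toy_property(z):
    return -float(np.sum((z - target) ** 2))

best_z, best_score = optimize(toy_property, np.zeros(3))
```

Each step spends `num_queries + 1` evaluator calls on the gradient estimate. In practice, this query budget is the quantity one would try to keep small when each evaluation is an expensive predictor or assay.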



“…solubility, number of hydrogen bonding donor/acceptor sites, structural diversity) are potential directions for further work. Iterative optimization methods [45] can be adopted to improve initial hits by querying a set of molecular property evaluators along with a retrosynthesis predictor. Active learning paradigms can be also explored for improving process efficiency.…”

Section: Discussion (mentioning; confidence 99%)

Accelerating Inhibitor Discovery With A Deep Generative Foundation Model: Validation for SARS-CoV-2 Drug Targets

Chenthamarakshan, Hoffman, Owen, et al. (2022). Preprint; self-citation.


The COVID-19 pandemic has highlighted the urgency for developing more efficient molecular discovery pathways. As exhaustive exploration of the vast chemical space is infeasible, discovering novel inhibitor molecules for emerging drug-target proteins is challenging, particularly for targets with unknown structure or ligands. We demonstrate the broad utility of a single deep generative framework toward discovering novel drug-like inhibitor molecules against two distinct SARS-CoV-2 targets: the main protease (Mpro) and the receptor binding domain (RBD) of the spike protein. To perform target-aware design, the framework employs a target sequence-conditioned sampling of novel molecules from a generative model. Micromolar-level in vitro inhibition was observed for two candidates (out of four synthesized) for each target. The most potent spike RBD inhibitor also emerged as a rare non-covalent antiviral with broad-spectrum activity against several SARS-CoV-2 variants in live virus neutralization assays. These results show that a broadly deployable machine intelligence framework can accelerate hit discovery across different emerging drug-targets.


“…Biological sequence design has been approached with a wide variety of methods: reinforcement learning (Angermueller et al, 2019), Bayesian optimization (Wilson et al, 2017; Belanger et al, 2019; Moss et al, 2020; Pyzer-Knapp, 2018; Terayama et al, 2021), search/sampling using deep generative models (Brookes et al, 2019a; Kumar & Levine, 2020; Das et al, 2021; Hoffman et al, 2021; Melnyk et al, 2021), deep model-based optimization (Trabucco et al, 2021a), adaptive evolutionary methods (Hansen, 2006; Swersky et al, 2020; Sinai et al, 2020), likelihood-free inference (Zhang et al, 2021), and black-box optimization with surrogate models (Dadkhahi et al, 2021). As suggested in Section 3, GFlowNets have the potential to improve over such methods by amortizing the cost of search (e.g., when comparing with MCMC's mixing time) over learning, giving probability mass to the entire space facilitating exploration and diversity (vs e.g., RL which tends to be greedier), enabling the use of imperfect data (vs e.g., generative models that require strictly positive or negative samples), and by scaling well with data by exploiting structure in function approximation (vs e.g., Bayesian methods that can cost O(n^3) for n datapoints).…”

Section: Related Work (mentioning; confidence 99%)

Biological Sequence Design with GFlowNets

Moksh Jain, Bengio, Garcia, et al. (2022). Preprint; self-citation.


Design of de novo biological sequences with desired properties, like protein and DNA sequences, often involves an active loop with several rounds of molecule ideation and expensive wet-lab evaluations. These experiments can consist of multiple stages, with increasing levels of precision and cost of evaluation, where candidates are filtered. This makes the diversity of proposed candidates a key consideration in the ideation phase. In this work, we propose an active learning algorithm leveraging epistemic uncertainty estimation and the recently proposed GFlowNets as a generator of diverse candidate solutions, with the objective to obtain a diverse batch of useful (as defined by some utility function, for example, the predicted anti-microbial activity of a peptide) and informative candidates after each round. We also propose a scheme to incorporate existing labeled datasets of candidates, in addition to a reward function, to speed up learning in GFlowNets. We present empirical results on several biological sequence design tasks, and we find that our method generates more diverse and novel batches with high scoring candidates compared to existing approaches.
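The abstract above describes one active-learning round. In that round, a generator proposes candidates, epistemic uncertainty is estimated, and a diverse, high-utility batch is sent for evaluation. The following is a minimal sketch of the batch-selection step only, and it makes several stated substitutions. A random candidate pool stands in for the GFlowNet generator. A bootstrap ridge-regression ensemble provides the uncertainty estimate, through disagreement between its members. An upper-confidence-bound score (mean plus `kappa` times std) ranks the candidates. A greedy minimum-Hamming-distance filter enforces diversity. `ALPHABET`, `toy_oracle`, and every parameter value are hypothetical, and none of this is the paper's exact procedure.

```python
import numpy as np

ALPHABET = "ACGT"  # hypothetical DNA alphabet for the toy example

def one_hot(seqs):
    """Flattened one-hot encoding of equal-length sequences."""
    idx = {c: i for i, c in enumerate(ALPHABET)}
    X = np.zeros((len(seqs), len(seqs[0]) * len(ALPHABET)))
    for n, s in enumerate(seqs):
        for pos, c in enumerate(s):
            X[n, pos * len(ALPHABET) + idx[c]] = 1.0
    return X

def fit_ensemble(X, y, n_models=5, ridge=1e-2, rng=None):
    """Bootstrap ridge-regression ensemble; member disagreement proxies epistemic uncertainty."""
    if rng is None:
        rng = np.random.default_rng(0)
    weights = []
    for _ in range(n_models):
        b = rng.integers(0, len(X), len(X))       # bootstrap resample of the labelled data
        Xb, yb = X[b], y[b]
        A = Xb.T @ Xb + ridge * np.eye(X.shape[1])
        weights.append(np.linalg.solve(A, Xb.T @ yb))
    return np.stack(weights)                      # shape (n_models, n_features)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def select_batch(candidates, W, batch_size, kappa=1.0, min_dist=2):
    """Greedy UCB ranking, skipping candidates too close to ones already chosen."""
    preds = one_hot(candidates) @ W.T             # shape (n_candidates, n_models)
    ucb = preds.mean(axis=1) + kappa * preds.std(axis=1)
    batch = []
    for i in np.argsort(-ucb):                    # highest acquisition score first
        if all(hamming(candidates[i], s) >= min_dist for s in batch):
            batch.append(candidates[i])
        if len(batch) == batch_size:
            break
    return batch

# Toy usage with a hypothetical oracle standing in for an expensive assay.
rng = np.random.default_rng(0)

def random_seqs(n, length=8):
    return ["".join(rng.choice(list(ALPHABET), size=length)) for _ in range(n)]

def toy_oracle(seq):
    return float(seq.count("G"))

labelled = random_seqs(30)
y = np.array([toy_oracle(s) for s in labelled])
W = fit_ensemble(one_hot(labelled), y, rng=rng)
pool = random_seqs(200)                           # stand-in for generator samples
batch = select_batch(pool, W, batch_size=4)
```

The diversity filter is deliberately simple. Because of it, a slightly lower-scoring candidate can displace a near-duplicate of an already chosen one, which matches the abstract's emphasis on diverse batches.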

