Decomposing Retrosynthesis into Reactive Center Prediction and Molecule Generation

Liu, Xianggen; Li, Pengyong; Song, Sen

doi:10.1101/677849

“…Retrosynthesis has seen increased attention from the data science and cheminformatics communities recently with a number of machine learning efforts leveraging reaction templates or rules [1-4], techniques adapted from natural language processing [5-8], and graph-based models [9,10]. However, only the template- and rule-based methods are capable of making a connection from the prediction directly back to the source of the template or rule, which is most likely a reaction that was successfully performed in a laboratory. This ability to provide evidence and reasoning behind the prediction of a molecular transformation makes template-based methods an attractive choice for use in software tools designed for synthetic chemists, and guided the choice to pursue this method in this work.…”

Section: Introduction (mentioning)

confidence: 99%

Data Augmentation and Pretraining for Template-Based Retrosynthetic Prediction in Computer-Aided Synthesis Planning

Fortunato, Coley, Barnes et al. 2020

J. Chem. Inf. Model.

This work presents efforts to augment the performance of data-driven machine learning algorithms for reaction template recommendation used in computer-aided synthesis planning software. Often, machine learning models designed to perform the task of prioritizing reaction templates or molecular transformations are focused on reporting high accuracy metrics for the one-to-one mapping of product molecules in reaction databases to the template extracted from the recorded reaction. The available templates that get selected for inclusion in these machine learning models have been previously limited to those that appear frequently in the reaction databases and exclude potentially useful transformations. By augmenting open-access datasets of organic reactions with artificially calculated template applicability and pretraining a template relevance neural network on this augmented applicability dataset, we report an increase in the template applicability recall and an increase in the diversity of predicted precursors. The augmentation and pretraining effectively teach the neural network an increased set of templates that could theoretically lead to successful reactions for a given target. Even on a small dataset of well-curated reactions, the data augmentation and pretraining methods resulted in an increase in top-1 accuracy, especially for rare templates, indicating these strategies can be very useful for small datasets.
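
The top-1 accuracy reported in this abstract is the N = 1 case of top-N accuracy: a prediction counts as correct when the recorded template appears among the N highest-ranked candidates. The following is a minimal Python sketch of that standard definition, not the authors' evaluation code. The function name `top_n_accuracy` and the template identifiers (`t1`, `t2`, ...) are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def top_n_accuracy(ranked: Sequence[Sequence[str]],
                   truth: Sequence[str],
                   n: int) -> float:
    """Fraction of examples whose true label is among the first n ranked predictions.

    `ranked[i]` is a list of candidate labels for example i, best first.
    `truth[i]` is the recorded label for example i.
    """
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    if not truth:
        return 0.0
    hits = sum(1 for preds, label in zip(ranked, truth) if label in preds[:n])
    return hits / len(truth)


# Hypothetical template IDs, for illustration only.
ranked = [["t3", "t1", "t7"], ["t2", "t5", "t1"], ["t4", "t9", "t8"]]
truth = ["t1", "t2", "t6"]

top1 = top_n_accuracy(ranked, truth, 1)  # only the second example is a hit
top2 = top_n_accuracy(ranked, truth, 2)  # first and second examples are hits
```

Because the metric rewards only the single recorded template, a model can score well while still missing other templates that apply to the same product, which is the limitation the abstract addresses with template applicability recall.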

“…Here, we dispute the previous use of top-N accuracy [6,9,12,16,32-37] and introduce four different metrics, namely, round-trip accuracy, coverage, class diversity and Jensen-Shannon divergence [50], as seen in Figure 3, to evaluate single-step retrosynthetic models and, through them, retrosynthetic tools as a whole. All four metrics have been critically designed and assessed with the help of human domain experts (see Section 4.2 for a detailed description).…”

Section: Retro (mentioning)

confidence: 91%

“…While this extensive production of AI models for organic chemistry was made possible by the availability of public data [28,29], the noise contained in this data and generated by the text-mining extraction process is heavily reducing their potential. In fact, while rule-based systems [30] demonstrated, through wet-lab experiments, the capability to design target molecules with fewer purification steps, leading to savings in time and cost [31], the AI approaches [6,9,12,16,32-38] still have a long way to go.…”

Section: Introduction (mentioning)

confidence: 99%

Predicting Retrosynthetic Pathways Using a Combined Linguistic Model and Hyper-Graph Exploration Strategy

Schwaller, Petraglia, Zullo et al. 2019

Preprint

We present an extension of our Molecular Transformer architecture combined with a hyper-graph exploration strategy for automatic retrosynthesis route planning without human intervention. The single-step retrosynthetic model sets a new state of the art for predicting reactants as well as reagents, solvents and catalysts for each retrosynthetic step. We introduce new metrics (coverage, class diversity, round-trip accuracy and Jensen-Shannon divergence) to evaluate the single-step retrosynthetic models, using the forward prediction and a reaction classification model, likewise based on the transformer architecture. The hypergraph is constructed on the fly, and the nodes are filtered and further expanded based on a Bayesian-like probability. We critically assessed the end-to-end framework with several retrosynthesis examples from literature and academic exams. Overall, the framework performs very well, with a few weaknesses due to the bias induced during the training process. The use of the newly introduced metrics opens up the possibility to optimize entire retrosynthetic frameworks through focusing on the performance of the single-step model only.

Available on IBM RXN for Chemistry: https://rxn.res.ibm.com.
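
Of the four metrics named in this abstract, the Jensen-Shannon divergence has a standard textbook definition: for two discrete distributions P and Q with mixture M = (P + Q) / 2, it is the average of the Kullback-Leibler divergences KL(P || M) and KL(Q || M). The sketch below implements only that two-distribution form in plain Python. The abstract does not say which distributions the authors compare or whether they use a generalized form, so the inputs here are purely illustrative and the function names are assumptions.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence between two discrete distributions, in bits.

    With base-2 logarithms the value lies in [0, 1]: 0 for identical
    distributions, 1 for distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # The mixture m is positive wherever p or q is positive,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


identical = js_divergence([0.5, 0.5], [0.5, 0.5])  # no divergence
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])   # maximal divergence
```

Unlike the plain Kullback-Leibler divergence, this quantity is symmetric in its arguments and stays finite when one distribution assigns zero probability to an outcome.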

Decomposing Retrosynthesis into Reactive Center Prediction and Molecule Generation

Cited by 7 publications

References 19 publications
