Data Augmentation and Pretraining for Template-Based Retrosynthetic Prediction in Computer-Aided Synthesis Planning

Fortunato, Michael W-P; Coley, Connor W.; Barnes, Brian C.; Jensen, Klavs F.

doi:10.26434/chemrxiv.11811564.v1

Cited by 3 publications

(4 citation statements)

References 11 publications


“…Furthermore, we propose that this methodology can be extended to other specialized domains within synthesis planning tasks where the data may be limited and domain specific knowledge (i.e., a specialist) is required. The methodology could also be extended to combine various data sources to increase domain specific coverage, in addition to data augmentation techniques published at the time of writing this manuscript …”

Section: Discussion (mentioning)

confidence: 99%

“Ring Breaker”: Neural Network Driven Synthesis Prediction of the Ring System Chemical Space

et al. 2020


Ring systems in pharmaceuticals, agrochemicals, and dyes are ubiquitous chemical motifs. While the synthesis of common ring systems is well described and novel ring systems can be readily and computationally enumerated, the synthetic accessibility of unprecedented ring systems remains a challenge. “Ring Breaker” uses a data-driven approach to enable the prediction of ring-forming reactions, for which we have demonstrated its utility on frequently found and unprecedented ring systems, in agreement with literature syntheses. We demonstrate the performance of the neural network on a range of ring fragments from the ZINC and DrugBank databases and highlight its potential for incorporation into computer aided synthesis planning tools. These approaches to ring formation and retrosynthetic disconnection offer opportunities for chemists to explore and select more efficient syntheses/synthetic routes.
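A template-based predictor of this kind is usually evaluated by ranking candidate templates and checking whether the recorded one appears among the k best (top-k accuracy). The sketch below illustrates that ranking and evaluation step. It assumes the network emits one score per template ID. The function names, template IDs and scores are hypothetical and do not reproduce Ring Breaker's code.

```python
from typing import Dict, List, Sequence


def rank_templates(scores: Dict[str, float], k: int) -> List[str]:
    """Return the IDs of the k highest-scoring templates, best first."""
    # Sort by descending score; ties are broken by template ID for determinism.
    ordered = sorted(scores, key=lambda t: (-scores[t], t))
    return ordered[:k]


def top_k_accuracy(predictions: Sequence[Dict[str, float]],
                   true_templates: Sequence[str], k: int) -> float:
    """Fraction of targets whose recorded template is among the top-k predictions."""
    if len(predictions) != len(true_templates):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        truth in rank_templates(scores, k)
        for scores, truth in zip(predictions, true_templates)
    )
    return hits / len(predictions)


# Hypothetical model scores for two target ring systems over three templates.
preds = [
    {"T_diels_alder": 0.7, "T_aldol": 0.2, "T_rcm": 0.1},
    {"T_diels_alder": 0.1, "T_aldol": 0.3, "T_rcm": 0.6},
]
truth = ["T_diels_alder", "T_aldol"]
print(rank_templates(preds[0], 2))        # ['T_diels_alder', 'T_aldol']
print(top_k_accuracy(preds, truth, k=1))  # 0.5
print(top_k_accuracy(preds, truth, k=2))  # 1.0
```

The second target is missed at k=1 because the recorded template ranks second. This is why the studies listed here report top-1 and top-10 accuracies side by side.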


“…Atom-to-atom mapping (AAM) [6,7] is a procedure that establishes a correspondence between the atoms of reactants and products. AAM allows to identify a reaction centre (RC) which, in turn, helps to prepare reaction templates used in an automatized forward/retrosynthesis planning, [8][9][10][11][12] as well as to perform reaction classification [13] and reaction searching. [14,15] Several publicly and commercially available AAM tools are currently available.…”

Section: Introduction (mentioning)

confidence: 99%

Atom‐to‐atom Mapping: A Benchmarking Study of Popular Mapping Algorithms and Consensus Strategies

Lin, Dyubankova, Madzhidov, et al. 2021

Molecular Informatics


In this paper, we compare the most popular Atom-to-Atom Mapping (AAM) tools: ChemAxon, [1] Indigo, [2] RDTool, [3] NameRXN (NextMove), [4] and RXNMapper [5] which implement different AAM algorithms. An open-source RDTool program was optimized, and its modified version ("new RDTool") was considered together with several consensus mapping strategies. The Condensed Graph of Reaction approach was used to calculate chemical distances and develop the "AAM fixer" algorithm for an automatized correction of erroneous mapping. The benchmarking calculations were performed on a Golden dataset containing 1851 manually mapped and curated reactions. The best performing RXNMapper program together with the AAM fixer was applied to map the USPTO database. The Golden dataset, mapped USPTO and optimized RDTool are available in the GitHub repository https://github.com/Laboratoire-de-Chemoinformatique.
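Once an AAM tool has mapped a reaction, the reaction centre can be read off by comparing bonds between mapped atoms. The minimal sketch below assumes each side of the reaction has already been reduced to a dictionary of bonds, keyed by pairs of atom-map numbers, with bond order as the value. The function name and data layout are hypothetical. A fuller treatment would also compare atom properties such as formal charge and hydrogen count, which this sketch ignores.

```python
from typing import Dict, FrozenSet, Set

Bond = FrozenSet[int]  # unordered pair of atom-map numbers


def bond(i: int, j: int) -> Bond:
    """Build an unordered bond key from two atom-map numbers."""
    return frozenset((i, j))


def reaction_centre(reactant_bonds: Dict[Bond, float],
                    product_bonds: Dict[Bond, float]) -> Set[int]:
    """Map numbers of atoms whose bonding changes between reactants and products.

    A bond counts as changed if it is formed, broken, or changes order;
    a bond missing on one side is treated as order 0 there.
    """
    centre: Set[int] = set()
    for b in reactant_bonds.keys() | product_bonds.keys():
        if reactant_bonds.get(b, 0.0) != product_bonds.get(b, 0.0):
            centre |= b  # both atoms of a changed bond belong to the centre
    return centre


# Hypothetical mapped amide formation (hydrogens implicit):
# C5H3-C1(=O2)-O3H + N4H3 -> C5H3-C1(=O2)-N4H2 + H2O3
reactants = {bond(5, 1): 1.0, bond(1, 2): 2.0, bond(1, 3): 1.0}
products = {bond(5, 1): 1.0, bond(1, 2): 2.0, bond(1, 4): 1.0}
print(sorted(reaction_centre(reactants, products)))  # [1, 3, 4]
```

Here the carbonyl carbon (1), the leaving oxygen (3) and the incoming nitrogen (4) form the centre. The unchanged C=O and C-C bonds are excluded. An erroneous mapping would shift this set, which is why mapping quality matters for downstream template extraction.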


“…Second, the amount of data is much greater than the domain knowledge of individual researchers. Therefore, with the recent rapid progress of deep learning, the use of machine learning algorithms to learn the latent space of retrosynthetic reaction rules has become a very active research area [1][2][3][4][5][6][7][8][9][10][11]. A rule-based model is a machine learning algorithm that learns the reaction rules that correspond to the input target molecules.…”

Section: Introduction (mentioning)

confidence: 99%

“…The reaction center means changing parts (atom and bonding) before and after the chemical reaction. Hence, the strength of rule-based models [1][2][3][4] is that it is easy to identify the selected reaction rules for the target molecule compared with the molecular transformer based on sequence-to-sequence models and the attention mechanism [5][6][7][8][9].…”

Section: Introduction (mentioning)

confidence: 99%

Efficient Data Undersampling for Rule-Based Retrosynthetic Planning

Park, Lee, Kwon, et al. 2021

Preprint


Computer-aided retrosynthetic planning for organic molecules, which is based on a large synthetic database, is a significant part of the recent development of an autonomous robotic chemist. As in other AI fields, however, the class imbalance problem in the dataset affects the prediction performance of retrosynthetic paths. Here, we demonstrate that applying undersampling methods to the imbalanced reaction dataset can improve the prediction of retrosynthetic rules for target molecules. We report improvements in the top-1 and top-10 prediction accuracies by 13.8% (13.1, 5.4%) and 8.8% (6.9, 2.4%) for the undersampling based on the similarity (random, dissimilarity) clustering of molecular structures of products, respectively. These results demonstrate the importance of a deep understanding of the statistical distribution, internal structure, and sampling for the training dataset. For practical application, the target-oriented undersampling method is proposed and confirmed by the improved prediction performance of 9.3 and 4.2% for top-1 and top-10 accuracies, respectively.
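Undersampling in this setting means reducing the number of training reactions in over-represented template classes so that rare templates are not drowned out. The sketch below shows generic per-class random undersampling only. It may differ from the paper's "random" clustering variant. The similarity- and dissimilarity-based variants, which first cluster products by structural similarity, are not reproduced. The data layout, the `cap` value and the function name are assumptions.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Reaction = Tuple[str, str]  # (product identifier, template ID); hypothetical layout


def random_undersample(reactions: Sequence[Reaction], cap: int,
                       seed: int = 0) -> List[Reaction]:
    """Keep at most `cap` randomly chosen reactions per template class."""
    rng = random.Random(seed)  # local RNG so the result is reproducible
    by_template: Dict[str, List[Reaction]] = defaultdict(list)
    for rxn in reactions:
        by_template[rxn[1]].append(rxn)
    kept: List[Reaction] = []
    for template in sorted(by_template):
        group = by_template[template]
        # Majority classes are sampled down; minority classes are kept whole.
        kept.extend(group if len(group) <= cap else rng.sample(group, cap))
    return kept


# Hypothetical imbalanced dataset: six amide couplings, one Suzuki coupling.
data = [(f"product_{i}", "T_amide") for i in range(6)] + [("product_6", "T_suzuki")]
balanced = random_undersample(data, cap=2)
print(dict(Counter(t for _, t in balanced)))  # {'T_amide': 2, 'T_suzuki': 1}
```

Capping per class, instead of discarding a fixed fraction overall, leaves minority templates untouched. That is the point of undersampling an imbalanced template dataset.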
