2020
DOI: 10.1021/acs.jcim.0c00403

Data Augmentation and Pretraining for Template-Based Retrosynthetic Prediction in Computer-Aided Synthesis Planning

Abstract: This work presents efforts to augment the performance of data-driven machine learning algorithms for reaction template recommendation used in computer-aided synthesis planning software. Often, machine learning models designed to perform the task of prioritizing reaction templates or molecular transformations are focused on reporting high accuracy metrics for the one-to-one mapping of product molecules in reaction databases to the template extracted from the recorded reaction. The available templates that get s…



Cited by 64 publications (71 citation statements)
References 35 publications
“…A recall as high as 0.9 still led to decreased route-finding capacity in comparison with the standard SoftMax model, probably because some of the 10% removed templates were crucial to finding routes to some compounds. These observations are in contrast to the findings of a recent paper [10], where artificial labels are used to pretrain a network that is subsequently fine-tuned directly on the database labels to give the final policy model. The benefits of transfer learning suggest that at least some level of synergy is to be expected, which was not observed here, and this led us to the final training regime of separate training and combination into a compound model post-training with explicit TensorFlow operations.…”
Section: Route-finding Capability (contrasting)
confidence: 87%
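The post-training combination described in the statement above can be sketched numerically. The sketch below uses NumPy in place of the paper's explicit TensorFlow operations; `combine_policies`, the 0.5 threshold, and the fallback behavior are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def combine_policies(softmax_probs, keep_scores, threshold=0.5):
    """Combine a pre-trained softmax template policy with a separately
    trained applicability model after training (illustrative sketch of
    the elementwise mask-and-renormalize idea)."""
    mask = (keep_scores >= threshold).astype(float)
    masked = softmax_probs * mask
    total = masked.sum(axis=-1, keepdims=True)
    # Fall back to the raw policy if every template was masked out.
    return np.where(total > 0, masked / np.maximum(total, 1e-12), softmax_probs)

probs = np.array([[0.5, 0.3, 0.2]])   # softmax policy over 3 templates
keep = np.array([[0.9, 0.2, 0.8]])    # applicability scores per template
combined = combine_policies(probs, keep)
print(combined)  # template 1 suppressed; remaining mass renormalized
```

In a deployed compound model the same elementwise multiply, threshold, and renormalization would be expressed as TensorFlow graph operations appended after the two networks' outputs.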
“…Training the model to learn different representations of the same reaction by distorting the initial canonical data eliminated the effect of memorization and increased the generalization performance of the models. These ideas are used intensively, e.g., for image recognition 39, and have already been applied successfully to several chemical problems 27–30, including reaction prediction 18, 31, but were limited to the input data. For the first time we showed that applying augmentation to the target data significantly boosts the quality of the reaction prediction.…”
Section: Discussion (mentioning)
confidence: 99%
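Target-side augmentation, as described in the statement above, can be pictured as pairing every randomized SMILES of the product (model input) with every randomized SMILES of the reactants (model target). The variant strings below are hand-picked equivalent SMILES for ethanol and acetaldehyde, used purely for illustration; real pipelines generate the variants programmatically.

```python
from itertools import product as cartesian

# Hypothetical precomputed randomized SMILES for one recorded reaction
# (in practice generated with a toolkit such as RDKit).
product_variants = ["CCO", "OCC"]      # source side: two SMILES for ethanol
reactant_variants = ["CC=O", "O=CC"]   # target side: two SMILES for acetaldehyde

# Augmenting the target as well as the input: every source/target
# pairing becomes a distinct training example for the seq2seq model.
pairs = list(cartesian(product_variants, reactant_variants))
print(len(pairs))  # 4 training pairs from a single recorded reaction
```

Input-only augmentation would yield just two pairs here; augmenting both sides multiplies the number of distinct examples per reaction.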
“…Though a canonicalization procedure exists 26, it has been shown that models benefit from using a batch of random SMILES (augmentation) during training and inference 27–30. Recently, such augmentation was also applied to reaction modeling 11, 18, 31, 32. The augmented (also sometimes called “random”) SMILES are all valid structures, except that the starting atom and the direction of the graph enumeration are selected randomly.…”
Section: Introduction (mentioning)
confidence: 99%
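The random graph enumeration behind augmented SMILES can be illustrated without a cheminformatics toolkit. The toy chain graph, atom labels, and `random_traversal` helper below are hypothetical stand-ins for real SMILES generation (production code typically uses RDKit's `Chem.MolToSmiles(mol, doRandom=True)` to emit valid randomized SMILES); the point is only that a random start atom and random neighbor order yield many strings for one molecule.

```python
import random

# Toy linear molecular graph C-C-C-O (assumption: atoms as labels,
# bonds as an adjacency list; bond symbols and rings are omitted).
atoms = ["C", "C", "C", "O"]
bonds = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

def random_traversal(rng):
    """Depth-first walk from a random start atom with randomized
    neighbor order -- the idea behind augmented ('random') SMILES."""
    start = rng.randrange(len(atoms))
    seen, order, stack = set(), [], [start]
    while stack:
        i = stack.pop()
        if i in seen:
            continue
        seen.add(i)
        order.append(atoms[i])
        nbrs = [j for j in bonds[i] if j not in seen]
        rng.shuffle(nbrs)
        stack.extend(nbrs)
    return "".join(order)

rng = random.Random(0)
variants = {random_traversal(rng) for _ in range(50)}
print(sorted(variants))  # several distinct strings for the same graph
```

Each variant lists the same four atoms, so all strings encode the same molecule; a canonicalizer would collapse them back to a single representation.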
“…This strategy was originally proposed to alleviate the low-data problem by presenting the same entity with different representations, and recent work has shown successful applications of data augmentation in various neural networks. 34–38 With data augmentation, a chemical reaction can be represented by multiple SMILES strings, and the model can obtain more knowledge of a reaction from a batch of random SMILES strings. Although the augmented SMILES strings contain the same chemical information, the model can absorb more implicit features of the data by constructing a reaction with different SMILES sequences.…”
Section: Introduction (mentioning)
confidence: 99%