2019
DOI: 10.26434/chemrxiv.9992489.v1
Preprint

Predicting Retrosynthetic Pathways Using a Combined Linguistic Model and Hyper-Graph Exploration Strategy

Abstract: We present an extension of our Molecular Transformer architecture combined with a hyper-graph exploration strategy for automatic retrosynthesis route planning without human intervention. The single-step retrosynthetic model sets a new state of the art for predicting reactants as well as reagents, solvents and catalysts for each retrosynthetic step. We introduce new metrics (coverage, class diversity, round-trip accuracy and Jensen-Shannon divergence) to evaluate th…

Citations: cited by 7 publications (6 citation statements)
References: 64 publications
“…In other words, given a product molecule from the reaction database, the model correctly predicted the template that was extracted from the reaction that was recorded to produce the product molecule. This metric evaluates the ability of the model to "learn the dataset" on which it was trained, but, as has been discussed recently,[28] is not the only useful metric to evaluate retrosynthetic models. Figure 5 shows the top-k accuracy of baseline and pretrained models for the USPTO-Sm and USPTO-Lg datasets.…”
Section: Top-k Accuracy (mentioning)
confidence: 99%
“…Still, it indiscriminately regards other unobserved potentially feasible reactions as equally infeasible, resulting in a low recall. Essentially, retrosynthesis is a many-to-many problem (Thakkar et al. 2022; Schwaller et al. 2019b), where the target molecule M can potentially be synthesized through various distinct retro-strategies T, and vice versa. To mitigate the low recall issue, we aim to enhance the concept of template applicability by transforming the binary criteria of the observed ground truth P_gt(M, T) into a continuous approximation using a probabilistic model.…”
Section: Concept Enhancement For Label Shift (mentioning)
confidence: 99%
“…This metric evaluates the ability of the model to "learn the dataset" on which it was trained, but, as has been discussed recently,[28] is not the only useful metric to evaluate retrosynthetic models. Figure 6 shows the breakdown of top-100 accuracy for each data set by the original, unaugmented template popularity.…”
Section: Top-k Accuracy (mentioning)
confidence: 99%
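
The top-k accuracy described in the citation statements above counts a prediction as correct when the template recorded for a product appears among the model's k highest-ranked templates. A minimal sketch of that calculation follows. The function name `top_k_accuracy`, the template identifiers, and the toy data are all hypothetical and are not taken from the citing papers or from the preprint.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    recorded_templates: Sequence[str],
    k: int,
) -> float:
    """Fraction of products whose recorded template is in the top-k predictions.

    ranked_predictions[i] lists the predicted template IDs for product i,
    ordered from most to least likely (an assumed input format).
    """
    if len(ranked_predictions) != len(recorded_templates):
        raise ValueError("one ranked list is required per recorded template")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not recorded_templates:
        return 0.0
    hits = sum(
        1
        for predictions, truth in zip(ranked_predictions, recorded_templates)
        if truth in predictions[:k]
    )
    return hits / len(recorded_templates)


# Hypothetical toy data: three products, three ranked template guesses each.
ranked = [
    ["t1", "t7", "t3"],
    ["t4", "t2", "t9"],
    ["t5", "t6", "t8"],
]
recorded = ["t7", "t9", "t0"]

top1 = top_k_accuracy(ranked, recorded, k=1)
top3 = top_k_accuracy(ranked, recorded, k=3)
```

Because the metric only checks agreement with the single recorded reaction, a chemically valid but unrecorded template counts as a miss, which is the limitation the quoted statements point to.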