2023
DOI: 10.1039/d2sc06798f

Reagent prediction with a molecular transformer improves reaction data quality

Abstract: A molecular transformer predicts reagents for organic reactions. It is also able to replace questionable reagents in reaction data, e.g. USPTO, to enable better product prediction models to be trained on these new data.

Cited by 20 publications (17 citation statements)
References 46 publications
“…Foundational Chemistry Model. As demonstrated in previous publications, 90,93,99 the general chemistry model was pretrained from scratch by using the well-known USPTO_MIT mixed augmented database that contains approximately one million unlabeled organic chemical reactions. Briefly, this step was included to increase the model's vocabulary (∼5000 unique tokens) by providing sufficient chemical information in the form of text notation.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…The output model containing all of the trained parameters was stored in a directory called the foundational general chemistry model. The molecular Transformer USPTO_MIT Mixed Augmented database was used to train as well as to evaluate the proposed chemistry model.…”
Section: Materials and Methods
Citation type: mentioning
confidence: 99%
“…Subsequently to our preprinted report, a separate study has investigated the performance of the reagent prediction transformer. 36 Most importantly, when tested on unseen molecules, the TTL provides validated disconnections at several possible reactive sites. By contrast, the baseline transformer, trained as reported by Schwaller et al 24 to produce SM directly from P using the unannotated data for training, chooses fewer disconnection points, as exemplified here for the pro-nucleotide 1 (Figure 2d).…”
Section: Triple Transformer Loop (TTL) for Single-Step Retrosynthesis
Citation type: mentioning
confidence: 99%
“…Gimadiev et al presented a 4-step protocol for cleaning of molecular structures using data originating from Reaxys, USPTO, and Pistachio (e.g., functional group standardization, valence checking) as well as curation of the reaction transformation (e.g., via reaction balancing or atom mapping), but no further application such as predictive modeling was conducted. Andronov et al published a cleaning pipeline involving atom-mapping, removal of isotope information, and SMILES canonicalization for subsequent training of a transformer model for single-step retrosynthesis. ORDerly took inspiration from these previously published works to develop an open-source cleaning pipeline integrated with ORD, providing numerous reaction task benchmarks that have undergone in silico validation.…”
Section: Introduction
Citation type: mentioning
confidence: 99%